
Neurocomputing 110 (2013) 18–28


Data driven modeling based on dynamic parsimonious fuzzy neural network

Mahardhika Pratama a,*, Meng Joo Er b,1, Xiang Li c, Richard J. Oentaryo b,1, Edwin Lughofer d, Imam Arifin e

a The University of New South Wales, Northcott Drive, Canberra, ACT 2600, Australia
b Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
c Singapore Institute of Manufacturing Technology, Nanyang Drive 78, Singapore 638075, Singapore
d Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Linz A-4040, Austria
e Institut Teknologi Sepuluh Nopember, Campus ITS Sukolilo, Surabaya 60111, Indonesia

Article info

Article history:

Received 1 November 2011

Received in revised form 13 November 2012

Accepted 17 November 2012

Communicated by D. Wang

Available online 2 January 2013

Keywords:

Dynamic parsimonious fuzzy neural network (DPFNN)

Radial basis function (RBF)

Self organizing map (SOM)

Rule growing

Rule pruning


Abstract

In this paper, a novel fuzzy neural network termed the dynamic parsimonious fuzzy neural network (DPFNN) is proposed. DPFNN is a four-layer network which features a coalescence between the TSK (Takagi–Sugeno–Kang) fuzzy architecture and multivariate Gaussian kernels as membership functions. The training procedure is characterized by four aspects: (1) DPFNN may evolve fuzzy rules as new training data arrive, which enables it to cope with non-stationary processes; we propose two criteria for rule generation, system error and ε-completeness, reflecting both the performance and the sample coverage of the existing rule base. (2) Fuzzy rules observed over time to be insignificant, based on their statistical contributions, are pruned to truncate rule base complexity and redundancy. (3) The extended self-organizing map (ESOM) theory is employed to dynamically update the centers of the ellipsoidal basis functions in accordance with the input training samples. (4) The optimal fuzzy consequent parameters are updated by the time localized least squares (TLLS) method, which exploits a sliding-window concept in order to reduce the computational burden of the least squares (LS) method. The viability of the new method is intensively investigated on real-world and artificial problems, where it is shown that our method not only arguably delivers more compact and parsimonious network structures, but also achieves lower predictive errors than state-of-the-art approaches.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Preliminary

Nowadays, analytical or physics-based models of modern industrial processes are difficult or sometimes impossible to deploy due to high system complexity and non-stationary characteristics. As an alternative, data-driven modeling, which performs system optimization, control and/or modeling directly from the input–output observations collected from various real-world processes, is increasingly in demand. In general, the development of data-driven modeling tools involves two key objectives: high modeling accuracy (e.g., low approximation error or misclassification rate) and low model complexity (e.g., a small number of nodes or rules).


For clarity, one may discern that the structural complexity is pinpointed by the number of network parameters stored in memory, which hinges on the number of rules, the number of input features and the network type. The majority of conventional modeling techniques, however, focus largely on modeling accuracy without paying attention to the susceptibility to over-complex model representations. In turn, this may prevent users from understanding the system being modeled; a frugal rule base goes hand in hand with a high level of interpretability of the rule base. Over-complex models also impose a high structural and computational burden, which hampers their installation in on-line applications entailing rapid model updates without the danger of over-fitting. One may envisage that the computational burden is defined by the resultant cost of the learning modules. Similarly, the memory demand has an affinity with the computational cost, in that the memory demand can be perceived from the total number of parameters stored in the repository (i.e., the number of training data and network parameters).

Many methodologies for modeling unknown system dynamics from input–output data observations have been developed. One of the best-known approaches is artificial neural networks (ANNs) [68], which incorporate associative representations of the input and output examples in terms of synaptic weights between the various layers.


Nevertheless, a major grievance is that classical NNs cannot actualize automatic knowledge discovery: the network structure has to be fixed a priori, and in turn they lack the ability to deal with time-variant systems. Apparently, this approach is impractical for complex on-line real-world engineering problems due to its limited flexibility.

To correct this limitation, whenever a new state/condition of the system arises, the existing network should automatically re-organize or even expand its structure so as to accommodate the new knowledge. In other words, the NNs should be equipped with a structural learning strategy, an ad-hoc leverage to automate the growth and movement of the hidden nodes in order to guarantee completeness in capturing the available training stimuli flexibly and on-the-fly. Traditional batch learning approaches (e.g., FAOS–PFNN [20] and MRAN [4]) are inapplicable for this task, as they (1) iterate over the complete data set multiple times and (2) cannot assimilate new knowledge on demand. Noticeably, this inflates the computational burden and is incompetent for on-line life-long learning.

Another major bottleneck of ANNs lies in their fundamental architecture mimicking neurons and connections in the human brain, which is generally opaque and un-interpretable for users, as it is inherently a black box. Conversely, fuzzy (logic) systems based on the concept of fuzzy sets [8] are able to explain the implicit system relations using linguistic fuzzy rules, and realize approximate reasoning that copes with imprecision and uncertainty in decision-making processes in a possibilistic manner [28,42]; this can be achieved with the generalization performance of fuzzy rules [66]. This has led to the development of fuzzy neural networks (FNNs), a powerful hybrid modeling approach that integrates the learning abilities, parallelism, and robustness of neural networks with the human-like linguistic and approximate reasoning concepts of fuzzy systems.

1.2. Review of state-of-the-art algorithms

Historically, a pioneering work on FNNs was initiated by Jang [1,9] with the so-called adaptive network-based fuzzy inference system (ANFIS). Nevertheless, the network structure of ANFIS has to be predefined and is unable to be dynamic/adaptive in response to changing (evolving) data streams [5]. Another prominent (static, non-dynamic) FNN approach is presented in [64], where a hybrid TS-neural network architecture capable of refining the weights of the structural components (fuzzy rules) is introduced. To create more flexible fuzzy neural networks, several semi-online FNNs, namely the dynamic fuzzy neural network (DFNN) [10,11], its successor the generalized dynamic fuzzy neural network (GDFNN) [13,14], and the self-organizing fuzzy neural network (SOFNN) [17,18], were devised by several authors. All of these learning machines possess the ability to automate fuzzy rule evolution and pruning simultaneously. Nevertheless, they completely revisit past training signals, intensifying the computational burden over time, and are unable to handle vast amounts of data.

Research on data-driven fuzzy modeling tools culminated in a prominent proposal of Angelov and Filev, the evolving Takagi–Sugeno (eTS) system [36]. The eTS conveys a new concept of cluster potential in order to augment its rule base in a sample-wise manner. Another approach was proposed by Lughofer with the so-called flexible fuzzy inference system (FLEXFIS) [38], which poses an incremental, evolving version of the vector quantization algorithm (eVQ) [61] and integrates some advanced concepts for better robustness and reliability (resulting in FLEXFIS++ [67]).

Furthermore, other approaches were put forward in the literature [20,39], termed the fast and accurate self-organizing scheme for parsimonious fuzzy neural network (FAOS–PFNN) and the parsimonious and accurate fuzzy neural network (PAFNN), respectively. A foremost ingredient of these algorithms is the error reduction ratio (ERR) method [12] as an effective criterion to amalgamate a new rule. As opposed to eTS and FLEXFIS, a major grievance of these methods is the need to gather all presented training data during the execution of the teaching mechanism. Moreover, none of the approaches listed in this paragraph has a rule pruning mechanism integrated, which would be beneficial to endow a compact and parsimonious rule base while retaining predictive accuracy.

In line with the rule base simplification purpose, several sequential pruning mechanisms have been proposed, for instance [3,6,7,31], which benefit from concepts of singular value decomposition (SVD) and statistical methods. The approach in [6,7] differs from approaches that eliminate fuzzy rules in an offline or post-processing manner, like the ERR method [12] in DFNN and GDFNN [13,14], as it quantifies the importance of a fuzzy rule using the newest training datum. The concepts in [6,7] are plugged in as an important component of the incremental learning machine termed the sequential adaptive fuzzy inference system (SAFIS) [21]. However, a main deficiency of SAFIS is that it utilizes a singleton (Sugeno) instead of a TSK (Takagi–Sugeno–Kang) fuzzy system. It is well known that the TSK fuzzy system allows a better generalization capability than Sugeno models [41,58]. All of the aforementioned state-of-the-art methods exploit uni-dimensional membership functions, which trigger hyper-spherical regions. This formulation merely wraps two parameters per fuzzy rule, evoking the same fuzzy region for all input attributes, so that it is not necessarily coincident with the actually occurring data distribution. A plausible remedy for this drawback is to consolidate a paradigm of multi-dimensional membership functions [54–56], whose axes are not necessarily parallel to the input variable axes. This method excels over uni-dimensional membership functions in the sense that it favors input variable interactions in the form of partial local correlations. A comprehensive survey of state-of-the-art evolving neuro-fuzzy approaches can be found in [52].

1.3. Our approach

In this article, a novel fuzzy neural network, namely the dynamic parsimonious fuzzy neural network (DPFNN), is devised, which features a synergy between high predictive accuracy and low structural complexity. As opposed to the aforementioned methods, our new approach grants a coherent methodology for incremental learning of fuzzy neural networks, which integrates (1) a rule evolution strategy that synchronously assures ε-completeness of the fuzzy rule base, thus establishing an important interpretability aspect, (2) a rule pruning mechanism based on rule significance, thus mitigating the complexity and on-line training time of the evolved models, and (3) ellipsoidal clusters in arbitrary position, thus being able to model local correlations between inputs and outputs.

More specifically, the DPFNN is a four-layer network, where each layer undertakes a specific operation in order to manifest TSK fuzzy decision making. On the one hand, the premise parts of DPFNN constitute multi-dimensional membership functions achieving ellipsoidal regions in the input space. On the other hand, the consequents consist of first-order polynomials fusing the input variables and a few constants.


Fig. 1. Architecture of DPFNN.


In the first stage, the DPFNN commences its learning process from scratch with an empty fuzzy rule base. New rules are then parsimoniously amalgamated in accordance with two rule growing criteria: the system error and ε-completeness criteria. The second learning stage involves the tuning of the input and output parameters of the fuzzy rules. For the antecedent parts, extended self-organizing map (ESOM) theory is applied to dynamically adjust the centers of the fuzzy rules so as to better suit the input data distribution. Meanwhile, time localized least squares (TLLS) is consolidated to derive an optimal set of fuzzy consequent parameters. The TLLS method finalizes an LS adaptation relying on the data points contained in a sliding window [41], thus avoiding the reuse of all already seen data, as is needed for the LS method. To circumvent an infinite size of the rule base, the last stage within the incremental DPFNN learning cycle is to remove inconsequential fuzzy rules. Therefore, DPFNN adopts the rule pruning module of SAFIS, prolonging it to hyper-plane consequents and multivariate kernels.

The merits of DPFNN have been experimentally validated on various artificial and real-world datasets and benchmarked against miscellaneous state-of-the-art methods. We may infer that DPFNN may outperform state-of-the-art works in terms of predictive fidelity and structural burden, achieving a balance between predictive accuracy and structural cost.

The remainder of this paper is organized as follows: Section 2 elaborates the network architecture of DPFNN. Section 3 explores the holistic working principle of the DPFNN. Section 4 outlines experimental results on various benchmark problems including artificial and real-world datasets. Conclusions are drawn in the last section of this paper.

2. Architecture of DPFNN

Various methodologies for fuzzy identification models have been published in numerous works [30,43,44]. Nevertheless, TSK fuzzy systems [30] have been adopted far more broadly than relational fuzzy models [44]. The TSK fuzzy system possesses the notable property that any real-occurring nonlinear relationship can be approximated to a certain degree of accuracy, thus confirming models with a high predictive accuracy [25]. By extension, a TSK fuzzy system is one step toward rapprochement between a conventional precise mathematical model and human-like decision making, as it characterizes linguistic and mathematical rules in the antecedent and consequent parts, respectively. Recently, some techniques to boost the generalization of the TSK fuzzy system have been cast by several authors, capitalizing on a maximization of the uncertainty or a combination with rough sets [62–64].

For this reason, DPFNN can be delineated as a four-layer network in which each layer is committed to performing a particular operation in tandem so as to enforce the TSK fuzzy mechanism. For the sake of flexibility, the antecedent part of DPFNN is composed of multidimensional membership functions triggering ellipsoidal rules, thus taking into account input variable interactions owing to axes not necessarily parallel to the input variable axes.

At any time t, the input and output signals are supplied as crisp variables x_t and y_t. In the sequel, the operative procedures of each layer are detailed. Fig. 1 visualizes the proposed network architecture.

Input layer: Each node in this layer represents an input feature of interest and feeds its input signal to the next layer. These nodes interface with the external environment and are turned on when they capture external stimuli.

Hidden/rule layer: Each node of this layer constitutes the premise part of a fuzzy rule and serves to transform a crisp value into a particular value in the fuzzy domain, in which a Gaussian function is in charge of manifesting the input transformations/fuzzifications. The use of the Gaussian law is by virtue of gaining a smooth approximation of a local data space and omitting undefined input space (the case where the normalized basis function in Eq. (2) becomes 0/0) [53]. The product of this layer is termed the rule firing strength, which can be mathematically expressed as follows:

R_i = \exp\left(-\frac{1}{2}\sqrt{(X - C_i)\, S_i^{-1} (X - C_i)^T}\right) \qquad (1)

where C_i \in R^{1\times u} is the center or template vector of the ith fuzzy rule, X \in R^{1\times u} is the input vector of interest, and S_i \in R^{u\times u} is the data covariance matrix for the samples falling into R_i, whose main diagonal consists of \sigma_{ki}, k = 1, 2, \ldots, u, i = 1, 2, \ldots, r.

Normalization layer: Each node in this layer is in charge of normalizing the rule firing strength into the range [0,1]. The number of nodes in this layer equals the number of nodes in the rule layer.

\varphi_i = \frac{R_i}{\sum_{i=1}^{r} R_i} \qquad (2)

Output layer: The center of gravity [9] method is employed to perform the back-transformation/defuzzification of the final system output to a crisp variable, which is an action to the external environment. That is, the output induced by the fuzzy consequences is eventually inferred by the weighted sum of the incoming signals. In the TSK fuzzy system, the consequent is W_i = k_{0i} + k_{1i}x_1 + \cdots + k_{ui}x_u, i = 1, 2, \ldots, r (i.e., a first-order polynomial), where W \in R^{1\times(u+1)r} and \Phi \in R^{(u+1)r\times 1}. For notational simplicity, the DPFNN is pictorially shown as a multi-input–single-output (MISO) system. However, it can easily be prolonged to deal with a class of multi-input–multi-output (MIMO) systems.

y = \sum_{i=1}^{r} w_i \varphi_i = W\Phi = \frac{\sum_{i=1}^{r} R_i w_i}{\sum_{i=1}^{r} R_i} \qquad (3)
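To make the layer operations concrete, the following is a minimal sketch of the DPFNN forward pass of Eqs. (1)-(3). The array shapes and variable names are our own assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def dpfnn_forward(x, centers, inv_covs, weights):
    """x: (u,) input; centers: (r, u) rule centers C_i; inv_covs: (r, u, u)
    inverses of the covariance matrices S_i; weights: (r, u+1) TSK consequent
    parameters [k_0i, k_1i, ..., k_ui] per rule."""
    diffs = x - centers                                    # (r, u)
    # Eq. (1): firing strength from the Mahalanobis distance to each center
    dist = np.sqrt(np.einsum('ru,ruv,rv->r', diffs, inv_covs, diffs))
    R = np.exp(-0.5 * dist)
    phi = R / R.sum()                                      # Eq. (2)
    x_ext = np.concatenate(([1.0], x))                     # bias entry for k_0i
    w = weights @ x_ext                                    # W_i for each rule
    return float(phi @ w)                                  # Eq. (3)

# toy usage: two rules in a three-dimensional input space
rng = np.random.default_rng(0)
r, u = 2, 3
centers = rng.normal(size=(r, u))
inv_covs = np.stack([np.eye(u)] * r)
weights = rng.normal(size=(r, u + 1))
print(dpfnn_forward(rng.normal(size=u), centers, inv_covs, weights))
```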

3. Learning algorithm for DPFNN

This section explores the overall learning procedure of the proposed DPFNN. The learning framework consists of four phases: rule updating based on ESOM, allocation of rule premise parameters, determination of rule consequent parameters, and pruning of inconsequential rules. Algorithm 1 elucidates a holistic overview of the DPFNN learning scenario. By extension, this section also elaborates the computational and structural costs of the proposed algorithm.

Algorithm 1. DPFNN learning procedure
1. if the rule base is empty
2.   create a new rule whose center and width are set as the data point and σ0, respectively (σ0 is a predefined constant)
3.   initialize the consequent weights of the new rule using LS
4. else
5.   undertake the learning procedures per the scenarios in Section 3.2
6.   prune inconsequential rules (Section 3.4)
7. end if
8. if a new sample comes in
9.   go back to step 5
10. end if

3.1. Rule updating based on extended self organizing map (ESOM)

The self-organizing map (SOM) method originated with Kohonen [26]. The versatility of the SOM method has prompted many researchers [22,23,27] to embed it in their neural and fuzzy systems. Typically, SOM theory is employed to update the focal points of the fuzzy rules so as to track the input distribution closely. However, the traditional SOM approach is deemed deficient, as it only involves the Euclidean distance between the rule (node) and the current datum, excluding the zone of influence of the Gaussian membership functions. To this end, the extended self-organizing map method was introduced in [24], whereby the winning rule is elicited via both the distance and the membership function width, thus producing a more representative winner. For each training episode (x_n, t_n), the ESOM method seeks the winning rule C_v using Eq. (2) and adjusts all centers of the ellipsoidal units as follows:

C_i^n = C_i^{n-1} + \beta_n R_v^n h_i^n \left(X_n - C_i^{n-1}\right) \qquad (4)

where \beta_n labels a learning rate, R_v^n denotes the firing strength of the winner, and h_i^n is a neighborhood function defined as follows:

h_i^n = \exp\left(-\frac{1}{2}\sqrt{(C_v - C_i)\, S_i^{-1} (C_v - C_i)^T}\right) \qquad (5)

\beta_n = 0.1\exp(-n) \qquad (6)

The neighborhood function h_i^n approximates a matching factor of two neighboring rules, whose value epitomizes the distance between the winner and another rule. One may conceive that priority of adaptation is granted to a rule which lies in close proximity to the winner, bearing a larger value of the neighborhood function, whereas a smaller value of the neighborhood function means that the adapted rule is insufficiently similar to the winner, so that it is moved only slightly from its original position. Conversely, the learning rate \beta_n decays exponentially as more examples are traversed by DPFNN, assuming a more refined rule base has been crafted late in the training process.
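The following is a small sketch of one ESOM adaptation step, Eqs. (4)-(6), assuming the winner index v has already been found via Eq. (2); inv_covs holds the matrices S_i^{-1} and all names are illustrative.

```python
import numpy as np

def esom_step(centers, inv_covs, x, v, R_v, n):
    """Move every center toward the sample x, weighted by the learning rate,
    the winner's firing strength and the neighborhood to the winner."""
    beta = 0.1 * np.exp(-n)                                  # Eq. (6)
    updated = centers.copy()
    for i in range(len(centers)):
        d = centers[v] - centers[i]
        h = np.exp(-0.5 * np.sqrt(d @ inv_covs[i] @ d))      # Eq. (5)
        updated[i] = centers[i] + beta * R_v * h * (x - centers[i])  # Eq. (4)
    return updated

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
inv_covs = np.stack([np.eye(2)] * 2)
print(esom_step(centers, inv_covs, x=np.array([0.2, 0.1]), v=0, R_v=0.8, n=1))
```

Note that the winner itself (i = v) has h = 1 and therefore receives the largest move, consistent with the priority of adaptation described above.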

3.2. Criteria for rule generation by dynamic rule generation thresholds

The construction of fuzzy rules in DPFNN is orchestrated according to two criteria, which serve as cursors of rule base expansion: the system error and ε-completeness. In the sequel, we elaborate these two criteria in detail.

3.2.1. System error

This criterion was originally put forward by Platt [2] and has been largely adopted in miscellaneous works [2,3,13–15,17–20,22,23]. The aim of this criterion is to stipulate the DPFNN performance in covering the given training datum, referring to the error arising from the newly fed incoming datum. More specifically, the system error is defined as follows:

e_n = \left\|t_n - y_n\right\| \qquad (7)

where t_n is the measured output and y_n the predicted output value at time instance n.

If the nth training episode satisfies e_n \ge k_e, i.e., the system error is considered sufficiently large, where k_e is a dynamic rule generation threshold, the existing rules are considered inadequate to cover the new data point. Accordingly, a new rule ought to be crafted and appended to the rule base in order to fill the scarcity of the rule base outreach.

One may comprehend that the threshold k_e is initially set to a large value and gradually decreases over time. The large threshold value at the beginning of learning is intended to construct a coarse rule base, which makes the most troublesome positions in the underlying training patterns coverable. As k_e decays exponentially, a more precise fuzzy rule base is formed to assure a high-quality rule base, which allows the crisp variable to be accurately transformed into a specific value in the fuzzy domain.

k_e = \max\left(e_{max}\exp\left(-\frac{n}{q}\right),\; e_{min}\right) \qquad (8)

where e_{min} and e_{max} are predefined constants. As foreshadowed in the previous section, DPFNN adopts the time localized least squares (TLLS) method, which solely solicits the most recent q < n data points, whereas the remaining samples are discarded.

3.2.2. ε-Completeness

This criterion is employed to estimate the compatibility of the newest datum, i.e., whether it is possibly a supplementary focal point fostering the rule base; in other words, it supplies a blueprint of whether the injected training datum is novel enough to be admitted in order to seize the overall input distribution. To facilitate this criterion, the ε-completeness proposed by Lee [34] is exploited.

Definition 1 (ε-completeness of fuzzy rules [34]). For any input in the operating range, there exists at least one fuzzy rule such that the match degree (or firing strength) is no less than ε.

Consider the nth input–output pair (x_n, t_n); the firing strength of each rule is computed via Eq. (2) and the winning rule is prescribed as the rule which exhibits the highest firing strength at the nth observation:

R_v^n = \max_{i=1,\ldots,r}\left(R_i^n\right) \qquad (9)

If R_v^n < \varepsilon, with

\varepsilon = \min\left(\varepsilon_{min}\left(\frac{\varepsilon_{max}}{\varepsilon_{min}}\right)^{n-1},\; \varepsilon_{max}\right) \qquad (10)

Logically, this condition may occur especially in non-stationary data streams, when the model traverses a datum which lies distant from the positions of the present focal points. The DPFNN will then act such that either an extraneous rule is tailored or the existing rules conform their positions, thereby intensifying the DPFNN rule base (see the four cases below). On the contrary, if R_v^n < \varepsilon is not satisfied, the newest datum occupies a space within the outreach of the recent rule base.


At the beginning of the training process, \varepsilon \approx \varepsilon_{min} (i.e., the winning rules are assumed to roughly have a small firing strength of \varepsilon_{min} as the learning mechanism just commences), thus stimulating the system to create a coarse covering of mainly the most troublesome data regions. Conversely, \varepsilon \approx \varepsilon_{max} towards the end of the training process, which fosters the system to craft a more refined rule base that captures the data samples more accurately.

Remark: Suppose Gaussian membership functions are used and the newest datum x_n has been properly accommodated by existing fuzzy rules in the range [C_{ji} \pm 2\sigma_{ji}]. If a training episode complies with this condition, it will meet the ε-completeness criterion proposed by Lee [34] with the threshold ε = 0.1354 at the beginning of the rehearsal process. The threshold ε tends to increase exponentially as more examples teach the DPFNN, thereby demanding a higher similarity degree. In this regard, at the end of the learning process, if the input datum x_n lies in the range [C_{ji} \pm \sigma_{ji}], it will meet the ε-completeness criterion with ε = 0.3679. From this angle, there is no individual input which constrains the matching factor to be higher than ε. Hence, DPFNN adopts this perspective, instantly setting \varepsilon_{min} = 0.1354 and \varepsilon_{max} = 0.3679.

Noticeably, the dynamic rule generation threshold ε may serve to establish a coarse rule base so as to capture the most troublesome regime first. Later on, ε attains a bigger value at the end of the learning process, leading DPFNN to tailor a more fine-grained rule base afterwards.
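As a quick illustration of both schedules, the sketch below implements our reading of Eqs. (8) and (10); e_max, e_min, q and the ε bounds follow the values quoted in this paper, but the exact functional forms should be checked against the original equations.

```python
import numpy as np

def k_e(n, e_max=0.5, e_min=0.05, q=30):
    """Error threshold, Eq. (8): starts coarse at e_max, decays toward e_min."""
    return max(e_max * np.exp(-n / q), e_min)

def eps(n, eps_min=0.1354, eps_max=0.3679):
    """Coverage threshold, Eq. (10): grows from eps_min toward eps_max."""
    return min(eps_min * (eps_max / eps_min) ** (n - 1), eps_max)

for n in (1, 5, 30, 300):
    print(n, round(k_e(n), 4), round(eps(n), 4))
```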

In the case that the newest data sample violates both the system error and the ε-completeness criteria, the existing rule base is no longer representative to cluster the datum, or the datum conveys a significant impact fostering the embrace of the rule base, thereby recruiting it as a complementary rule. If

R_v^n < \varepsilon \quad \text{and} \quad |e_n| \ge k_e \qquad (11)

then the data sample is unmanageable for the existing rule base, so that a new rule is evolved in order to build a new ellipsoid capable of precisely capturing the data samples located near its region, as follows:

C_{i+1} = X_n, \qquad S_{i+1} = d_{min} \qquad (12)

where

d_{min,i} = \arg\min_{k=1,\ldots,u;\; i=1,\ldots,r}\left|x_n^k - c_{ki}\right| \qquad (13)

whereas the consequent parameters are crafted via the time localized least squares (TLLS) method, as explored in Section 3.3. By performing fuzzy rule discovery autonomously, DPFNN is able to manage its completeness in accordance with the training data fed, which is a judicious way to cope with possible time-varying characteristics of the training data.

In addition to the case of R_v^n < \varepsilon and |e_n| \ge k_e (Case 1), a few other occasions emerge while the learning engine of the DPFNN is switched on, detailed as follows:

Case 2. |e_n| \ge k_e, R_v^n \ge \varepsilon

The newly injected datum x_n can be clustered to the existing fuzzy rules. However, the predictive accuracy of the DPFNN is deemed inadequate to satisfy the tolerable accuracy, so the consequent parameters of the DPFNN are merely polished up using the TLLS method, which is explored hereafter in the next section.

Case 3. |e_n| < k_e, \max(R_j^n) \le \varepsilon

This case implies that DPFNN already produces a desirable output accuracy, but the input datum x_n is untouched by the existing rule base and in turn unable to be segmented. Hence, the existing rules are adjusted by means of the ESOM theory.

Case 4. |e_n| < k_e, \max(R_j^n) > \varepsilon

The DPFNN already yields a convincing performance satisfying the two criteria. Hence, no action is carried out in this circumstance, sustaining the recent formation of the rule base.
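The four cases reduce to a simple dispatch on the two criteria; the sketch below encodes this decision logic, with the returned strings naming the actions described above (the actual growing, TLLS and ESOM routines live in Sections 3.1-3.3).

```python
def dispatch(e_n, R_v, k_e, eps):
    """Decide the DPFNN action for the current sample from the two criteria."""
    big_error = abs(e_n) >= k_e        # system error criterion violated
    uncovered = R_v < eps              # epsilon-completeness criterion violated
    if big_error and uncovered:
        return "case 1: evolve a new rule (Eqs. (12)-(13))"
    if big_error:
        return "case 2: refine the consequents with TLLS"
    if uncovered:
        return "case 3: adjust the rule centers with ESOM"
    return "case 4: no action"

print(dispatch(e_n=0.4, R_v=0.05, k_e=0.1, eps=0.1354))  # -> case 1
```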

3.3. Determination of consequent parameters

To derive optimal consequent parameters, DPFNN benefits from the least squares (LS) method, as this algorithm is simple and agile in delivering globally optimal solutions, which is generally appealing for tracking the footprints of most real-world problems. In contrast, the back propagation (BP) method, another popular method in the neural network field, suffers from a serious convergence problem: it is usually slow in attaining global minima and can easily be trapped in local minima [59].

Yet, the major bottleneck of the traditional LS method is the severe computational effort induced by the involvement of a high-dimensional matrix, since it exploits all gathered training data, some of which are probably outdated. It should be envisaged that the computational complexity increases with each sample increment, obviously retarding the training process, and it may also engender system memory overflow.

Accordingly, the LS method should be endowed with a forgetting mechanism, so that it constrains the otherwise unaffordable computational expense as abundant data points embark on the model. To remedy this shortcoming, the moving window or time localized technique introduced in [41] is a sensible alternative. The foremost constituent of the time localized least squares (TLLS) method is that only the last q data points are conserved for a model update, while the other data samples are dispossessed. Hence, this concept is capable of a more flexible update and in turn steeply detracts from the computational burden and memory demand of the classical LS method. The TLLS method may therefore confer an instantaneous adaptation depending on the sliding window size q, which, per our rigorous experimentation, can be kept relatively small compared to the size of the training data.

W = T\left(\varphi^T \varphi\right)^{-1}\varphi^T \qquad (14)

On the one hand, T = (t_1, t_2, \ldots, t_q) \in R^q is the localized target data and \varphi \in R^{r(u+1)\times q} is the regressor matrix whose elements are x_k^n R_i^n. On the other hand, the expression (\varphi^T\varphi)^{-1}\varphi^T is a pseudoinverse of the matrix \varphi. The weight vector W = [k_{10}, k_{20}, \ldots, k_{k0}, \ldots, k_{1r}, \ldots, k_{kr}] \in R^{r\times(u+1)} contains the consequent parameters of the DPFNN in the form of the TSK fuzzy type. One may presume that, albeit sub-optimal solutions of the LS method are sought, we observe that these sub-optimal solutions are arguably sufficient to replicate the target vector T. On the other hand, the past data may no longer reflect, or may be obsolete for describing, the current data trends (consider cases where the footprint of the process smoothly or abruptly changes over time from one operating point to another, dubbed drift situations).
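A minimal sliding-window least squares in the spirit of Eq. (14) follows; using numpy's lstsq instead of an explicit pseudoinverse is our choice for numerical robustness, and the regressor layout is an assumption for illustration.

```python
import numpy as np
from collections import deque

class TLLS:
    def __init__(self, q=30):
        self.window = deque(maxlen=q)   # keeps only the most recent q pairs

    def update(self, phi_row, target):
        """phi_row: regressor vector of one sample (normalized firing strengths
        times extended inputs); target: the scalar t_n. Returns refreshed W."""
        self.window.append((phi_row, target))
        Phi = np.array([p for p, _ in self.window])   # (<=q, r*(u+1))
        T = np.array([t for _, t in self.window])
        W, *_ = np.linalg.lstsq(Phi, T, rcond=None)   # windowed LS solve
        return W

tlls = TLLS(q=3)
rng = np.random.default_rng(1)
for _ in range(5):
    W = tlls.update(rng.normal(size=4), rng.normal())
print(W)
```

Because the window is bounded, each update costs a solve over at most q rows regardless of how many samples have streamed past, which is exactly the saving over the classical LS method described above.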

3.4. Pruning of inconsequential rules

In the area of neural networks or fuzzy neural networks, it is always desirable to achieve a coherent tradeoff between predictive accuracy and model simplicity. Loosely speaking, a convoluted network structure is inherently prone to over-fitting, which is undesired in most circumstances. Apart from that, it inhibits users from understanding the system being modeled and suppresses the interpretability of the explanatory module of the fuzzy neural network due to deteriorating rule semantics (consider a neural network with a few hundred rules for finalizing a particular task).

To ascertain that a concise network structure is assembled, it is noteworthy for every model to be endued with an ad-hoc mechanism of rule base simplification. On the one hand, the rule pruning leverage is usable to point out obsolete fuzzy rules, which contributed little during their lifespan or are no longer informative for delineating the recently observed data trends. On the other hand, this mechanism is also efficacious in removing a rule which may be an outlier. For clarity, in a noisy environment, the model may wrongly synthesize a new rule which is an outlier. Nevertheless, such mismatched fuzzy rule recruitments can be corrected with the rule pruning adornment, as such rules or clusters are populated with very few or even no data points, thereby being classed as inactive fuzzy rules. Henceforth, they can be evicted from the rule base without significant loss of the predictive accuracy of the model.

In conjunction with rule pruning endeavors, numerous variants of rule pruning strategies have been put forward. On average, most of them are infeasible to embed, for several reasons. First, some variants impose expensive computational costs due to soliciting all past data and conjointly processing them (i.e., ERR [12], OBS [57]). Second, several types solely estimate the rule significance at the present time (i.e., using density to determine rule significance [38,60]), which noticeably distorts the fact that a rule may be an indispensable constituent at a future time. Hence, it may jeopardize the stability of the rule base coverage. Third, another drawback is that they merely approximate the rule sensitivity of the input parameters regardless of the output parameter importance. In fact, the output parameters play a crucial role in diminishing the training errors, as they reflect the system behavior in the specific operating region of a cluster [60].

Due to the aforementioned reasons, the DPFNN inherits the rule pruning strategy of [6,9], which forecasts the rule significance based on approximations of statistical contributions when the number of training episodes approaches infinity. In addition, the peculiar characteristics of this method endow the fuzzy rule contributions based on the significance of the input and output parameters; it is sequential in nature, invoking only the most recent training datum, expelling already learned data, and also taking into account the fuzzy rule contributions in the future. Although it is perceived as suitable for DPFNN, its original version, to the best of our knowledge, faces a cul-de-sac when directly mounted in the DPFNN learning engine, as it was designed only for singleton fuzzy systems and uni-dimensional membership function learning platforms. The DPFNN learning framework, in contrast, exploits multi-dimensional membership functions and TSK-type consequents. To this end, we extend the original version in [6,9] to the architecture used in DPFNN, finally arriving at Eq. (15) (the proof is left to the reader).

E_{inf}(i) = |\delta_i|\,\frac{\left(\prod_{k=1}^{u}\sigma_{ki}\right)^{u}}{\sum_{j=1}^{r}\left(\prod_{k=1}^{u}\sigma_{kj}\right)^{u}} = |\delta_i|\,E_i \qquad (15)

E_{inf}(i) \le k_{err} \qquad (16)

where \delta_i = w_{1i} + w_{2i}x_1 + \cdots + w_{ki}x_k + \cdots + w_{u+1,i}x_u outlines the output contribution of the ith fuzzy rule, whereas E_i denotes the input contribution of the ith fuzzy rule. The threshold k_{err} is set according to prior knowledge, or should be selected around 10% of e_{min}. One may comprehend that the threshold k_{err} plays a vital role in triggering the rule base simplification: a larger value of k_{err} may excite a worse modeling output but solicits fewer rules, and vice versa. In line with the foregoing exposition, the threshold k_{err} is a proxy to regulate a plausible trade-off between predictive accuracy and model simplicity.

If a rule matches the condition in Eq. (16), then it is categorized as an obsolete, inactive or needless fuzzy rule. Accordingly, it can be evicted from the rule base, decreasing the structural burden of DPFNN.
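A compact sketch of this pruning test, under our reconstruction of Eqs. (15)-(16), is given below; sigma holds the per-rule widths (r x u) and delta the per-rule output contributions, both illustrative names.

```python
import numpy as np

def keep_mask(delta, sigma, k_err):
    """True for rules to keep, i.e. E_inf(i) = |delta_i| * E_i > k_err."""
    u = sigma.shape[1]
    vol = np.prod(sigma, axis=1) ** u    # (prod_k sigma_ki)^u for each rule
    E = vol / vol.sum()                  # normalized input contribution E_i
    E_inf = np.abs(delta) * E            # statistical contribution, Eq. (15)
    return E_inf > k_err                 # Eq. (16): prune rules failing this

sigma = np.array([[0.4, 0.4], [0.05, 0.05], [0.3, 0.5]])
delta = np.array([1.2, 0.9, 0.02])
print(keep_mask(delta, sigma, k_err=0.005))  # the narrow middle rule is pruned
```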

3.5. Analysis of computational expense

The mechanisms for extracting the fuzzy rules, adapting the premise parameters, adjusting the consequent parameters and pruning inactive fuzzy rules are inherent to the computational cost or complexity of the algorithm. More specifically, one may contemplate that the number of data points, input features, and rules, and the type of algorithm (i.e., iterative or recursive, batch learning or single-pass learning) used in the learning process affect the overall computational effort needed. Arguably, the computational overhead of the algorithm can be defined as the resultant cost of its learning pillars (i.e., fuzzy rule recruitment, rule pruning strategy, adaptation of the input and output parameters), as each learning constituent endures a standalone computational burden depending on the number of data, rules and input attributes capitalized, so that the resultant cost is the summation of them.

On the one side, it is worth stressing that the most influential contribution to the computational burden of the DPFNN lies in the adjustment of the consequent parameters. The primal standpoint is that DPFNN necessitates a snapshot of the last q data samples. Hence, the computational complexity arising in this learning step is O(UqM), where M = (u+1) \times r, u and r are the numbers of input features and rules, respectively, and U is the number of times the rule base expansion is executed. One may grasp that O(\cdot) is the big-O notation that is ubiquitous in exploring the computational complexity of algorithms. This expense can be considered manageable, as the total amount does not grow exponentially with the number of episodes or data samples n. It is conceivable that the number of episodes n is usually much higher than q, i.e., n \gg q (this is justifiable by means of our empirical study). On the other side, the computational expenses of the other learning modules, like fuzzy rule extraction, pruning of inconsequential rules and adjustment of the input parameters, are much lower than the adaptation of the output parameters. The sensible rationale is that these learning modules are consummated on-the-fly, forgoing past training stimuli. Virtually, the adaptation of input parameters, fuzzy rule growing and pruning deplete computational expenses in the order of O(r), O(2), and O(r), respectively. Accordingly, the DPFNN bears the total computational load as follows:

O(UqM + 2r + 2) \qquad (17)

In comparison with approaches like DFNN, GDFNN, SOFNN, or FAOS–PFNN, the computational cost of our incipient algorithm is more economical, as the aforesaid algorithms completely revisit the past data points in order to learn the newest datum. Nonetheless, the computational complexities of DFNN and GDFNN are tantamount, as they enrobe the same learning components. The major contributors to the DFNN and GDFNN learning algorithms are the LS method for crafting the output parameters and the ERR method for overseeing superfluous fuzzy rules, which are quite demanding. These two methods lead to computational complexities in the order of O(2n^2M^2). Apart from that, the computational cost incurred in quantifying the potential of the training data is equivalent to that of DPFNN, which is O(2). Hence, we deduce that the resultant computational costs of DFNN and GDFNN are O(2n^2M^2 + 2).


Table 1
Computational load, memory requirement and structural cost.

Algorithm    Computational load    Memory requirement    Structural cost
DPFNN        O(UqM^2 + 2r + 2)     O(q + M + 2u·r)       O((2u·r) + (u+1)·r)
DFNN         O(2M^2n^2 + 2)        O(n + M + u·r + r)    O((u+1)·2r)
GDFNN        O(2M^2n^2 + 2)        O(n + M + u·r + r)    O((2u·r) + (u+1)·r)
SOFNN        O(2M^2nU + 2)         O(n + M + u·r + r)    O((2u·r) + (u+1)·r)
FAOS–PFNN    O(r^2n^2 + 2 + r)     O(n + u·r + 2r)       O((u+1)·r + r)


Conversely, the computational expense of SOFNN is more affordable than the former, as it possesses the recursive least squares (RLS) method to polish up the output parameters and the optimal brain surgeon (OBS) method as a rule pruning ingredient, heading to O(2M^2rU). As opposed to DPFNN, U in SOFNN is the number of times a fuzzy rule is recruited or evicted. Indeed, the term U in SOFNN renders it more expensive than in DPFNN, as every time the rule base amends its size (i.e., expansion or simplification), the adaptation of the consequent parameters benefiting from all collected data points ought to be enforced. Meanwhile, the rule growing procedure of SOFNN is the same as in DFNN and GDFNN, conveying the total computational burden O(2M^2nU + 2). In contrast, FAOS–PFNN is not endued with a rule pruning adornment; however, this algorithm refurbishes the traditional ERR method as another cursor of rule base augmentation. As with DFNN and GDFNN, the ERR in FAOS–PFNN invokes severe computational effort O(n^2r^2), as it collects the already learned training signals. In this viewpoint, the resultant computational complexity is O(n^2r^2 + 2 + r), as FAOS–PFNN utilizes two other criteria in addition to ERR and the EKF method for adapting the output parameters.

In addition to the computational complexity, the memory requirement of the model plays a noteworthy role in the viability of the algorithms. One may envision that the memory requirement of DPFNN is in the order of O(q + M + 2u \times r). Arguably, the memory cost of DPFNN is lighter than those of DFNN, GDFNN and FAOS–PFNN, as the use of preceding training data over time is needless.

In essence, the memory requirement of DFNN is in the order of O(n + M + u \times r + r), and GDFNN and SOFNN land on the memory requirement O(n + M + u \times r + r). One may grasp that the distinction between the memory requirements of GDFNN and DFNN stems from DFNN being wrapped with uni-dimensional membership functions generating the same fuzzy region per input attribute. Meanwhile, the memory requirement of FAOS–PFNN is in the order of O(n + u \times r + 2r), which is not tantamount to that of DPFNN, as FAOS–PFNN benefits from singleton-type consequences.

3.6. Analysis of structural complexity

On an in-depth look, the structural cost of an FNN emanates from the total number of network parameters (input and output parameters) stored in the memory. In retrospect, the number of rules and the network specification are decisive in portraying the level of complexity of the model. In a nutshell, DPFNN enrobes multidimensional membership functions in the premise part and first-order polynomials in the output parameters. Therefore, we deduce that the structural load of DPFNN is O((2u \times r) + (u+1) \times r), which is comparable with the structural costs of FLEXFIS, GDFNN and SOFNN. One may be cognizant that DPFNN consumes a lighter memory requirement and computational load than GDFNN and SOFNN, yet DPFNN labors under an equivalent structural complexity. This emanates from DPFNN employing a commensurate network type drawing comparable network parameters. It is worth stressing that we do not reckon the number of data points benefited from during the training process when gauging the structural complexity of the model.

Conversely, the rule base complexities of eTS, simp_eTS and DFNN are dissimilar to those foreshadowed, as all of these algorithms exploit uni-dimensional membership functions. That is, the structural cost of these algorithms is O((u+1) \times 2r). In contrast, FAOS–PFNN and SAFIS harness uni-dimensional membership functions and constant-type consequences, where the structural cost is O((u+1) \times r + r). Table 1 explicates the consolidated computational complexities, memory requirements and structural burdens of the aforementioned algorithms.

4. Simulation studies

The viability of the DPFNN prototype as a novel contribution to the field of data-driven modeling is experimentally validated through miscellaneous benchmark problems employing synthetic and real-world datasets. The problems consolidated herein not only feature nonlinear and uncertain properties, but also suffer from non-stationary ingredients, which are generally challenging for classical models to overcome. On the one side, the synthetic dataset problem encompasses a time series prediction of the Mackey–Glass function. On the other side, the real-world dataset problems outline the tool wear forecasting of the ball-nose end-milling process and the auto MPG problem. DPFNN is also benchmarked against state-of-the-art algorithms in order to promote its efficacy contrasted with its counterparts. In this viewpoint, it is tangible in our experimentation that DPFNN not only emulates the versatility of its counterparts, but also surpasses the already published works.

4.1. Chaotic Mackey–Glass (MG) time series prediction

This study case explores a classical benchmark problem introduced by [35], which was originally proposed as a model of the production of white blood cells. This problem is tremendously employed in many works [10–12,14–16,20,21,30] due to its chaotic time series, whose nonlinear oscillations are universally endorsed as a representation of various physiological processes. The problem is governed by the following mathematical model:

\frac{dx(t)}{dt} = \frac{b\,x(t-\tau)}{1 + x^{10}(t-\tau)} - a\,x(t) \qquad (18)

where a = 0.1, b = 0.2 and \tau = 17. The task is to forecast future values x(t+P) from past values. The parameters of the underlying function are assigned as P = \Delta t = 85 and n = 4. Herewith, the nonlinear dependence of this time series problem is regularized by the following mathematical form:

x(t+85) = f\left[x(t),\; x(t-6),\; x(t-12),\; x(t-18)\right] \qquad (19)

A total of 3000 training data points is drawn from the time interval t = 201 to t = 3200 and fed to the DPFNN algorithm. After the aging process of the resultant rule base has settled, the rule base is tested on 500 unseen data points from t = 5001 to t = 5500, where the overall data patterns are produced by a fourth-order Runge–Kutta approximation method.
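For reference, the series and the regression pairs of Eqs. (18)-(19) can be generated as sketched below; a simple Euler step is used here for brevity (the paper quotes a fourth-order Runge-Kutta approximation), and the initial value x(0) = 1.2 is an assumption.

```python
import numpy as np

def mackey_glass(n_steps, a=0.1, b=0.2, tau=17, dt=1.0, x0=1.2):
    x = np.zeros(n_steps)
    x[0] = x0
    for t in range(n_steps - 1):
        x_tau = x[t - tau] if t >= tau else 0.0           # delayed state
        x[t + 1] = x[t] + dt * (b * x_tau / (1 + x_tau ** 10) - a * x[t])
    return x

series = mackey_glass(3300)
# Eq. (19): inputs [x(t), x(t-6), x(t-12), x(t-18)] predict x(t+85)
t = np.arange(201, 3200 - 85)
X = np.stack([series[t], series[t - 6], series[t - 12], series[t - 18]], axis=1)
y = series[t + 85]
print(X.shape, y.shape)
```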


Table 2
Mackey–Glass problem, 3000 training samples.

Algorithm     Rules   Testing NDEI   Time (s)   Number of network parameters
DENFIS        58      0.278          4.1        754
SAFIS         21      0.380          n/a (a)    126
FLEXFIS       89      0.157          4.01       1157
eTS           99      0.356          3.8        1287
Simple eTS    21      0.378          3.54       210
eMG           58      0.139          n/a (a)    754
BARTFIS       24      0.301          3.66       312
FAOS–PFNN     44      0.1685         129.89     264
DFNN          20      0.1345         200.56     200
GDFNN         18      0.1445         190.67     234
DPFNN         11      0.0959         4.28       143

(a) The result is not listed in the original paper.

Table 3
The auto MPG task.

Algorithm    RMSEtest (std)   RMSEtest (mean)   Rules   Time (s)   Number of network parameters
GAP RBF      0.1144           0.1404            3.12    n/a (a)    31.96
eTS          0.011            0.088             3.8     0.233      86.4
Simp_eTS     0.024            0.09              7.1     0.221      156.2
BARTFIS      0.032            0.085             6.6     0.255      116.96
DENFIS       0.031            0.073             5.1     0.264      112.2
FLEXFIS      0.013            0.056             4.6     0.244      101.2
DFNN         0.0879           0.0890            3.5     0.876      56
GDFNN        0.0676           0.0789            4.0     0.980      88
FAOS–PFNN    0.0321           0.0775            2.9     0.566      26.1
DPFNN        0.0579           0.0500            2.66    0.2713     58.52

(a) The result is not listed in the original paper.


To properly accomplish this study case, the design parameters of DPFNN are assigned as follows: e_{max} = 0.5, e_{min} = 0.05, \sigma_0 = a_0 = 0.4, k_{err} = 0.005, k = 1.1, k_w = 2, q = 30. It should be envisaged that the design parameters can be elicited from optimization techniques like the grid-search method [40]. Moreover, DPFNN is benchmarked against RAN [2], SAFIS [21], FAOS–PFNN [20], DENFIS [16], eTS [36], Simple_eTS [37], FLEXFIS [38], eMG [65] and BARTFIS [66] on the withheld assessment set of 500 samples in order to campaign the superiority of DPFNN. Table 2 summarizes the consolidated results of all benchmarked systems in generalizing the validation data.

Inevitably, DPFNN outperforms the other models: it not only showcases higher modeling accuracy but also lands on a more economical rule base than the other approaches, breeding the smallest number of rules. In addition, the training speed of DPFNN outpaces the training speed of FAOS–PFNN. This is mainly because FAOS–PFNN subscribes to all past data, incurring a more expensive computational cost than DPFNN. In line with the computational complexity analysis outlined in the previous section, DPFNN undergoes a more instantaneous training episode than DFNN and GDFNN. Both DFNN and GDFNN are even slower than FAOS–PFNN, as they wield the original LS and ERR methods hinging on all training samples in every training cycle. Although eTS, simp_eTS, FLEXFIS, DENFIS and BARTFIS experience more rapid training processes, eTS, simp_eTS, DENFIS and FLEXFIS are sketchy in that they do not include a rule base simplification technology; moreover, these algorithms generate less accurate predictions than DPFNN. Unfortunately, comparative results on the training speed of the other methods are unavailable by virtue of the exploitation of different computing environments in their original publications (yet we provide analyses of the computational burden of DPFNN and the other models in the previous section). Hence, their results are not comparable with DPFNN in our article. In conjunction with the number of network parameters conserved in the memory, DPFNN suppresses the memory demand to the lowest level.

4.2. Fuel consumption prediction of automobiles

The auto MPG problem aims to foresee the fuel consumption of automobiles (miles per gallon) based on 392 training patterns. The goal is to promote the versatility of DPFNN in addressing a real-world engineering problem. There are seven input variables which are transmitted to DPFNN (displacement, horsepower, weight, acceleration, cylinders, model year and origin). A total of 320 training data and 72 testing data are permuted from the auto MPG database. To favor representative experimental results, the simulation is repeatedly performed 50 times with shuffling of the training and testing data points (as every trial may showcase different results), and the eventual result is obtained as an average over the 50 trials. Furthermore, the DPFNN is contrasted against the RAN [2], MRAN [2], GAP RBF [6], eTS [36], simp_eTS [37], BARTFIS [66], FLEXFIS [38] and DENFIS [16] algorithms in order to emulate the robustness of the DPFNN counterparts. The average results of the 50 trials on the withheld evaluation set of 72 data points are tabulated in Table 3.

Table 3 shows that the other models are inferior to DPFNN in terms of predictive accuracy. On the one hand, DPFNN is able to eradicate a convoluted rule base, conferring the most compact and parsimonious structure with the smallest number of rules. On the other hand, its resultant fuzzy rules also deliver the best predictive fidelity. Albeit GAP–RBF, FAOS–PFNN and DFNN gain smaller numbers of network parameters than DPFNN, the modeling accuracy of these approaches is much worse than that of DPFNN. This fact is occasioned by GAP–RBF constituting a neural network, whereas DFNN and FAOS–PFNN convey uni-dimensional membership functions bestowing the same fuzzy regions per input attribute. This phenomenon is supported by their natures, which exclude rule/hidden node simplification. In contrast, eTS, simp_eTS, DENFIS, FLEXFIS and BARTFIS hold a milder computational burden than DPFNN. However, they are inferior to DPFNN in terms of structural burden and predictive quality.

4.3. Tool wear prediction of ball nose end milling process

Tool condition monitoring and prediction plays a vital role in high speed machining processes [33]. Undetected or premature tool failures often lead to costly scrap or rework arising from damaged surface finishing, loss of dimensional accuracy, or possible damage to the work piece and machine [50,51]. More specifically, in the high precision machining industry, the development of a self-adjusting and integrated system capable of monitoring performance degradation and work piece integrity under various operational conditions with minimal operator supervision is desirable [46]. However, producing accurate tool wear predictions is quite challenging by virtue of the nonlinear and uncertain natures of machining processes [34]. New theories in machine learning have shed some light on these issues. The principal constituents required to concurrently address such issues include the use of fuzzy logic reasoning in handling imprecise data and elevating the level of human interpretability [48], and the learning ability of neural networks [49] in associating the input and target data.


Table 4
The tool-wear prediction of the ball nose end milling process.

Algorithm   APEtest std (%)   APEtest mean (%)   Rules   Time (s)   No. of network parameters
GDFNN       0.179             10.75              4       3.59       52
DFNN        0.098             4.43               8.9     7.83       89
FAOS-PFNN   0.084             20.81              8.2     0.61       49.2
BARTFIS     0.008             5.01               9.2     0.45       119.6
eTS         0.012             5.1                10      0.41       130
Simp_eTS    0.02              6.67               12.5    0.4        162.5
FLEXFIS     0.009             5.05               8.5     0.44       110.5
DENFIS      0.03              6.23               9.5     0.55       123.5
DPFNN       0.079             4.77               8.4     0.60       87.2



A CNC milling machine (Röders Tech RFM760) with a spindle rate up to 42,000 RPM is selected for the experiments. At the beginning of the empirical study, the raw signal is gathered by a seven-channel DAQ. The first three channels provide force signals in the three-dimensional cutting axes (X, Y, Z) measured by a dynamometer, the next three channels constitute vibration signals captured by an accelerometer, and the last channel produces the AE (acoustic emission) signals received by an AE sensor. It is conceivable that, in the machining process, many parameters and variables affect the work piece integrity as well as the tool performance over the production regime. As a reciprocal impact, it is deemed necessary for researchers to install a suite of accelerometers, dynamometers and acoustic sensors at critical locations in order to allow in-situ signals to be captured, processed, analyzed, and transformed into useful reference models for condition and performance monitoring [45,47,48].

A total of sixteen features of the force signal were extracted. As pointed out in [29,32], it is recommended to merely encompass the four features which are the most correlated with the tool wear: maximum absolute force, amplitude of force, average force, and amplitude ratio. The datasets of two different cutter profiles are collected, normalized and permuted. The number of data points is 630 pairs, and the 10-fold cross-validation technique introduced by [40] is exploited in order to assess the performance of DPFNN, given the shuffling nature of the injected training patterns. Referring to the cross-validation (CV) technique, the data set is first shuffled and partitioned into ten mutually exclusive bins, labeled CV1–CV10. In the first trial, CV1 is used as the testing set whereas CV2–CV10 constitute the training set; in the second trial, CV2 is the testing set while CV1 and CV3–CV10 are the training set, and so on. The average results across the 10-fold cross-validation are tabulated in Table 4. To yield a robust tool wear predictor, which exemplifies a synergy between economical model complexity and high predictive accuracy, the predefined parameters of DPFNN are allocated as follows: emax = 1, emin = 0.01, σ0 = a0 = 2, kerr = 0.01, k = 1.1, kw = 1.12 and q = 30. In this experiment, DPFNN is benchmarked against DFNN [10,11], FAOS-PFNN [20], GDFNN [13], BARTFIS [66], eTS [36], simp_eTS [37], FLEXFIS [38] and DENFIS [16].
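The 10-fold rotation described above can be made concrete with a short sketch. Here `model_factory` is assumed hypothetically to build a fresh predictor exposing `fit` and `predict`, and the APE formula used (mean absolute percentage error) is one common reading of the APE columns in Table 4, not necessarily the exact definition used in the original study.

```python
import numpy as np

def ten_fold_cv(model_factory, X, y, n_folds=10, seed=0):
    """Shuffle the 630 samples once, split them into ten mutually
    exclusive bins CV1..CV10, and rotate each bin through the test role."""
    rng = np.random.default_rng(seed)
    bins = np.array_split(rng.permutation(len(X)), n_folds)
    apes = []
    for k in range(n_folds):
        test = bins[k]
        train = np.concatenate([bins[j] for j in range(n_folds) if j != k])
        model = model_factory()
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        # one common definition of average percentage error (APE)
        apes.append(100.0 * np.mean(np.abs((pred - y[test]) / y[test])))
    return float(np.mean(apes)), float(np.std(apes))
```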

In this empirical study, DPFNN outperforms the other fuzzy neural networks, DFNN, GDFNN and FAOS–PFNN, in terms of training speed. Notwithstanding that FAOS–PFNN may yield a competitive training speed and the lowest memory requirement, it is handicapped by the absence of a rule pruning mechanism and engages uni-dimensional membership functions and constant output parameters. It is well known that a multi-dimensional membership function possesses the more appealing property that its axes are not necessarily parallel to the input variable axes, and that the TSK fuzzy system confers a higher degree of freedom than the singleton fuzzy system. By extension, the modeling accuracy of FAOS–PFNN is worse than that of DPFNN. Conversely, GDFNN may confer the simplest rule base, proliferating the smallest number of rules on this occasion. However, its predictive accuracy and training speed are among the worst. Despite the best modeling accuracy produced by DFNN, it suffers from a high computational burden owing to its collection of all embarked data, which is tangibly one of the problematic natures of DFNN. This hypothesis is strengthened by the training speed of DFNN, which yields the slowest execution time. Moreover, DFNN wraps a complex model representation in terms of the numbers of generated rules and network parameters conserved in the repository, which is usually unacceptable in the data-driven modeling field. In comparison with BARTFIS, eTS, simp_eTS, DENFIS and FLEXFIS, our algorithm excels these methods in terms of structural complexity and predictive quality. Yet, DPFNN is inferior to those algorithms in terms of training speed.

5. Conclusion

An exposition of a novel fuzzy neural network, namely the dynamic parsimonious fuzzy neural network (DPFNN), has been elaborated in this paper as a promising candidate among data-driven modeling tools. In this context, the viability and efficacy of DPFNN are exemplified, and the experimental results utilizing various real-world and synthetic datasets are encouraging, especially from the structural complexity and predictive accuracy viewpoints. As a downside, DPFNN relies on a sliding window-based least squares (SWLS) or time localized least squares (TLLS) method in order to derive the consequent parameters, which necessitates retaining a number of training patterns in the sliding window. In our future work, we will focus on enhancing DPFNN so that it can run without storing several data points in a moving window, thereby expediting the rehearsal process. Hence, all training procedures would be executed without a priori domain knowledge of next data blocks and strictly without looking back to already seen training stimuli.
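To illustrate the windowed estimation that this conclusion refers to, the following minimal sketch refits the consequent parameters from only the q most recent samples. It is a generic sliding-window least-squares estimator under our own simplifying assumptions, not the exact TLLS update of DPFNN.

```python
import numpy as np
from collections import deque

class SlidingWindowLS:
    """Generic time-localized least squares: parameters are re-estimated
    from the q most recent (regressor, target) pairs only."""

    def __init__(self, q=30):
        # q = 30 matches the window size used in the tool wear experiment
        self.window = deque(maxlen=q)

    def update(self, phi, target):
        """phi: regression vector of one sample (e.g., normalized rule
        firing strengths times the extended input); target: desired output."""
        self.window.append((np.asarray(phi, dtype=float), float(target)))
        Phi = np.vstack([p for p, _ in self.window])   # stacked regressors
        t = np.array([d for _, d in self.window])      # stacked targets
        theta, *_ = np.linalg.lstsq(Phi, t, rcond=None)
        return theta                                   # consequent estimate
```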

Acknowledgements

This research is supported by the A*STAR Science and Engineering Research Council Singapore-Poland Grant. The authors would like to thank the Singapore Institute of Manufacturing Technology for kindly providing the tool wear data. The fifth author acknowledges the Austrian fund for promoting scientific research (FWF, contract number I328-N23, acronym IREFS).

References

[1] L.X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. Neural Networks 3 (1992) 807–814.
[2] J. Platt, A resource allocating network for function interpolation, Neural Comput. 3 (1991) 213–225.
[3] M. Salmeron, J. Ortega, C.G. Puntonet, A. Prieto, Improved RAN sequential prediction using orthogonal techniques, Neurocomputing 41 (2001) 153–172.
[4] L. Yingwei, N. Sundararajan, P. Saratchandran, Performance evaluation of a sequential minimal radial basis function (RBF) neural network learning algorithm, IEEE Trans. Neural Networks 9 (1998) 308–318.
[5] S.K. Halgamuge, Self evolving neural networks for rule based data processing, IEEE Trans. Signal Process. 45 (1997) 2766–2773.



[6] G.-B. Huang, P. Saratchandran, N. Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (2004) 2284–2292.
[7] G.-B. Huang, P. Saratchandran, N. Sundararajan, A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation, IEEE Trans. Neural Networks 16 (2005) 57–67.
[8] L.A. Zadeh, Soft computing and fuzzy logic, IEEE Softw. 11 (1994) 48–56.
[9] J.-S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (1993) 665–684.
[10] S. Wu, M.J. Er, Dynamic fuzzy neural networks—a novel approach to function approximation, IEEE Trans. Syst. Man Cybern. Part B Cybern. 30 (2000) 358–364.
[11] M.J. Er, S. Wu, A fast learning algorithm for parsimonious fuzzy neural networks, Fuzzy Sets Syst. 126 (2002) 337–351.
[12] S. Chen, C.F.N. Cowan, P.M. Grant, Orthogonal least squares learning algorithm for radial basis function networks, IEEE Trans. Neural Networks 2 (1991) 302–309.
[13] S.-Q. Wu, M.J. Er, Y. Gao, A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst. 9 (2001) 578–594.
[14] Y. Gao, M.J. Er, NARMAX time series model prediction: feedforward and recurrent fuzzy neural network approaches, Fuzzy Sets Syst. 150 (2005) 331–350.
[15] C.F. Juang, C.T. Lin, An on-line self-constructing neural fuzzy inference network and its applications, IEEE Trans. Fuzzy Syst. 6 (1998) 12–32.
[16] N. Kasabov, Q. Song, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time series prediction, IEEE Trans. Fuzzy Syst. 10 (2002) 144–154.
[17] G. Leng, T.M. McGinnity, G. Prasad, An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network, Fuzzy Sets Syst. 150 (2005) 211–243.
[18] G. Leng, G. Prasad, T.M. McGinnity, An on-line algorithm for creating self-organizing fuzzy neural networks, Neural Networks 17 (2004) 1477–1493.
[19] G. Leng, T.M. McGinnity, G. Prasad, Design for self-organizing fuzzy neural networks based on genetic algorithm, IEEE Trans. Fuzzy Syst. 14 (2006) 755–766.
[20] N. Wang, M.J. Er, X. Meng, A fast and accurate online self-organizing scheme for parsimonious fuzzy neural networks, Neurocomputing 72 (2009) 3818–3829.
[21] H.J. Rong, N. Sundararajan, G.B. Huang, P. Saratchandran, Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and time series prediction, Fuzzy Sets Syst. 157 (2006) 1260–1275.
[22] Y. Zhou, M.J. Er, A novel approach for generation of fuzzy neural networks, Int. J. Fuzzy Syst. 7 (2007) 8–13.
[23] M.J. Er, Y. Zhou, Automatic generation of fuzzy inference systems via unsupervised learning, Neural Networks 21 (2008) 1556–1566.
[24] M.J. Er, S. Wu, Y. Gao, Dynamic Fuzzy Neural Networks: Architectures, Algorithms and Applications, McGraw-Hill, NY, USA, 2003.
[25] L. Wang, Fuzzy systems are universal approximators, in: Proc. International Conference on Fuzzy Systems, 1992, pp. 1163–1169.
[26] T. Kohonen, Self-organized formation of topologically correct feature maps, Biol. Cybern. 43 (1982) 59–69.
[27] M.J. Er, et al., Adaptive noise cancellation using enhanced dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst. 13 (2005) 331–342.
[28] W.L. Tung, C. Quek, eFSM—a novel online neural-fuzzy semantic memory model, IEEE Trans. Neural Networks 21 (2010) 136–157.
[29] S. Huang, X. Li, O.P. Gan, Tool wear estimation using SVM in ball nose end milling, in: IEEE Annual Conference of the Prognostic and Health Society, 2010.
[30] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets Syst. 28 (1988) 15–33.
[31] C.S. Leung, K.W. Wong, P.F. Sum, L.W. Chan, A pruning method for the recursive least squared algorithm, Neural Networks 14 (2001) 147–174.
[32] J.H. Zhou, C.K. Pang, F.L. Lewis, Z.W. Zhong, Intelligent diagnosis and prognosis of tool wear using dominant feature identification, IEEE Trans. Ind. Inf. 5 (2009) 454–464.
[33] A.G. Rehorn, J. Jiang, P.E. Orban, State-of-the-art methods and results in tool condition monitoring: a review, Int. J. Adv. Manuf. Technol. 26 (2005) 693–710.
[34] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller, IEEE Trans. Syst. Man Cybern. 20 (1990) 404–436.
[35] M.C. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287–289.
[36] P. Angelov, D. Filev, An approach to online identification of Takagi–Sugeno fuzzy models, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (2004) 484–498.
[37] P. Angelov, D. Filev, Simpl_eTS: a simplified method for learning evolving Takagi–Sugeno fuzzy models, in: IEEE International Conference on Fuzzy Systems (FUZZ), 2005, pp. 1068–1073.
[38] E. Lughofer, FLEXFIS: a robust incremental learning approach for evolving Takagi–Sugeno fuzzy models, IEEE Trans. Fuzzy Syst. 16 (2008) 1393–1410.
[39] W. Ning, M.J. Er, M. Xian-Yao, X. Li, An online self-organizing scheme for parsimonious and accurate fuzzy neural networks, Int. J. Neural Syst. 20 (2010) 389–403.
[40] M. Stone, Cross-validatory choice and assessment of statistical predictions, J. R. Stat. Soc. 36 (1974) 111–147.
[41] P. Angelov, Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems, Studies in Fuzziness and Soft Computing, vol. 92, Springer Physica-Verlag, Heidelberg, Germany, 2002.
[42] J.S.R. Jang, C.T. Sun, Functional equivalence between radial basis function networks and fuzzy inference systems, IEEE Trans. Neural Networks 4 (1993) 156–159.
[43] W. Pedrycz, An identification algorithm in fuzzy relational systems, Fuzzy Sets Syst. 13 (1984) 153–167.
[44] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. 15 (1985) 116–132.
[45] P. Angelov, R. Buswell, Identification of evolving fuzzy rule-based models, IEEE Trans. Fuzzy Syst. 10 (2002) 667–676.
[46] X. Li, M.J. Er, B.S. Lim, J.H. Zhou, O.P. Gan, L. Rutkowski, Fuzzy regression modeling for tool performance prediction and degradation detection, Int. J. Neural Syst. 20 (2010) 405–419.
[47] B.Y. Lee, H.S. Liu, Y.S. Tarng, Modeling and optimization of drilling process, J. Mater. Process. Technol. 74 (1998) 149–157.
[48] K. Perusich, Using fuzzy cognitive maps to identify multiple causes in troubleshooting systems, Integr. Comput. Aided Eng. 15 (2008) 197–206.
[49] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, USA, 1992.
[50] E. Haddadi, M.R. Shabghard, M.M. Ettefagh, Effect of different tool edge conditions on wear detection by vibration spectrum analysis in turning operation, J. Appl. Sci. 8 (2008) 3879–3886.
[51] L. Wang, M.G. Mehrabi, E.K. Jr., Tool wear monitoring in reconfigurable machining systems through wavelet analysis, Trans. NAMRI 3 (2001) 399–406.
[52] E. Lughofer, Evolving Fuzzy Systems—Methodologies, Advanced Concepts and Applications, Springer, Berlin Heidelberg, 2011.
[53] E.P. Klement, R. Mesiar, E. Pap, Triangular Norms, Kluwer Academic Publishers, Dordrecht, 2000.
[54] J.A. Dickerson, B. Kosko, Fuzzy function approximation with ellipsoidal rules, IEEE Trans. Syst. Man Cybern. Part B Cybern. 26 (1996) 542–560.
[55] S. Abe, Fuzzy function approximators with ellipsoidal regions, IEEE Trans. Syst. Man Cybern. Part B Cybern. 29 (1999) 654–661.
[56] A. Lemos, W. Caminhas, F. Gomide, Multivariable Gaussian evolving fuzzy modeling system, IEEE Trans. Fuzzy Syst. 19 (2011) 91–104.
[57] C.S. Leung, K.W. Wong, P.F. Sum, L.W. Chan, A pruning method for the recursive least squared algorithm, Neural Networks 14 (2001) 147–174.
[58] J.D.J. Rubio, SOFMLS: online self-organizing fuzzy modified least squares network, IEEE Trans. Fuzzy Syst. 17 (2009) 1296–1309.
[59] P.J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, Wiley, Hoboken, NJ, 1994.
[60] J. Abonyi, R. Babuska, F. Szeifert, Modified Gath–Geva fuzzy clustering for identification of Takagi–Sugeno fuzzy models, IEEE Trans. Syst. Man Cybern. Part B Cybern. 32 (2002) 612–621.
[61] E. Lughofer, Extensions of vector quantization for incremental clustering, Pattern Recognit. 41 (2008) 995–1011.
[62] Xi-Zhao Wang, Chun-Ru Dong, Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy, IEEE Trans. Fuzzy Syst. 17 (2009) 556–567.
[63] Xi-Zhao Wang, Jun-Hai Zhai, Shu-Xia Lu, Induction of multiple fuzzy decision trees based on rough set technique, Inf. Sci. 178 (2008) 3188–3202.
[64] Xi-Zhao Wang, Chun-Ru Dong, Tie-Gang Fan, Training T–S norm neural networks to refine weights for fuzzy if-then rules, Neurocomputing 70 (2007) 2581–2587.
[65] A. Lemos, W. Caminhas, F. Gomide, Multivariable Gaussian evolving fuzzy modelling system, IEEE Trans. Fuzzy Syst. 19 (2011) 91–104.
[66] R.J. Oentaryo, M.J. Er, L. San, L.-Y. Zhai, X. Li, Bayesian ART-based fuzzy inference system: a new approach to prognosis of machining process, in: IEEE Annual Conference of the Prognostic and Health Society, 2011.
[67] E. Lughofer, Flexible evolving fuzzy inference systems from data streams (FLEXFIS++), in: M. Sayed-Mouchaweh, E. Lughofer (Eds.), Learning in Non-Stationary Environments: Methods and Applications, Springer, New York, 2012, pp. 205–246.
[68] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall Inc., Upper Saddle River, New Jersey, 1999.

Mahardhika Pratama was born in Surabaya, Indonesia. He received the B.E. degree (First Class Honor) in Electrical Engineering from the Sepuluh Nopember Institute of Technology, Indonesia, in 2010. At the same time, he was awarded the best and most favorite final project by the same institution. Mr. Pratama holds a Master of Science (M.Sc.) degree in Computer Control and Automation (CCA) from Nanyang Technological University, Singapore, in 2011. He currently pursues a Ph.D. program at the University of New South Wales, Australia. Mr. Pratama is a member of the IEEE, the IEEE Computational Intelligence Society (CIS), the IEEE Systems, Man and Cybernetics Society (SMCS), and the Indonesian Soft Computing Society (ISC-INA). His research interests involve machine learning, computational intelligence, evolutionary computation, fuzzy logic, neural networks and evolving adaptive systems.



Meng Joo Er is currently a Professor with the Division of Control and Instrumentation, School of Electrical and Electronic Engineering (EEE), NTU. His research interests include control theory and applications, fuzzy logic and neural networks, computational intelligence, cognitive systems, robotics and automation, sensor networks and biomedical engineering. He has authored 5 books, 16 book chapters and more than 400 refereed journal and conference papers in his research areas of interest. He served as the Editor of the IES Journal on Electronics and Computer Engineering from 1995 to 2004. Currently, he serves as the Editor-in-Chief of the International Journal of Electrical and Electronic Engineering and Telecommunications, an Area Editor of the International Journal of Intelligent Systems Science and an Associate Editor of 11 refereed international journals, namely the International Journal of Fuzzy Systems, Neurocomputing, International Journal of Humanoid Robots, Journal of Robotics, International Journal of Mathematical Control Science and Applications, International Journal of Applied Computational Intelligence and Soft Computing, International Journal of Fuzzy and Uncertain Systems, International Journal of Automation and Smart Technology, International Journal of Modelling, Simulation and Scientific Computing, International Journal of Intelligent Information Processing and the Open Electrical and Electronic Engineering Journal. Furthermore, he served as an Associate Editor of the IEEE Transactions on Fuzzy Systems from 2006 to 2011 and a Guest Editor of the International Journal of Neural Systems from 2009 to 2010.

Xiang Li received her Ph.D. degree from Nanyang Technological University, Singapore, in 2000, as well as M.E. and B.E. degrees from Northeastern University, China, in 1987 and 1982, respectively. She has more than 15 years of experience in research and applications of data mining, artificial intelligence and statistical analysis, such as neural networks, fuzzy logic systems, data clustering and multiple regression modeling.

Richard J. Oentaryo is currently a Research Fellow at the Living Analytics Research Centre, Singapore Management University (SMU). Prior to joining SMU, he was a Research Fellow at the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), where he worked as part of the team that clinched the IES Prestigious Engineering Achievement Award 2011. He received his Ph.D. and B.E. (First Class Honor) from the School of Computer Engineering, NTU, in 2011 and 2004, respectively. Upon his B.E. graduation, he was awarded the Information Technology Management Association Gold Medal cum Book Prize for the best Final Year Project of the 2004 cohort. Dr. Oentaryo is a member of the Institute of Electrical and Electronics Engineers (IEEE), the IEEE Computational Intelligence Society (IEEE-CIS), and the Pattern Recognition and Machine Intelligence Association (PREMIA), Singapore. His research interests span neuro-fuzzy systems, social network mining, and brain-inspired architectures. He has published over 15 international journal and conference papers, and received several awards such as the IEEE-CIS Outstanding Student Paper Travel Grant in 2006 and 2009.

Edwin Lughofer received his Ph.D. degree from the Department of Knowledge-Based Mathematical Systems, University of Linz, where he is now employed as a post-doctoral fellow. During the past 10 years, he has participated in several international research projects, such as the EU projects DynaVis (www.dynavis.org), AMPA and Syntex (www.syntex.or.at). In this period, he has published around 70 journal and conference papers in the fields of evolving fuzzy systems, machine learning and vision, clustering, fault detection, image processing and human–machine interaction, including a monograph on 'Evolving Fuzzy Systems' (Springer, Heidelberg) and an edited book on 'Learning in Non-stationary Environments' (Springer, New York). He is associate editor of the international journals Evolving Systems (Springer) and Information Fusion (Elsevier), and has organized various special sessions and issues in the field of evolving systems, incremental machine learning and on-line modeling.

He has served as a programme committee member of several international conferences and is currently a member of the 'ETTC task force on Machine Learning' and of the 'EUSFLAT Working Group on Learning and Data Mining'. In 2010 he initiated the bilateral FWF/DFG project 'Interpretable and Reliable Evolving Fuzzy Systems' and is currently a key researcher in the national K-Project 'Process Analytical Chemistry (PAC)' (18 partners) as well as in the long-term strategic research projects 'Condition Monitoring with Data-Driven Models' and 'Performance Optimization of Electrical Drives' within the Austrian Competence Center of Mechatronics.

Imam Arifin graduated in electronic engineering from the Electronic Engineering Polytechnic Institute—ITS Surabaya in 1994. He received a bachelor degree in Control System Engineering—Electrical Engineering from the Sepuluh Nopember Institute of Technology, Surabaya, in 2000 and a master degree in Intelligent Systems and Control from the School of Electrical Engineering and Informatics, Bandung Institute of Technology, in 2008. From 1994 to 1996, he joined the Auto Insertion Division, PT Sony Electronics Indonesia, as a programmer for Numerical Control machines. He is presently a young lecturer in Control System Engineering at the Electrical Engineering Department, Sepuluh Nopember Institute of Technology.
