
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS

Online Modeling With Tunable RBF Network

Hao Chen, Yu Gong, and Xia Hong

Abstract—In this paper, we propose a novel online modeling algorithm for nonlinear and nonstationary systems using a radial basis function (RBF) neural network with a fixed number of hidden nodes. Each of the RBF basis functions has a tunable center vector and an adjustable diagonal covariance matrix. A multi-innovation recursive least square (MRLS) algorithm is applied to update the weights of the RBF online, while the modeling performance is monitored. When the modeling residual of the RBF network becomes large in spite of the weight adaptation, a node identified as insignificant is replaced with a new node, for which the tunable center vector and diagonal covariance matrix are optimized using the quantum particle swarm optimization (QPSO) algorithm. The major contribution is to combine the MRLS weight adaptation and QPSO node structure optimization in an innovative way so that the algorithm can track the local characteristics of a nonstationary system with a very sparse model. Simulation results show that the proposed algorithm has significantly better performance than existing approaches.

Index Terms—Multi-innovation recursive least square (MRLS), nonlinear, nonstationary, online modeling, quantum particle swarm optimization (QPSO), radial basis function (RBF).

I. INTRODUCTION

CONVENTIONAL dynamic modeling is based on assumptions of linearity and stationarity of the underlying systems [1], [2]. In practice, many systems exhibit nonlinear and nonstationary behaviors, for which an adaptive nonlinear model is often needed. Unlike offline modeling methods that use the whole batch of data, an online approach keeps adjusting the model with the incoming data so that the changing behavior of the nonstationary system is captured by the model. Online modeling for nonstationary and nonlinear systems is usually a difficult task. A common approach is to use adaptive algorithms to track the temporal variation of the system. Both linear and nonlinear adaptive approaches have been proposed, with typical examples including the time-varying autoregressive moving average with exogenous terms [3] and time-varying autoregressive with exogenous terms [4] for the linear approaches, and the time-varying neural network [5] for the nonlinear approaches.

Manuscript received December 15, 2011; revised June 5, 2012; accepted September 1, 2012. This work was supported by the UK Engineering and Physical Sciences Research Council and DSTL under Grant EP/H012516/1. This paper was recommended by Associate Editor S. X. Yang.

H. Chen and X. Hong are with the School of Systems Engineering, University of Reading, Reading, West Berkshire RG6 6UR, UK (e-mail: [email protected]; [email protected]).

Y. Gong is with the School of Electronic, Electrical and Systems Engineering, Loughborough University, Loughborough, Leicestershire LE11 3TU, UK (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2012.2218804

In some cases, the associated time-varying parameters of a nonstationary system can be expanded by a series of basis functions, and the nonstationary modeling is simplified to time-invariant parameter estimation. For instance, Legendre and Walsh basis functions are used for smooth and abruptly changing nonstationary signals, respectively [6]. However, such approaches are for specific model structures based on some a priori knowledge of the systems, which is clearly not suitable for all nonstationary systems in practice.

A large class of nonlinear systems can be modeled using the linear-in-the-parameters model. A popular choice of such models is the radial basis function (RBF) neural network, due to its simplicity and its ability to approximate any continuous function to an arbitrary degree of accuracy [7]. In online modeling, the weights of the RBF network are adapted by linear learning algorithms such as the least mean square (LMS) [8], Givens least square [9], and recursive least square (RLS) algorithms [10]. Recently, a variant of the RLS algorithm, namely the multi-innovation RLS (MRLS) [11]–[13], has been proposed. Unlike the classic RLS algorithm, which only considers the current residual error, the MRLS adaptation is based on a number of recent errors, making it particularly robust against noise.

Online RBF modeling approaches have been well researched. These include the resource allocating network (RAN) [14], where the network model starts from empty and grows with the input data based on the nearest neighbor method. The RAN extended Kalman filter (RAN-EKF) algorithm improves the RAN by replacing the LMS with the extended Kalman filter to adjust the network parameters [15]. Both the RAN and RAN-EKF algorithms only grow the network size without a pruning strategy to remove obsolete RBF nodes, so the model size can be too large in some applications. Hence, in [16], an improved version of the RAN algorithm was proposed by limiting the size of the RBF network (L-RAN). Further, in [17] and [18], a more compact model can be achieved by using the minimal RAN algorithm, which prunes inactive kernel nodes based on their relative contribution. All of these algorithms need many controlling parameters to be carefully predetermined in order to achieve satisfactory performance. The more computationally efficient growing-and-pruning RBF (GAP-RBF) algorithm [19] and the generalized GAP-RBF algorithm [20] were then proposed, in which only the nearest RBF node is considered for model growing and pruning. The growing and pruning are based on the "significance" of the nodes, which has a direct link to the learning accuracy, and require some a priori information such as the input data range and distribution. In order to guarantee model generalization, the kernel LMS algorithm was proposed in [21], in which the size of the model can grow based on a well-posedness analysis in reproducing kernel Hilbert spaces.


While all of these RAN-based approaches can identify a system model online with an adjustable number of nodes (the so-called model size), a common problem is that the structure of each individual node is not optimized. Rather, the node structure is simply set based on the incoming data (or the data themselves). This often makes the model size increase with the number of sample data, ending up with a very large model having poor generalization and high computational expense, particularly for nonstationary systems [22]. RBF node selection and structure optimization are thus of particular importance.

In fact, a popular approach for structuring the RBF network model is to consider the training input data points as candidate RBF centers and to employ a common variance for every RBF node. In this approach, the node selection can be performed using the orthogonal least squares (OLS) algorithm and its variants, including the regularized OLS (ROLS), locally regularized OLS (LROLS), and LROLS-leave-one-out [23]–[27]. Recently, a tunable RBF model identification algorithm has been proposed [28], [29], where each RBF node has a tunable center vector and an adjustable diagonal covariance matrix, and at each forward regression stage, the associated center vector and diagonal covariance matrix are optimized using the particle swarm optimization (PSO) algorithm. This provides an exceptionally flexible RBF model in which the model size can be significantly reduced. The PSO is a population-based stochastic optimization technique inspired by the social behavior of bird flocking and fish schooling [30]. The PSO method is popular due to its simplicity of implementation, its ability to quickly converge to a reasonably good solution, and its ability to escape from local minima. It has been successfully applied to a wide range of optimization problems [31], [32]. While the aforementioned approaches provide a powerful way to automatically determine the model structure of the RBF network, they are offline (batch) learning methods and are therefore inadequate for online applications. This motivates us to propose a novel online RBF network in which the node structure can be adaptively optimized.

An interesting alternative to the aforementioned approaches is the recently proposed extreme learning machine (ELM) and its variants [33]–[39], where node structure optimization is avoided by using a very large number of nodes. Both offline and online ELM approaches have been proposed. In the offline ELM [33], [34], [39], a large number of nodes are randomly generated at the beginning and are fixed during the learning process. In the online approach [35]–[38], a relatively smaller number of nodes are randomly applied at the training stage, but the node number can be adjusted during the learning stage depending on the incoming data. The online version can achieve similar performance to the offline approach but has lower complexity, as it does not deal with the whole batch of data [38]. While the ELM can achieve high accuracy with fast learning speed in many applications, the model size may have to be very large, especially for nonstationary systems, so that model generalization is not guaranteed. In this paper, we focus on RBF networks whose structure can be adaptively optimized, as this is particularly suitable in nonstationary environments.

In this paper, we propose a novel online RBF network with tunable nodes. First, we recognize that, in a nonstationary system, because the input statistics keep varying, the "local characteristic" of the input data is more relevant than the "global characteristic" [40], [41]. This implies that the model size need not be large, since the modeling needs to focus on recent data rather than older data. Therefore, we propose to fix the model size at a small number, where each RBF node has a tunable center vector and an adjustable diagonal covariance matrix which can be optimized online, one node at a time. At each time step, the RBF weights are adapted using the MRLS [13], and the modeling performance is monitored. If the RBF network performs poorly despite the weight adaptation, an insignificant node with little contribution to the overall system is identified and replaced by a new node without changing the model size. The structural parameters of the new node, including the center vector and diagonal covariance matrix, are optimized by using the quantum PSO (QPSO) algorithm. Unlike the original PSO algorithm [30], the QPSO does not prespecify a searching boundary and can ensure convergence to the global minimum [42], making it particularly suitable for the node structure optimization. Because the RBF network has tunable nodes, the model size can be much smaller than that of a conventional RBF network due to its structural flexibility. The main contributions of this paper are summarized as follows.

1) We propose a novel online RBF network which is fundamentally different from existing approaches: it has a fixed model size but a tunable node structure. Simulation results show that the proposed algorithm, with a very sparse model, has significantly better performance than existing approaches, especially in nonstationary environments.

2) We propose to use the QPSO algorithm for online node structure optimization.

3) We seamlessly integrate the MRLS weight adaptation and the QPSO node optimization into one approach.

The rest of this paper is organized as follows. Section II briefly introduces the RBF neural network and describes the MRLS algorithm for weight adaptation. Section III proposes a novel approach to optimize the node structure online. Section IV summarizes the proposed algorithm. Section V compares the proposed approach with some typical existing online methods via numerical simulations. Finally, Section VI concludes this paper.

II. RBF NETWORK WITH MRLS ADAPTATION

The adaptive RBF network is shown in Fig. 1, where there are M hidden nodes, i.e., the model size is M. At time t, the input vector of the RBF network is given by

x_t = [x_t(1), x_t(2), \ldots, x_t(N_x)]^T    (1)

where N_x is the model input dimension (the number of input channels) and x_t(i) is the input data from the ith input channel at time t.

The RBF network output is given by

f(x_t) = \sum_{i=1}^{M} w_{t-1}(i)\, g_i(x_t) = \phi_t^T w_{t-1}    (2)


Fig. 1. RBF model structure.

where g_i(x_t) is the output of the ith node, w_{t-1}(i) is the weight coefficient for the ith node at time t − 1, w_{t-1} = [\omega_{t-1}(1), \ldots, \omega_{t-1}(M)]^T, and \phi_t = [g_1(x_t), \ldots, g_M(x_t)]^T. In this paper, the Gaussian RBF given by

g_i(x_t) = \exp\left( -\tfrac{1}{2} (x_t - c_i)^T H_i (x_t - c_i) \right)    (3)

is adopted, where c_i = [c_i(1), \ldots, c_i(N_x)]^T and H_i = \mathrm{diag}\{\sigma_i^2(1), \ldots, \sigma_i^2(N_x)\} are the center vector and diagonal covariance matrix of the ith node, respectively, with c_i(j) and \sigma_i(j) being the center and standard deviation coefficients for the jth input channel. The residual error of the RBF network at time t is given by

e_t = y_t - \phi_t^T w_{t-1}    (4)

where y_t is the observed system output at time t.

Originally described for ARX model adaptation [11], the MRLS adaptation is based on both the current and past residual errors. To be specific, putting p input vectors into an input matrix gives

X_t = [x_t, x_{t-1}, \ldots, x_{t-p+1}]^T \in \mathbb{R}^{p \times N_x}    (5)

where p is the innovation length, which determines the number of past errors used for the weight adaptation. Passing X_t through the RBF nodes gives the information matrix

\Phi_t = \begin{bmatrix} g_1(x_t) & g_2(x_t) & \cdots & g_M(x_t) \\ g_1(x_{t-1}) & g_2(x_{t-1}) & \cdots & g_M(x_{t-1}) \\ \vdots & \vdots & \ddots & \vdots \\ g_1(x_{t-p+1}) & g_2(x_{t-p+1}) & \cdots & g_M(x_{t-p+1}) \end{bmatrix} = [\phi_t, \phi_{t-1}, \ldots, \phi_{t-p+1}]^T = [g_1, g_2, \ldots, g_M] \in \mathbb{R}^{p \times M}    (6)

where g_i is the ith column of \Phi_t, i.e., the ith RBF regressor. Letting e_t = [e_t, e_{t-1}, \ldots, e_{t-p+1}]^T and y_t = [y_t, y_{t-1}, \ldots, y_{t-p+1}]^T, we have the vector/matrix expression of (4) as

e_t = y_t - \Phi_t w_{t-1}.    (7)

With \Phi_t and e_t, the MRLS adaptation rules are given by the following steps:

\Psi_t = P_{t-1}\Phi_t^T \left[ \lambda I_p + \Phi_t P_{t-1}\Phi_t^T \right]^{-1}    (8)

P_t = \left( P_{t-1} - \Psi_t \Phi_t P_{t-1} \right) \lambda^{-1}    (9)

w_t = w_{t-1} + \Psi_t e_t    (10)

where \Psi_t \in \mathbb{R}^{M \times p} is the Kalman gain matrix, P_t \in \mathbb{R}^{M \times M} is the covariance matrix, I_p is the p × p identity matrix, and \lambda is the forgetting factor, a positive number smaller than one. P_t is usually initialized as P_0 = \delta I_M, where \delta is a large constant.

In this paper, the aforementioned MRLS algorithm is used to update the node weights. Unfortunately, the MRLS by itself is not adequate for nonstationary systems.
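For concreteness, the following Python sketch implements the Gaussian node outputs of (3) and one MRLS step of (7)–(10). It is an illustrative sketch only: the function names, array shapes, toy data, and the convention of entering the per-channel widths as inverse variances are our assumptions, not the authors' implementation.

```python
import numpy as np

def node_outputs(X, centers, sigmas):
    """Gaussian node outputs g_i(x) for a batch of inputs X (p x Nx).

    centers : (M, Nx) center vectors c_i
    sigmas  : (M, Nx) per-channel standard deviations sigma_i(j)
    Returns the information matrix Phi of (6), shape (p, M).
    The widths enter as inverse variances here, the usual Gaussian-kernel
    convention for (3).
    """
    diff = X[:, None, :] - centers[None, :, :]               # (p, M, Nx)
    d2 = np.sum((diff / sigmas[None, :, :]) ** 2, axis=2)    # weighted squared distances
    return np.exp(-0.5 * d2)

def mrls_update(w, P, Phi, y, lam=0.98):
    """One multi-innovation RLS step following (7)-(10)."""
    p = Phi.shape[0]
    e = y - Phi @ w                                                   # (7)
    K = P @ Phi.T @ np.linalg.inv(lam * np.eye(p) + Phi @ P @ Phi.T)  # (8)
    P = (P - K @ Phi @ P) / lam                                       # (9)
    w = w + K @ e                                                     # (10)
    return w, P, e

# toy usage with assumed sizes: M = 5 nodes, Nx = 4 channels, p = 8 innovations
rng = np.random.default_rng(0)
M, Nx, p = 5, 4, 8
centers = rng.normal(size=(M, Nx))
sigmas = np.ones((M, Nx))
w, P = np.zeros(M), 1e4 * np.eye(M)      # P0 = delta * I with a large delta
X = rng.normal(size=(p, Nx))             # the p most recent input vectors, (5)
y = rng.normal(size=p)                   # the corresponding observations
w, P, e = mrls_update(w, P, node_outputs(X, centers, sigmas), y)
```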

III. ONLINE NODE STRUCTURE OPTIMIZATION

The modeling performance of an RBF network is determined by both the weight vector and the node structure. For the Gaussian RBF network, the node structure parameters include the center vector and covariance matrix of each node, i.e., c_i and H_i in (3), respectively. In many RBF network learning methods, the centers are either determined by the input data, such as with the k-means clustering approach [43], or simply set as the input data (e.g., [21]). A common standard deviation is often used for all nodes and set by trial and error or cross-validation [26]. Such a choice of the RBF structure often leads to an overly large model size and poor performance in tracking the system variation when modeling nonstationary systems.

In this paper, we propose to fix the model size at a small number. The advantage of fixing the model size is that it enables the MRLS to be seamlessly integrated with the node structure optimization. Because the changeable "local characteristic" is of primary importance in a nonstationary system, the node structure parameters need to adapt accordingly in case the MRLS becomes inadequate. While joint structure optimization over all nodes can be computationally prohibitive, we propose to replace only one "insignificant" node with a new node whose structural parameters are then optimized. To be specific, if the current RBF network performs poorly despite the weight adaptation with the MRLS, one "insignificant" node with little contribution to the overall performance is identified and replaced by a new node. The center vector and covariance matrix of the new node are optimized by the QPSO algorithm based on the recent data, while the other nodes remain unchanged. Consequently, the RBF structure can be self-tuned to keep tracking the local characteristic of the nonstationary system and, at the same time, to maintain the model complexity at a moderate level in order to minimize computational cost and to achieve fast tracking capability. In order to realize this strategy, it is essential to determine when the node replacement takes place and how the structure of the new node is optimized, which is discussed in detail in the following.


A. Node Replacement

When the RBF structure is not suitable for the current data, the network residual error becomes large, and one insignificant node with poor performance is replaced with a new node. In order to prevent the node replacement from occurring too frequently, the "average" residual error is used to measure the performance of the RBF network. Noting that the multi-innovation error vector e_t in (7) consists of the p most recent errors, the normalized "average" residual error is given by

e_t^2 = \frac{1}{p} \cdot \frac{\|e_t\|^2}{\|y_t\|^2}.    (11)

Then, we have the following criterion:

\begin{cases} \text{if } e_t^2 < \Delta_1, & \text{the RBF structure remains unchanged} \\ \text{if } e_t^2 \ge \Delta_1, & \text{an insignificant node is replaced with a new node} \end{cases}    (12)

where Δ1 is a constant threshold which is set according to the performance requirement. In general, the smaller Δ1 is, the smaller the residual error that can be achieved, but the more frequently the node replacement may occur.
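As a small illustration, the following Python fragment evaluates the normalized average residual of (11) and the replacement test of (12); the function and variable names are ours.

```python
import numpy as np

def needs_replacement(e_vec, y_vec, delta1):
    """Return (trigger, value): the normalized average residual of (11)
    and whether it reaches the replacement threshold Delta_1 of (12)."""
    avg = float(e_vec @ e_vec) / (len(e_vec) * float(y_vec @ y_vec))
    return avg >= delta1, avg
```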

If e_t^2 ≥ Δ1, a node with little contribution to the overall system performance is replaced with a new node. It is known that the increment of error variance (IEV) can be used to measure the individual contribution from each node [44], [45]. In order to calculate the IEV for the ith node, we rewrite (7) as

y_t = \Phi_t w_{t-1} + e_t = \Phi_{t,-i} w_{t-1,-i} + \omega_{t-1}(i)\, g_i + e_t    (13)

where \Phi_{t,-i} is the new information matrix obtained by removing the ith column g_i from \Phi_t, w_{t-1,-i} is w_{t-1} with the ith element removed, and \omega_{t-1}(i) is the weight coefficient for the ith node. Orthogonally projecting g_i onto the space spanned by the column vectors of \Phi_{t,-i} gives

q_i = \left\{ I - \Phi_{t,-i}\left[\Phi_{t,-i}^T \Phi_{t,-i}\right]^{-1}\Phi_{t,-i}^T \right\} g_i    (14)

where I is the identity matrix of appropriate dimension. Then, (13) can be rewritten as

y_t = \Phi_{t,-i} v_{t-1,-i} + w_{t-1}(i)\, q_i + e_t    (15)

where v_{t-1,-i} = \left[\Phi_{t,-i}^T\Phi_{t,-i}\right]^{-1}\Phi_{t,-i}^T g_i, the least squares estimate with respect to the new information matrix \Phi_{t,-i}. Then, we have w_{t-1}(i) = q_i^T y_t / (q_i^T q_i). Because q_i is orthogonal to the space spanned by the column vectors of \Phi_{t,-i}, the IEV for the ith node is given by

\mathrm{IEV}_i = \left\| w_{t-1}(i)\, q_i \right\|^2 = w_{t-1}^2(i)\, q_i^T q_i.    (16)

Because a node with a smaller IEV contributes less to the overall performance, we order the IEVs of all nodes as

\mathrm{IEV}_{1'} \le \mathrm{IEV}_{2'} \le \cdots \le \mathrm{IEV}_{M'}    (17)

where IEV_{i'} is for node i', the node with the ith smallest IEV. Therefore, node 1' has the least contribution to the overall performance and can be replaced with a new node.

Fig. 2. IEV and WNV.

While the IEV comparison in (17) gives the most "insignificant" node, i.e., the node with the least contribution to the overall performance, its calculation suffers from high complexity and numerical instability. This is because, as shown in (14), the IEV calculation requires the matrix inversion \left[\Phi_{t,-i}^T\Phi_{t,-i}\right]^{-1}. In a highly nonstationary system, some nodes may sometimes be so badly structured that their outputs become very small. This makes \Phi_{t,-i}^T\Phi_{t,-i} ill-conditioned, resulting in numerical instability.

Alternatively, we can use the weighted node-output variance (WNV) to determine which node should be replaced. The WNV for the ith node is defined as

\mathrm{WNV}_i = \left\| \omega_{t-1}(i)\, g_i \right\|^2 = \omega_{t-1}^2(i)\, g_i^T g_i    (18)

where \omega_{t-1}(i)\, g_i is the weighted output of the ith node. The WNVs of all nodes are ordered as

\mathrm{WNV}_{1'} \le \mathrm{WNV}_{2'} \le \cdots \le \mathrm{WNV}_{M'}    (19)

where WNV_{i'} is for node i', the node with the ith smallest WNV. Then, from (19), node 1' with the smallest WNV is replaced with a new node.

The relation between the IEV and WNV is shown in Fig. 2, where three nodes are used for illustration. In practice, since the nodes are often well "separated" to give good data coverage, the correlation between node outputs is limited. This implies that, if IEV_i ≪ IEV_j, we usually have WNV_i ≪ WNV_j. Therefore, if there exists a group of L (1 ≤ L ≤ M) nodes with very small WNV, they also have significantly smaller IEV than the other nodes, so they should all be replaced. In our proposed online scheme, where one node is replaced at a time, it is not important to determine which of these small-WNV nodes is replaced first, because the other nodes with small WNV will be replaced at a later time. In fact, the IEV criterion in (17) replaces the most "insignificant" node, and the WNV criterion in (19) chooses one of the "insignificant" nodes. While they lead to similar performance in online applications, the WNV criterion not only is more robust but also requires much less computation than its IEV counterpart. We have done extensive simulations which all show that node replacement based on (17) and (19) leads to similar results.
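The WNV criterion is straightforward to implement; a minimal Python sketch (our naming) is given below. Compared with the IEV ordering of (17), it needs no matrix inversion, which is exactly the robustness and complexity advantage discussed above.

```python
import numpy as np

def least_significant_node(Phi, w):
    """Index of the node to replace by the WNV criterion (18)-(19).

    Phi : (p, M) information matrix whose ith column is g_i
    w   : (M,) current weight vector
    """
    wnv = (w ** 2) * np.sum(Phi ** 2, axis=0)   # omega_i^2 * g_i^T g_i per node
    return int(np.argmin(wnv))
```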


B. Iterative Node Structure Optimization and Weight Adaptation

When an insignificant node is replaced with a new node, the structure (i.e., the center vector and covariance matrix of the Gaussian RBF node) of the new node needs to be optimized based on the recent data. Without loss of generality, we assume that the ith node is replaced. The structure optimization is to find the best structural parameters of the new node g_i(·) which minimize the cost function

J_t = \sum_{i=t-p+1}^{t} e_t^2(i) = e_t^T e_t = (y_t - \Phi_t w_{t-1})^T (y_t - \Phi_t w_{t-1})    (20)

with the structure of the other nodes g_{j \ne i}(·) unchanged, where the node structure determines how the information matrix \Phi_t is formed. Since J_t is the squared-error sum over the recent p inputs, the structure of the new node is optimized for the recent p data rather than only the current data, which gives the model better generalization.

Because J_t depends on both the node structure and the weight vector, when the structure of the new node is adjusted, the weight vector, which is based on the previous structure, should also be modified. The joint optimization of the new node's structure and the weight vector can be complicated. We note that, for a particular node structure, the weight vector can be adapted by the MRLS algorithm. On the other hand, when the weight vector is given, the structure of the new node can be optimized by the QPSO search algorithm [42] (discussed in detail later). This clearly suggests an iterative approach for the structure optimization and weight adaptation. To be specific, when an insignificant node is replaced by a new one, we have the following iterative steps.

1) Initialize the structure of the new node, initialize the weight coefficient of the new node as zero and the weights of the other nodes as those before the node replacement, and use the MRLS to update the weight vector by one step.

2) Fix the weight vector, and update the structure of the new node with the QPSO algorithm by one step.

3) Fix the structure of the new node, and adapt the weight vector with the MRLS by one step.

4) Repeat steps 2) and 3) until the normalized average error variance is small enough that

\frac{J_t}{\|y_t\|^2} < \Delta_2    (21)

or the maximum iteration number is reached, where Δ2 is a preset constant depending on the system requirement.

We highlight that, in the aforementioned iterative process, at every iteration, the MRLS weight adaptation uses the same batch of input data but with a different structure of the new node obtained from the QPSO optimization. In contrast, when no node is replaced, the RBF structure remains unchanged, and the weight vector is adapted by the MRLS algorithm with a new batch of input data, once at a time.

We particularly note that it is not appropriate to use the QPSO algorithm to search for the weight vector and the new node's structure altogether. Unlike the node structural parameters, as will be shown later, the weight coefficients usually have a large dynamic range with little direct link to the input data. It is thus hard to properly initialize the QPSO algorithm for fast convergence. As a result, a large number of "particles" and iterations may have to be used in the QPSO search, leading to very slow convergence and high complexity. This is especially serious in a nonstationary system where the weight coefficients keep varying with time.

C. Structure Optimization With the QPSO

In this section, we first describe the QPSO algorithm for the node structure optimization and then propose a data-driven method to initialize the QPSO to achieve fast convergence.

1) QPSO: The QPSO is an evolutionary search method employing K particles. When the QPSO is used for the structure optimization, every particle consists of N_x pairs of center and standard deviation parameters of the new node, where N_x is the number of input channels defined in (1). To be specific, at the lth iteration, the kth particle (k = 1, ..., K) is expressed as

\gamma_{t,l}^{(k)} = \left\{ c_{t,l}^{(k)}(1), \ldots, c_{t,l}^{(k)}(N_x), \sigma_{t,l}^{(k)}(1), \ldots, \sigma_{t,l}^{(k)}(N_x) \right\}    (22)

where c_{t,l}^{(k)}(j) and \sigma_{t,l}^{(k)}(j) are the center and standard deviation coefficients for the jth input channel (j = 1, ..., N_x), respectively, and the subscript t indicates that the node replacement occurs at time t. We particularly note that, unlike many existing RBF approaches, we do not assume \sigma_i(1) = \cdots = \sigma_i(N_x). As a result, different nodes may have different Gaussian "widths" for different input channels, which gives the nodes more flexibility in covering the data.

At every iteration, the particles move from one position to another, where every particle position corresponds to one possible structure of the new node. With the weight vector fixed from the MRLS adaptation at the previous iteration, the cost function value for the particle position \gamma_{t,l}^{(k)} can be obtained as J_t(\gamma_{t,l}^{(k)}), k = 1, ..., K, where J_t(·) is defined in (20).

At the lth iteration, the best position for the kth particle (k = 1, ..., K) is the one with the minimum J_t among all of its previous positions:

\mathrm{pbest}_{t,l}^{(k)} = \arg\min\left\{ J_t\left(\gamma_{t,i}^{(k)}\right) \mid i = 0, \ldots, l \right\}.    (23)

The global best structure of the new node at the lth iteration is the particle position with the minimum J_t:

\mathrm{gbest}_{t,l} = \arg\min\left\{ J_t\left(\mathrm{pbest}_{t,l}^{(k)}\right) \mid k = 1, \ldots, K \right\}.    (24)

The local attractor for the kth particle (k = 1, ..., K) for its next iteration is given by

\alpha_{t,l}^{(k)} = \varphi \cdot \mathrm{pbest}_{t,l}^{(k)} + (1 - \varphi) \cdot \mathrm{gbest}_{t,l}    (25)

where \varphi is a randomly generated number in [0, 1]. Then, the update rule for the kth particle (k = 1, ..., K) is given by

\gamma_{t,l+1}^{(k)} = \alpha_{t,l}^{(k)} + \varsigma \cdot \beta \cdot \left| \mathrm{mbest}_{t,l} - \gamma_{t,l}^{(k)} \right| \cdot \ln\frac{1}{\mu}    (26)


where \varsigma is randomly generated as +1 or −1 with equal probability, \mu is a randomly generated number in [0, 1], \beta is the only controlling parameter in the QPSO, chosen according to the specific application, and \mathrm{mbest}_{t,l} is the gravity center of all particles' best positions at the lth iteration, given by

\mathrm{mbest}_{t,l} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{pbest}_{t,l}^{(k)}.    (27)
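A minimal Python sketch of one QPSO iteration over the 2N_x structural parameters is given below, assuming a caller-supplied cost function J_t. The scalar draws of φ, μ, and ς per particle follow (23)–(27); the function names and the incremental book-keeping of the best positions are our assumptions.

```python
import numpy as np

def qpso_step(particles, pbest, pbest_cost, gbest, gbest_cost, cost_fn, beta, rng):
    """One QPSO iteration following (23)-(27).

    particles : (K, D) positions, D = 2*Nx (centers followed by std devs)
    pbest     : (K, D) personal best positions with costs pbest_cost (K,)
    gbest     : (D,) global best position with cost gbest_cost
    cost_fn   : callable mapping a position to the cost J_t of (20)
    beta      : the single QPSO controlling parameter of (26)
    """
    K, D = particles.shape
    for k in range(K):                        # refresh personal/global bests, (23)-(24)
        c = cost_fn(particles[k])
        if c < pbest_cost[k]:
            pbest[k], pbest_cost[k] = particles[k].copy(), c
        if c < gbest_cost:
            gbest, gbest_cost = particles[k].copy(), c
    mbest = pbest.mean(axis=0)                # gravity center of best positions, (27)
    for k in range(K):
        phi = rng.random()                    # uniform in [0, 1]
        mu = rng.random()
        sign = 1.0 if rng.random() < 0.5 else -1.0
        attractor = phi * pbest[k] + (1.0 - phi) * gbest                 # (25)
        particles[k] = attractor + sign * beta * np.abs(mbest - particles[k]) * np.log(1.0 / mu)  # (26)
    return particles, pbest, pbest_cost, gbest, gbest_cost
```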

2) QPSO Initialization: While the QPSO can achieve global convergence, the convergence time depends greatly on the particle initialization. If the particles are initialized far from the optimum solution, a large number of particles and iterations may have to be used, leading to slow convergence and high complexity. In the original QPSO, the particles are randomly initialized. This is not desirable for the structure optimization, especially in nonstationary systems where the structure parameters keep varying with time. In the following, we propose a data-driven method for the particle initialization which follows the "local" statistics of the input data.

In many applications, it is likely that the optimum centers are around the input data. Some approaches even choose the node centers as the input data (e.g., [14], [15], and [17]–[20]), although this is not optimum. Since the structure optimization is based on the recent p input data, the center of the first particle can be initialized as the average of the recent p inputs:

c_{t,0}^{(1)}(j) = \frac{1}{p} \sum_{i=t-p+1}^{t} x_i(j), \quad j = 1, \ldots, N_x    (28)

where we recall that x_i(j) is the jth channel input at time i. The centers of the remaining particles (k = 2, ..., K) are randomly initialized based on a Gaussian process as

c_{t,0}^{(k)}(j) = \mathcal{N}\left(m_t(j), s_t(j)\right), \quad j = 1, \ldots, N_x    (29)

where \mathcal{N}(m_t(j), s_t(j)) represents one realization of the Gaussian process with mean m_t(j) and standard deviation s_t(j). We let m_t(j) = c_{t,0}^{(1)}(j), the center of the first particle, and let s_t(j) be the standard deviation of the input samples:

s_t(j) = \sqrt{ \frac{1}{p} \sum_{i=t-p+1}^{t} \left| x_i(j) - m_t(j) \right|^2 }, \quad j = 1, \ldots, N_x.    (30)

We highlight that the center c_{t,0}^{(k)}(j) is different for every particle because each is one random realization of the Gaussian process \mathcal{N}(m_t(j), s_t(j)), although m_t(j) and s_t(j) are the same for all particles (except the first particle). By doing so, we not only obtain many initial particles with centers around the input data for fast convergence but also some particles far from the data to achieve good coverage.

Once the center parameters of all particles have been initialized as in (28) and (29), the initial standard deviation parameters of the particles are determined by how far the corresponding centers are from the nearest center of the other nodes. To be specific, if the center coefficient for the jth input (j = 1, ..., N_x) of the kth particle (k = 1, ..., K) of the new node is c_{t,0}^{(k)}(j), the corresponding standard deviation is initialized as

\sigma_{t,0}^{(k)}(j) = \rho \cdot \left| c_{t,0}^{(k)}(j) - c_{\mathrm{nearest}}^{(k)}(j) \right|    (31)

where \rho is a constant scaling factor and c_{\mathrm{nearest}}^{(k)}(j) is the center coefficient for the jth input of the nonreplaced node with the nearest distance to c_{t,0}^{(k)}(j).

At the beginning, the initial best structure of the new node is set as the first particle:

\mathrm{gbest}_{t,0} = \left[ c_{t,0}^{(1)}(1), \ldots, c_{t,0}^{(1)}(N_x), \sigma_{t,0}^{(1)}(1), \ldots, \sigma_{t,0}^{(1)}(N_x) \right].    (32)
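The data-driven initialization of (28)–(32) can be sketched in Python as follows; reading the nearest-center lookup of (31) per input channel is one plausible interpretation of the text, and all names are ours.

```python
import numpy as np

def init_particles(X_recent, other_centers, K, rho, rng):
    """Data-driven QPSO initialization, (28)-(32).

    X_recent      : (p, Nx) the p most recent input vectors
    other_centers : (M-1, Nx) centers of the nodes that are kept
    Returns (particles, gbest0): particles has shape (K, 2*Nx),
    each row being [centers | standard deviations].
    """
    p, Nx = X_recent.shape
    m = X_recent.mean(axis=0)                            # first-particle center, (28)
    s = np.sqrt(np.mean((X_recent - m) ** 2, axis=0))    # per-channel input std dev, (30)

    centers = np.empty((K, Nx))
    centers[0] = m
    for k in range(1, K):                                # Gaussian draws around m, (29)
        centers[k] = rng.normal(loc=m, scale=s)

    sigmas = np.empty((K, Nx))
    for k in range(K):                                   # width from the nearest kept center, (31)
        for j in range(Nx):
            idx = np.argmin(np.abs(other_centers[:, j] - centers[k, j]))
            sigmas[k, j] = rho * abs(centers[k, j] - other_centers[idx, j])

    particles = np.hstack([centers, sigmas])
    return particles, particles[0].copy()                # initial global best, (32)
```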

IV. MRLS–QPSO ALGORITHM

The proposed approach is named the MRLS–QPSO algorithm in this paper, as it integrates the MRLS weight adaptation and the QPSO node optimization into one process. The MRLS–QPSO algorithm is fundamentally different from existing online RBF approaches with a growing/pruning model size (e.g., [14]–[21]). In existing approaches, the centers of a newly added node are simply set as the current input data, and the standard deviations are determined from a priori information. As a result, the constructed network structure only fits the data rather than the underlying model, which often makes the model size grow with the data. In comparison, the proposed MRLS–QPSO algorithm adaptively optimizes both the weight coefficients and the node structure online so that it can track the system variation with a sparse model.

Fig. 3 shows the flowchart of the proposed MRLS–QPSO algorithm.

The MRLS–QPSO algorithm is summarized in the following.

Fig. 3. Flowchart of the proposed MRLS–QPSO algorithm.

Initialization
  Initialize the structure of the RBF nodes.
  Initialize the weight vector w_0 = [0, ..., 0]^T.
  Initialize P_0 = δI for the MRLS, where δ is a large number.
  Set the forgetting factor λ for the MRLS.
  Set the error threshold Δ1 for the node replacement in (12).
  Set the error threshold Δ2 for the iterative structure and weight optimization in (21).
  Set the QPSO controlling parameter β in (26).
For every observation pair {x_t, y_t}, t = 1, 2, 3, ...
  Form the input matrix X_t and the information matrix Φ_t as in (5) and (6), respectively.
  Obtain the error vector e_t and the average error power e_t^2 as in (7) and (11), respectively.
  If e_t^2 ≥ Δ1, do structure adaptation:
    Calculate the WNV for each node as in (18) and order them as WNV_{1'} ≤ ... ≤ WNV_{M'}.
    Replace node 1' with a new node.
    Initialization for the iterative weight and structure optimization:
      Initialize the particle centers and standard deviations as in (28), (29), and (31), respectively.
      Choose the initial best structure of the new node as the first particle:
        gbest_{t,0} = [c_{t,0}^{(1)}(1), ..., c_{t,0}^{(1)}(N_x), σ_{t,0}^{(1)}(1), ..., σ_{t,0}^{(1)}(N_x)].
      Initialize the weight of the new node to 0.
      Keep the weights of the other nodes unchanged.
      Initialize P_{t,l=0} = δI for the MRLS, where δ is a large number.
    For l = 1, ..., L, do iterative weight and structure optimization:
      Weight vector adaptation with the MRLS:
        With the structure of the new node gbest_{t,l-1} and the same input data X_t, do
          Obtain the new information matrix Φ_{t,l} as in (6).
          Obtain the new error vector e_{t,l} as in (7).
          Update the weights with the MRLS as
            Ψ_{t,l} = P_{t,l-1}Φ_{t,l}^T [λI_p + Φ_{t,l}P_{t,l-1}Φ_{t,l}^T]^{-1}
            P_{t,l} = (P_{t,l-1} − Ψ_{t,l}Φ_{t,l}P_{t,l-1})λ^{-1}
            w_{t,l} = w_{t,l-1} + Ψ_{t,l}e_{t,l}.
      Structure optimization for the new node with the QPSO:
        Fix the weight vector at w_{t,l}.
        For every particle, k = 1, ..., K, do
          Randomly generate φ and μ within [0, 1].
          Randomly generate ς as +1 or −1.
          Update pbest_{t,l}^{(k)} and gbest_{t,l} as in (23) and (24), respectively.
          Update the local attractor as
            α_{t,l}^{(k)} = φ · pbest_{t,l}^{(k)} + (1 − φ) · gbest_{t,l}.
          Update the gravity center of all particles as
            mbest_{t,l} = (1/K) Σ_{k=1}^{K} pbest_{t,l}^{(k)}.
          Update the particle as
            γ_{t,l+1}^{(k)} = α_{t,l}^{(k)} + ς · β · |mbest_{t,l} − γ_{t,l}^{(k)}| · ln(1/μ).
      If J_t/‖y_t‖^2 ≤ Δ2, the iterative optimization stops; else go to the next iteration.
  Otherwise, if e_t^2 < Δ1, do weight adaptation only:
    Adapt the weights with the MRLS as in (7) and (8)–(10).
End

Initially, the centers of all nodes can be randomly placed around the data or simply set to the data themselves, and the variances can be set to a common value. While such an initial structure setting is not optimum, as the online adaptation goes on, each of the nodes is replaced by a new one with a better structure, one at a time. It is expected that node replacement occurs frequently at the initial stage. At the beginning, the node structure and weight vector adaptation start simultaneously. This is verified by extensive simulations, including all of those in Section V.

Another issue is the model size selection, which depends on the specific application. While there are many model size selection algorithms (e.g., [46] and [47]), most are for offline approaches or stationary systems, so they are not applicable in online applications. The model size selection for the proposed algorithm is left as an interesting open topic for future research. In many cases, the model size of the proposed approach can be easily determined empirically, e.g., by trial and error, increasing the size until there is no significant gain in model performance. Extensive simulation results are given later in this paper to compare the performance of the proposed algorithm with different model sizes.

Finally, we point out that, although the proposed approach is described for the RBF network with Gaussian kernels, it can easily be adapted to many other associative networks with a linear-in-the-parameters structure, e.g., thin-plate-spline and B-spline networks.

V. SIMULATION AND DISCUSSIONS

In this section, computer simulations are given to compare the proposed MRLS–QPSO algorithm with some typical online modeling approaches, including the linear MRLS, RAN, GAP-RBF, and ELM algorithms. Except for the linear MRLS, all approaches use Gaussian nodes.


In the proposed MRLS–QPSO approach, the error threshold for the node replacement in (12) is set as Δ1 = 10^{-3}, ten swarm particles are used in the QPSO, and the iterative weight and structure adaptation process stops if the normalized error in (21) falls below Δ2 = 10^{-6} or the iteration number reaches 5. In all other approaches, the controlling parameters are carefully chosen based on trial and error to achieve the best performance. We note that, while both offline and online ELM approaches are available [33]–[39], the offline ELM with a fixed model size based on the whole batch of data is used in this section for comparison, because it is easier to set up the simulation for the best performance. This is reasonable because the offline and online ELM approaches have similar performance [38]. Thus, the comparison is logically general for all versions of the ELM.

These algorithms are compared in the application of online time series prediction. To be specific, in this section, the T-step-ahead prediction uses the past four samples

x_t = [y_t, y_{t-6}, y_{t-12}, y_{t-18}]^T    (33)

to estimate the future sample y_{t+T}.

The prediction performance is measured by the root mean square error (rmse) and the mean absolute error (MAE). At time t, the rmse and MAE are defined as

\mathrm{rmse}(t) = \sqrt{ \frac{1}{t} \sum_{i=1}^{t} \left( y_i - f(x_i) \right)^2 }    (34)

\mathrm{MAE}(t) = \frac{1}{t} \sum_{i=1}^{t} \left| y_i - f(x_i) \right|    (35)

respectively.

Two benchmark chaotic time series are considered in this section: the Mackey–Glass and Lorenz time series. In order to fully verify the proposed approach, both stationary and nonstationary cases are considered. In the stationary case, the parameters controlling the chaotic time series behavior are fixed. In the nonstationary case, these controlling parameters change either abruptly (piecewise function) or continuously.
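The lagged input vector of (33) and the running error measures of (34) and (35) translate directly into a few lines of Python; the helper names below are ours.

```python
import numpy as np

def embed(y, t):
    """Input vector of (33): the four lagged samples used for prediction."""
    return np.array([y[t], y[t - 6], y[t - 12], y[t - 18]])

def rmse_mae(y_true, y_pred):
    """Running rmse (34) and MAE (35) over the predictions made so far."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err))
```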

A. Mackey–Glass Time Series

The Mackey–Glass time series is generated from the differential delay equation

\frac{dx(t)}{dt} = \frac{a\,x(t-c)}{1 + x^{10}(t-c)} - b\,x(t)    (36)

where a, b, and c are controlling parameters; in particular, when c ≥ 17, the equation shows typical chaotic behavior. In this simulation, 5000 samples are generated by the fourth-order Runge–Kutta method with a step size of 1, and the last 3000 samples are used for the prediction. The forward prediction step is set as T = 60.
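As an illustration of how such data can be produced, the sketch below integrates (36) with a fixed-step fourth-order Runge–Kutta scheme. The initial condition x0 = 1.2 and the linear interpolation of the delayed term at the half-steps are our assumptions; the paper does not state them.

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, c=30.0, x0=1.2, h=1.0):
    """Generate n samples of the Mackey-Glass series (36) with RK4.

    The delayed term x(t - c) is read from the stored trajectory and
    linearly interpolated for the half-steps; values before t = 0 are
    held at the initial condition x0.
    """
    x = np.empty(n)
    x[0] = x0

    def x_delayed(idx):
        # trajectory value at a (possibly fractional, possibly negative) index
        if idx <= 0.0:
            return x0
        i0 = int(np.floor(idx))
        frac = idx - i0
        i1 = min(i0 + 1, n - 1)
        return (1.0 - frac) * x[i0] + frac * x[i1]

    def f(xt, xdel):
        return a * xdel / (1.0 + xdel ** 10) - b * xt    # right-hand side of (36)

    for i in range(n - 1):
        d0 = x_delayed(i - c / h)          # x(t - c)
        dh = x_delayed(i + 0.5 - c / h)    # x(t + h/2 - c)
        d1 = x_delayed(i + 1.0 - c / h)    # x(t + h - c)
        k1 = f(x[i], d0)
        k2 = f(x[i] + 0.5 * h * k1, dh)
        k3 = f(x[i] + 0.5 * h * k2, dh)
        k4 = f(x[i] + h * k3, d1)
        x[i + 1] = x[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

series = mackey_glass(5000)[-3000:]   # keep the last 3000 samples, as in the paper
```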

1) Mackey–Glass Time Series With Fixed Parameters: First, we let a = 0.2, b = 0.1, and c = 30. Table I and Fig. 4 compare the final prediction performance and the rmse learning curves of the different approaches, respectively. It is clear that the linear MRLS has the worst performance. The RAN and GAP-RBF have comparable performance, but the GAP-RBF uses far fewer nodes than the RAN. This is because the GAP-RBF can prune the model size while the RAN cannot. Both the ELM and the proposed approach have significantly better performance than the others, and the proposed approach with five nodes has a similar performance to the ELM approach with 500 nodes.

TABLE I. Mackey–Glass (fixed parameters): final prediction performance.

Fig. 4. Mackey–Glass (fixed parameters): rmse learning curves.

Fig. 5(a) shows the final prediction performance of the proposed algorithm with different numbers of RBF nodes. It is clear that the prediction performance improves with a larger number of nodes M, but the improvement becomes insignificant when M ≥ 15. We recall that, with M = 5, the proposed approach has a similar performance to the ELM approach with M = 500. Fig. 5(b) shows that the proposed algorithm with M = 15 can predict the time series online very well.

2) Mackey–Glass Time Series With Piecewise Function: In this simulation, the aforementioned Mackey–Glass time series is weighted by a piecewise function as

y_1(t) = y(t) \cdot \Psi(t)    (37)

where

\Psi(t) = \begin{cases} 0.0005\, t^{1/0.98}, & 0 \le t \le 999 \\ 0.0006\, t^{1/0.95}, & 1000 \le t \le 1999 \\ 0.0003\, t^{1/0.97}, & 2000 \le t \le 2999 \end{cases}    (38)

and y_1(t) is used for the time series prediction. As shown in Fig. 7(b), y_1(t) is clearly nonstationary, as it has different dynamic ranges in different time intervals.


Fig. 5. Mackey–Glass (fixed parameters): prediction performance of the proposed MRLS–QPSO algorithm. (a) Prediction performance versus number of nodes (M). (b) Online prediction with M = 15.

TABLE II. Mackey–Glass (piecewise function): final prediction performance.

Table II and Fig. 6 compare the final prediction performance and the rmse learning curves of the different approaches in this nonstationary case, respectively. Comparing these results with those of the previous, stationary simulation, we observe that, while all approaches have worse prediction performance than in the stationary case, the relative comparison between approaches is similar; in particular, the proposed MRLS–QPSO still has the best performance. Except for the proposed approach, all other RBF approaches use a significantly larger number of nodes.

Fig. 7(a) shows the final prediction performance of the proposed algorithm with different numbers of RBF nodes. Unlike in the previous stationary case, the prediction performance changes little with more nodes. In fact, the rmse may even become slightly worse when the model size becomes larger. This matches our earlier statement that, in the nonstationary scenario, the "local" characteristic is more important than the "global" one. With fewer nodes, the proposed algorithm "forgets" the previous data faster, so it focuses more on the recent data. Fig. 7(b) clearly shows that, with only five nodes, the proposed algorithm predicts this nonstationary time series online very well.

Fig. 6. Mackey–Glass (piecewise function): rmse learning curves.

Fig. 7. Mackey–Glass (piecewise function): prediction performance of the proposed MRLS–QPSO algorithm. (a) Prediction performance versus number of nodes (M). (b) Online prediction with M = 5.


Fig. 8. Lorenz attractor.

TABLE III. Lorenz time series (fixed parameters): final prediction performance (T = 20).

B. Lorenz Time Series

The Lorenz chaotic time series is also often used as a benchmark in many applications [48]. As a three-dimensional and highly nonlinear system, the Lorenz system is governed by the three differential equations

\frac{dx(t)}{dt} = a\,y(t) - a\,x(t)

\frac{dy(t)}{dt} = c\,x(t) - x(t)z(t) - y(t)

\frac{dz(t)}{dt} = x(t)y(t) - b\,z(t)

where a, b, and c are parameters that control the behavior of the Lorenz system. In the simulations, the fourth-order Runge–Kutta approach with a step size of 0.01 is used to generate the Lorenz samples, and only the Y-dimension samples y(t) are used for the time series prediction. Of the 5000 data samples generated for y(t), the last 3000 stable samples are used for the prediction, since the Lorenz system is very sensitive to the initial condition.
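A minimal generator for these samples is sketched below with a standard fixed-step RK4 integrator; the initial state (1, 1, 1) is an assumption for illustration, not taken from the paper.

```python
import numpy as np

def lorenz_y(n, a=10.0, b=8.0 / 3.0, c=28.0, h=0.01, state=(1.0, 1.0, 1.0)):
    """Generate n samples of the Lorenz Y-dimension with fixed-step RK4."""
    def f(s):
        x, y, z = s
        return np.array([a * (y - x), c * x - x * z - y, x * y - b * z])

    s = np.array(state, dtype=float)
    ys = np.empty(n)
    for i in range(n):
        ys[i] = s[1]
        k1 = f(s)
        k2 = f(s + 0.5 * h * k1)
        k3 = f(s + 0.5 * h * k2)
        k4 = f(s + h * k3)
        s = s + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return ys

y = lorenz_y(5000)[-3000:]   # the last 3000 (more stable) samples, as in the paper
```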

1) Lorenz Time Series With Fixed Parameters: We first consider the Lorenz time series with fixed parameters a = 10, b = 8/3, and c = 28. The trajectory of this Lorenz system is shown in Fig. 8.

Tables III–V compare the final prediction performance of the different approaches for prediction steps T = 20, 40, and 60, respectively. In all of the tables, the linear MRLS has the worst prediction performance because a linear approach cannot model the nonlinear time series. The RAN and GAP-RBF achieve comparable prediction performance, but the GAP-RBF has a more compact model than the RAN. The model size of the GAP-RBF is still very large compared to the proposed approach: several tens of nodes versus only several.

TABLE IV. Lorenz time series (fixed parameters): final prediction performance (T = 40).

TABLE V. Lorenz time series (fixed parameters): final prediction performance (T = 60).

Fig. 9. Lorenz time series (fixed parameters): the MRLS–QPSO with different innovation lengths p.

The performance of the ELM and the proposed MRLS–QPSO algorithm with different model sizes is also shown in the tables. It is shown that, when the ELM and GAP-RBF have comparable model sizes, their performances are comparable as well. However, the ELM can achieve better performance than the GAP-RBF by using more nodes.

It is also clear that, while the performance of the ELM depends greatly on the model size, the proposed approach is much less sensitive to the model size. The proposed algorithm with five nodes has better performance than the ELM with 500 (or more) nodes. This clearly demonstrates the importance of the node structure optimization.

Fig. 9 compares the MAE learning curves of the proposed approach with different innovation lengths p applied in the MRLS weight adaptation, where Gaussian noise with zero mean and variance 0.02 is added to the Lorenz time series to highlight the noise rejection effect of p. It is clearly shown that the larger p is, the more robust the MRLS–QPSO is against the noise. However, a larger p leads to higher complexity.

Fig. 10. Lorenz time series (time-varying parameters): rmse and model size learning curves. (a) rmse learning curves. (b) Model size learning curves.

2) Lorenz Time Series With Time-Varying Parameters: In this simulation, we let the Lorenz controlling parameters vary with time to obtain a nonstationary system. Specifically, we set a = 10 and

b = \frac{4 + 3\left(1 + \sin(0.1t)\right)}{3}, \quad c = 25 + 3\left(1 + \cos(2^{0.001t})\right).    (39)

The prediction step is fixed at T = 40, and the numbers of nodes for the proposed approach and the ELM are fixed at 5 and 1000, respectively. We note that the performance comparison of the proposed and ELM algorithms with different model sizes is similar to that in the previous simulation, so those results are not shown.

Fig. 10(a) and (b) compares the rmse and model size learning curves of the different approaches, respectively. It is clearly shown that the linear MRLS algorithm has the worst performance. The RAN and GAP-RBF approaches have comparable prediction performance. While the GAP-RBF can produce a more compact model than the RAN, both approaches have model sizes that increase with the number of input data. The ELM approach with 1000 nodes has better performance than the RAN and GAP-RBF, and the proposed algorithm has the best prediction performance with the smallest model size among all approaches.

Fig. 11. Lorenz time series (time-varying parameters): online 40-step-ahead prediction by the proposed MRLS–QPSO.

Fig. 12. Lorenz time series (time-based drift): online 40-step-ahead prediction by the MRLS–QPSO.

Fig. 11 shows that the proposed MRLS–QPSO algorithm predicts this nonstationary time series almost perfectly.

3) Lorenz Time Series With Time-Based Drift: In this simulation, the parameters of the Lorenz system are fixed as a = 10, b = 8/3, and c = 28, but the samples of y(t) are weighted by an exponential time-based drift to obtain

y_1(t) = 1.1^{0.01t} \cdot y(t)    (40)

and y_1(t) is used for the time series prediction. The prediction step is fixed at T = 40, and the number of nodes for the proposed approach is fixed at five.

Fig. 12 shows that the proposed MRLS–QPSO algorithm tracks y_1(t) well online. It is clear that y_1(t) is even more nonstationary than the time series in the previous simulation with time-varying controlling parameters: the dynamic range of y_1(t) grows from around [−20, 20] initially to about [−2000, 2000] at the end.


TABLE VI. Lorenz time series (time-based drift): final prediction performance (T = 40).

Fig. 13. Lorenz time series (time-based drift): rmse learning curves.

Table VI and Fig. 13 compare the final prediction performance and the rmse learning curves of the different approaches, respectively. It is clearly shown that the proposed MRLS–QPSO has significantly better performance than the others. In fact, the MRLS–QPSO is effectively the only approach here that can track this highly nonstationary Lorenz time series well.

VI. CONCLUSION

In this paper, a novel online RBF modeling approach has been proposed for nonlinear and nonstationary dynamic systems. The major contribution is to combine the MRLS weight adaptation and the QPSO node optimization in an innovative way. Based on an RBF model with a fixed small number of nodes, the MRLS is used to adapt the node weights at every time step. The node optimization, which replaces the worst existing node with a new one, is activated only when the modeling performance becomes inadequate under the MRLS alone. This is achieved by optimizing the center vector and covariance matrix of the new node using the QPSO online. A data-driven initialization method has been proposed in order to achieve fast convergence in the QPSO. Numerical simulations have demonstrated that the proposed MRLS–QPSO algorithm achieves significantly better performance than existing approaches with a very sparse model.

The algorithm proposed here is specifically aimed at modeling general nonstationary nonlinear dynamic systems using real-time observational data. A number of simulated time series with various nonstationarities have been used as demonstrators. Potential applications of the proposed algorithm

range from military applications (e.g., target tracking) to fundamental communication applications (e.g., channel equalization). We have shown how very competitive performance can be achieved by jointly employing a flexible model structure, recursive parameter estimation, and evolutionary computation techniques.


Hao Chen received the B.Eng. degree in automatic control from the National University of Defense Technology, Changsha, China, in 2006 and the M.Sc. degree in control systems (with distinction) from The University of Sheffield, Sheffield, U.K., in 2009. He is currently working toward the Ph.D. degree at the School of Systems Engineering, University of Reading, Reading, U.K.

His research interests are in modeling and identification of nonlinear and nonstationary systems, artificial neural networks, and machine learning.

Yu Gong received the B.Eng. and M.Eng. degrees in electronic engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1992 and 1995, respectively, and the Ph.D. degree in communications from the National University of Singapore, Singapore, in 2002.

After obtaining the Ph.D. degree, he took several research positions with the Institute for Infocomm Research, Singapore, and Queen's University Belfast, Belfast, U.K. From 2006 to 2012, he was an academic member with the School of Systems Engineering, University of Reading, Reading, U.K.

He joined the School of Electronic, Electrical and Systems Engineering, Loughborough University, Loughborough, U.K., in 2012. His research interests are in the area of signal processing and communications, including wireless communications, cooperative networks, nonlinear and nonstationary system identification, and adaptive filters.

Xia Hong received the B.Sc. and M.Sc. degrees in automatic control from the National University of Defense Technology, Changsha, China, in 1984 and 1987, respectively, and the Ph.D. degree in automatic control from The University of Sheffield, Sheffield, U.K., in 1998.

She was a Research Assistant with the Beijing Institute of Systems Engineering, Beijing, China, from 1987 to 1993. She was a Research Fellow with the Department of Electronics and Computer Science, University of Southampton, Southampton, U.K., from 1997 to 2001.

She is currently a Reader with the School of Systems Engineering, University of Reading, Reading, U.K. She is actively engaged in research on nonlinear system identification, data modeling, estimation and intelligent control, neural networks, pattern recognition, learning theory, and their applications. She has published over 100 research papers and has coauthored a research book.

Dr. Hong was a recipient of a Donald Julius Groen Prize awarded by the IMechE in 1999.

