
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1

A Neural Network-Based Joint Prognostic Model for Data Fusion and Remaining Useful Life Prediction

Yuanyuan Gao, Yuxin Wen, and Jianguo Wu, Member, IEEE

Abstract: With the rapid development of sensor and information technology, multisensor data relating to the system degradation process are now readily available for condition monitoring and remaining useful life (RUL) prediction. The traditional data fusion and RUL prediction methods are either not flexible enough to capture the highly nonlinear relationship between the health condition and the multisensor data or have not fully utilized the past observations to capture the degradation trajectory. In this article, we propose a joint prognostic model (JPM), where Bayesian linear models are developed for multisensor data, and an artificial neural network is proposed to model the nonlinear relationship between the residual life, the model parameters of each sensor data, and the observation epoch. A Bayesian updating scheme is developed to calculate the posterior distributions of the model parameters of each sensor data, which are further used to estimate the posterior predictive distributions of the residual life. The effectiveness and advantages of the proposed JPM are demonstrated using the commercial modular aero-propulsion system simulation data set.

Index Terms: Bayesian inference, degradation modeling, joint prognostic model (JPM), neural network (NN), remaining useful life (RUL) prediction.

I. INTRODUCTION

MAINTAINING high reliability in modern engineering systems, e.g., airlines and automobiles, is essential to achieve a desirable efficiency and productivity. As a result, there is a growing need for prognostics to eliminate unscheduled breakdowns and reduce maintenance costs. Prognostics refer to the estimation of the remaining useful life (RUL) of degrading systems and components based on the current health condition [1]. In general, there are two types of prognostic models: physical-based and data-driven models [2]. For physical models, it is necessary to fully understand the specific degradation mechanisms, which may not be feasible in practice due to limited knowledge or high system complexity [3]. In

Manuscript received December 22, 2018; revised August 4, 2019 and November 6, 2019; accepted February 23, 2020. This work was supported by the Natural Science Foundation of China under Grant 51875003 and Grant 71932006. (Corresponding author: Jianguo Wu.)

Yuanyuan Gao and Jianguo Wu are with the Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing 100871, China (e-mail: [email protected]; [email protected]).

Yuxin Wen is with the Department of Electrical and Computer Engineering, The University of Texas at El Paso, El Paso, TX 79968 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNNLS.2020.2977132

contrast, the data-driven models make use of the condition monitoring (CM) data that are closely related to degradation processes for RUL prediction. With the rapid development of information and sensing technology, data-driven models are now widely used in modern engineering systems for CM and predictive maintenance. Typical CM signals used in data-driven prognostics include temperature, vibration, and fuel consumption data collected during the system operation [1].

There are a large number of data-driven models in the existing literature. They can be grouped into two categories: artificial intelligence (AI) techniques and statistical approaches [2], [4]–[9]. AI techniques include the artificial neural networks (ANNs), support vector machines (SVMs), fuzzy logic systems, fuzzy-NNs, and evolutionary algorithms. Statistical approaches include various stochastic processes, state-space models, and regression-based models. Most of the existing models only focus on a single measure for RUL prediction. These approaches are effective when most of the underlying degradation characteristics can be well captured by a single sensor. In practice, however, a single signal may not be adequate to characterize the complex degradation process, especially in complicated systems. This may result in significant overestimation or underestimation of the remaining lifetime. In such cases, multiple sensors are needed to capture various aspects of the degradation process for prognostic improvement. Each sensor may contain only partial information about the same system. Some are highly related to the degradation mechanism, while others may not be. Therefore, effectively fusing these sensor data is highly desirable and very promising for providing more accurate and robust prognostic results.

Data fusion-based prognostics have been intensively studied recently. They can be generally classified into two categories: 1) construction of a health index (HI) for condition assessment and RUL prediction and 2) establishing a functional mapping between the multisensor data and RUL. The methods of the first category are often unsupervised or semi-supervised. Liu et al. [1] developed a single composite HI by linearly combining different sensors to better characterize the degradation process. The linear combination is performed in such a way that it satisfies two essential properties, i.e., maximizing the monotonic property of the HI and minimizing the variance of the failure threshold. Later, Liu and Huang [10] developed another HI via linear fusion of multiple degradation-based

2162-237X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.


Fig. 1. Problem of using (a) one or (b) two successive observations as input for ANN models.

signals and trained the model by minimizing both the model fitting errors and the variance of the failure threshold. However, these methods combine multisensor data linearly. In practice, due to system complexity, the sensor signals may have a highly nonlinear relationship with the degradation process, which will inevitably limit the effectiveness of the linear fusion methods. To overcome this issue, Song et al. [11] integrated the data fusion model with the kernel method to provide a nonlinear relationship between the HI and sensor signals. Nevertheless, the nonlinear relationship is limited greatly by the type of kernel functions. Therefore, more flexible nonlinear modeling is needed for prognostic improvement.

The prognostics of the second category are mainly supervised methods, where multisensor variables are used as or transformed into multidimensional features, and the corresponding RULs are used as labels. In this category, the ANN models are mostly used due to their adaptability, nonlinearity, and arbitrary function approximation ability [12]–[14]. Various ANN models with different network structures, input and output parameters, and activation functions have been developed for RUL prediction. Gebraeel et al. [15] proposed a feedforward NN-based approach for the RUL prediction of ball bearings. In their model, the vibration amplitudes of the first seven defective frequencies at the time of prediction were used as input and the residual life was used as the model output. However, due to unit-to-unit heterogeneity, using only the observation at the current time may not be effective for RUL prediction.

As shown in Fig. 1(a), unit 1 and unit 2 have the same degradation value at the time t1, yet their degradation paths differ significantly, indicating a totally different residual life. If the observation at t1 is used as input, the predicted residual life would be the same for both units, thus resulting in a large error. Later, several improved ANN models were developed, where the raw or smoothed observations at both the previous and the current observation epochs, together with the current epoch t, were used as model inputs, and the percentage of life elapsed, or alternatively the remaining life percentage, was used as model output [12], [16]–[18]. This strategy could better capture the health condition and degradation path. Nevertheless, the unit heterogeneity issue may still affect the prediction accuracy, which can be seen from Fig. 1(b), where two units with the same degradation values at t1 and t2 have significantly different degradation paths.

To more effectively capture the observed degradation path for RUL prediction, recurrent NN (RNN) and deep convolutional NN (CNN)-based methods have been developed recently. Heimes [19] utilized an advanced RNN architecture to capture the long-term dependencies for RUL prediction. Wu et al. [20] proposed to use vanilla long short-term memory (LSTM) networks to make the most of the LSTM ability for prognostic improvement. Babu et al. [21] developed a deep CNN-based approach for RUL prediction. In this method, the convolution and pooling filters are applied along the temporal dimension over the multisensor data. A similar method was also proposed in [22]. By utilizing a moving window of multiple successive observations as input, these methods can effectively mitigate the problem shown in Fig. 1. However, their performance is still not satisfactory in the early stage of the life cycle, where degradation is often not noticeable. The reason is that there is almost no change in the input features at the early stage, while the output RUL decreases linearly. In fact, all of the methods mentioned earlier used an artificially labeled RUL instead of the true RUL as the output. Within the early degradation stage, the RUL at a certain time epoch, e.g., the starting epoch of the rapid degradation stage, was assigned to all previous time epochs, which is equivalent to starting to predict the RUL only when obvious degradation appears. Therefore, this trick is actually not capable of predicting the RUL at the early stage.

To address the above-mentioned issues, we propose a joint prognostic model (JPM) in this article, where a Bayesian linear model is used to model the degradation signals, and an ANN is proposed to establish the relationship between the residual life and the model parameters of each sensor data. Specifically, a commonly used two-stage degradation modeling and CM scheme is adopted for each sensor signal [7], [23], [24]. At the offline stage, a Bayesian linear model is developed for each sensor based on historical data to capture both the population behavior and individual heterogeneity. At the online monitoring stage, a recursive Bayesian updating approach is applied to infer the model parameters of each sensor data sequentially for an in-service unit. To model the nonlinear relationship between the residual life and multisensor degradation data, a three-layer NN is developed, where the model parameters of multisensor data along with the time step t are used as the input layer, and the occurred life percentage is used as model output. Since the online updated model parameters of the in-service unit are obtained by utilizing all the historical data of other units and the available data of that unit, they can more effectively capture the degradation path or the long-term dependence.

The remainder of this article is organized as follows. Section II introduces the technical details of the JPM framework, including the offline degradation modeling and estimation, the NN model formulation and training, and online Bayesian updating. Section III demonstrates and evaluates the effectiveness of the proposed JPM through a case study using the National Aeronautics and Space Administration (NASA), Washington, DC, USA, commercial modular aero-propulsion system simulation (C-MAPSS) data set for commercial aircraft gas turbine engines [25]. The conclusion and discussion are provided in Section IV.

II. JOINT PROGNOSTIC MODEL FOR RUL PREDICTION

The overall framework of the JPM is shown in Fig. 2. There are two stages, the offline stage and the online stage. At the offline stage, both the mixed-effects model for degradation modeling and the NN model for data fusion and RUL prediction are established. At this stage, a commonly used empirical two-step approach [8] is applied to estimate the distribution parameters of the mixed-effects model. In this approach, the model parameters of each degradation signal are obtained through the maximum likelihood estimation (MLE) for each unit, and then all the fit parameters are used to estimate the distribution parameters through the MLE method. In the NN model formulation and training, the estimated parameters of each degradation signal, based on all the observations of the whole lifetime, and the observation epoch t are used as model input. At the online stage, a Bayesian model updating scheme is used to calculate the posterior distributions of degradation model parameters of an in-service unit, and then the posterior means are used as input of the NN model to estimate RUL. The technical details are provided in Sections II-A–II-C.

Fig. 2. Illustration of the joint prognostic framework.

A. Bayesian Linear Modeling and Parameter Estimation

Mixed-effects models are often used to capture both the population behavior and unit-to-unit variability in degradation modeling. Such a model can be generally formulated as [26]

$$ s_t = \eta(\boldsymbol{\alpha}, \boldsymbol{\phi}, t) + \varepsilon_t \tag{1} $$

where $s_t$ is the measured degradation signal at time step $t$, $\eta$ is a parametric form of the degradation model, $\boldsymbol{\alpha}$ is a vector of the fixed-effects parameters, $\boldsymbol{\phi}$ is a vector of the random-effects parameters, and $\varepsilon_t$ is a noise or error term following i.i.d. normal distributions. Typically, a linear model is assumed in the parametric form for its simplicity and flexibility. To account for the noise variance heterogeneity across different units, we propose to use a more general model, the Bayesian linear model, where the noise variance is also assumed random. Suppose there are $I$ historical units, and each unit has $J$ degradation signals. The following Bayesian linear model is proposed to characterize the multiple degradation signals:

$$ s_{i,j,k} = \mathbf{X}_{i,j,k}\,\boldsymbol{\beta}_{i,j} + \varepsilon_{i,j,k} \tag{2} $$

where $s_{i,j,k}$ is the sensor measurement for unit $i$ and sensor $j$ at inspection time $t_{i,k}$, $\mathbf{X}_{i,j,k}$ is a $(q_j + 1)$-dimensional vector of polynomial basis functions, i.e., $\mathbf{X}_{i,j,k} = [1, t_{i,k}, t_{i,k}^2, \ldots, t_{i,k}^{q_j}]$, $\boldsymbol{\beta}_{i,j}$ is a $(q_j + 1)$-dimensional vector of regression parameters following a multivariate normal distribution, $\varepsilon_{i,j,k}$ is the i.i.d. measurement noise with $\varepsilon_{i,j,k} \sim N(0, \sigma^2_{i,j})$, and $\sigma^2_{i,j}$ follows an inverse Gamma distribution. Note that here $\sigma^2_{i,j}$ is also assumed random to make the model more flexible in capturing the unit heterogeneity, which is different from the common way of using a deterministic noise level [1], [10]. To facilitate Bayesian updating at the online stage, a joint conjugate prior is assigned for $\boldsymbol{\beta}_{i,j}$ and $\sigma^2_{i,j}$, i.e., $\boldsymbol{\beta}_{i,j} \mid \sigma^2_{i,j} \sim N(\boldsymbol{\mu}_0^{(j)}, \sigma^2_{i,j}\boldsymbol{\Sigma}_0^{(j)})$ and $\sigma^2_{i,j} \sim IG(\alpha_1^{(j)}, \alpha_2^{(j)})$. Note that here we assume that all the multiple degradation signals of each unit are measured at the same time steps for notational convenience, i.e., $t_{i,k}$, $k = 1, \ldots, n_i$, where $n_i$ is the total number of inspection time steps for unit $i$.

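The model parameters of interest are $\boldsymbol{\psi}^{(j)} = \{\boldsymbol{\mu}_0^{(j)}, \boldsymbol{\Sigma}_0^{(j)}, \alpha_1^{(j)}, \alpha_2^{(j)}\}$, $j = 1, \ldots, J$, which are hyperparameters that will be used at the online stage for Bayesian updating. The MLE approach is a natural way to estimate $\boldsymbol{\psi}^{(j)}$. Specifically, $\boldsymbol{\psi}^{(j)}$ can be obtained by maximizing the following marginal likelihood:

$$ \hat{\boldsymbol{\psi}}^{(j)} = \arg\max_{\boldsymbol{\psi}^{(j)}} \prod_{i=1}^{I} \iint p\left(\mathbf{s}_{i,j} \mid \boldsymbol{\beta}_{i,j}, \sigma^2_{i,j}\right) \pi\left(\boldsymbol{\beta}_{i,j}, \sigma^2_{i,j} \mid \boldsymbol{\psi}^{(j)}\right) d\sigma^2_{i,j}\, d\boldsymbol{\beta}_{i,j} \tag{3} $$

where $\mathbf{s}_{i,j} = (s_{i,j,1}, s_{i,j,2}, \ldots, s_{i,j,n_i})^T$ and $\pi(\boldsymbol{\beta}_{i,j}, \sigma^2_{i,j})$ is the joint probability density function of $\boldsymbol{\beta}_{i,j}$ and $\sigma^2_{i,j}$. However, the above-mentioned marginal likelihood is very complex and not easy to optimize directly. A commonly used method to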




handle this issue is the expectation-maximization (EM) algorithm [27], where $\boldsymbol{\beta}_{i,j}$ and $\sigma^2_{i,j}$ can be treated as missing variables. However, the Q function is intractable in the expectation step, and thus, the Monte Carlo-based EM algorithm may be required, which makes the problem very complex. To address this issue, we propose to use a two-step estimation approach, a much easier and more efficient method where the degradation signals of each unit are fitted through the MLE method, and then the estimated parameters $\{\hat{\boldsymbol{\beta}}_{i,j}, \hat{\sigma}^2_{i,j}\}$ are used to estimate the hyperparameters $\boldsymbol{\psi}^{(j)}$. Although some bias may be introduced, this bias is often negligible in the Bayesian online updating process [8].

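Suppose $\mathbf{X}_{i,j}$ is the design matrix for unit $i$ and sensor $j$, which is given as

$$ \mathbf{X}_{i,j} = \begin{bmatrix} 1 & t_{i,1} & \cdots & t_{i,1}^{q_j} \\ 1 & t_{i,2} & \cdots & t_{i,2}^{q_j} \\ \cdots & \cdots & \cdots & \cdots \\ 1 & t_{i,n_i} & \cdots & t_{i,n_i}^{q_j} \end{bmatrix} \tag{4} $$

then the MLE of $\{\boldsymbol{\beta}_{i,j}, \sigma^2_{i,j}\}$ can be easily obtained as

$$ \hat{\boldsymbol{\beta}}_{i,j} = \left(\mathbf{X}_{i,j}^T \mathbf{X}_{i,j}\right)^{-1} \mathbf{X}_{i,j}^T \mathbf{s}_{i,j}, \qquad \hat{\sigma}^2_{i,j} = \frac{\left\| \mathbf{s}_{i,j} - \mathbf{X}_{i,j} \hat{\boldsymbol{\beta}}_{i,j} \right\|^2}{n_i}. \tag{5} $$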
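The per-unit fit in (4) and (5) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `design_matrix` and `fit_unit` are hypothetical, and `numpy.linalg.lstsq` is used in place of the explicit inverse in (5) for numerical stability, which gives the same estimate when the design matrix has full column rank.

```python
import numpy as np

def design_matrix(t, q):
    """Polynomial design matrix with columns 1, t, ..., t^q, as in (4)."""
    t = np.asarray(t, dtype=float)
    return np.vander(t, N=q + 1, increasing=True)

def fit_unit(t, s, q=2):
    """Per-unit MLE of (beta, sigma^2) following (5)."""
    X = design_matrix(t, q)
    s = np.asarray(s, dtype=float)
    # Least-squares solution of X beta = s (equals (X^T X)^{-1} X^T s).
    beta_hat, *_ = np.linalg.lstsq(X, s, rcond=None)
    resid = s - X @ beta_hat
    # The MLE of the noise variance divides by n_i, not n_i - (q + 1).
    sigma2_hat = float(resid @ resid) / len(s)
    return beta_hat, sigma2_hat
```

For example, `fit_unit(t, s, q=2)` would be called once per unit and per sensor, and the resulting estimates collected for the hyperparameter step.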

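The hyperparameters $\{\alpha_1^{(j)}, \alpha_2^{(j)}\}$ for the inverse-Gamma distributed noise variance can be estimated numerically by maximizing the likelihood function using the estimated values $\{\hat{\sigma}^2_{i,j}, i = 1, \ldots, I\}$. For $\{\boldsymbol{\mu}_0^{(j)}, \boldsymbol{\Sigma}_0^{(j)}\}$, the MLE can be analytically obtained as

$$ \hat{\boldsymbol{\mu}}_0^{(j)} = \frac{\sum_{i=1}^{I} \hat{\boldsymbol{\beta}}_{i,j} / \hat{\sigma}^2_{i,j}}{\sum_{i=1}^{I} 1 / \hat{\sigma}^2_{i,j}}, \qquad \hat{\boldsymbol{\Sigma}}_0^{(j)} = \frac{1}{I} \sum_{i=1}^{I} \frac{\left(\hat{\boldsymbol{\beta}}_{i,j} - \hat{\boldsymbol{\mu}}_0^{(j)}\right)\left(\hat{\boldsymbol{\beta}}_{i,j} - \hat{\boldsymbol{\mu}}_0^{(j)}\right)^T}{\hat{\sigma}^2_{i,j}}. \tag{6} $$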

The derivation of (6) can be found in [8].
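The second estimation step can be sketched as follows. The closed forms in `estimate_normal_hyper` follow (6) directly. The text only says the inverse-Gamma hyperparameters are obtained numerically, so the profile-likelihood grid search in `estimate_inv_gamma` is an assumption made for illustration, as are both function names and the grid range.

```python
import numpy as np
from math import lgamma

def estimate_normal_hyper(betas, sigma2s):
    """Closed-form estimates of mu_0 and Sigma_0 from (6)."""
    betas = np.asarray(betas, dtype=float)        # shape (I, q + 1)
    w = 1.0 / np.asarray(sigma2s, dtype=float)    # precision weights, shape (I,)
    mu0 = (w[:, None] * betas).sum(axis=0) / w.sum()
    d = betas - mu0
    # Average of the outer products, each scaled by 1 / sigma_hat^2.
    Sigma0 = (w[:, None, None] * d[:, :, None] * d[:, None, :]).mean(axis=0)
    return mu0, Sigma0

def estimate_inv_gamma(sigma2s, grid=np.logspace(-2, 3, 2000)):
    """Numerical MLE of the inverse-Gamma shape/scale (alpha_1, alpha_2).

    Assumed approach: for a fixed shape a, the likelihood-maximizing scale
    is b = a * n / sum(1 / x), so only the shape is searched on a grid.
    """
    x = np.asarray(sigma2s, dtype=float)
    n, sum_inv, sum_log = len(x), np.sum(1.0 / x), np.sum(np.log(x))
    best_ll, best_a, best_b = -np.inf, None, None
    for a in grid:
        b = a * n / sum_inv
        ll = n * (a * np.log(b) - lgamma(a)) - (a + 1.0) * sum_log - b * sum_inv
        if ll > best_ll:
            best_ll, best_a, best_b = ll, a, b
    return best_a, best_b
```

A general-purpose optimizer could replace the grid search; the grid is used here only to keep the sketch dependency-free.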

B. Neural Network for Data Fusion and RUL Prediction

In single-degradation-signal-based prognostics, the RUL is often defined as the first passage time at which the true degradation signal (measurement noise removed) hits a predefined failure threshold. When there are multiple degradation signals, however, it may not be practical to set a failure threshold for each degradation signal. Instead, an HI representing the true underlying degradation process can be constructed through data fusion of all degradation signals, and the RUL can be predicted by estimating the first passage time of the HI.

Suppose $\{\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J\}$ are the degradation parameters of the $J$ sensor signals of a unit. It is assumed that there exists a function $h$ such that $\mathrm{HI}(t) = h(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J, t)$. In practical applications, $h$ may be highly nonlinear. Given $\{\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J\}$, the HI is a deterministic function of time $t$. Therefore, the RUL is a deterministic function of $\{\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J\}$ and $t$, denoted as $\mathrm{RUL} = g(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J, t)$. Due to their excellent function approximation capability, ANNs are proposed to approximate $g$ for data fusion and RUL prediction.

Fig. 3. Structure of the proposed NN model.

The structure of the proposed NN is shown in Fig. 3. The input layer consists of $J$ sets of degradation parameters $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J$ and time $t$. For illustration purposes, only three sets of sensor parameters are shown in Fig. 3, and each sensor signal is modeled by a quadratic polynomial with $\boldsymbol{\beta}_j = (a_j, b_j, c_j)$. Let $p_t$ be the occurred life percentage, i.e., $p_t = t/(t + \mathrm{RUL}(t))$. We use $p_t$ as the output layer. The $\tanh(\cdot)$ function is selected as the activation function between the input layer and the hidden layers due to its advantages over the sigmoid function. Between the last hidden layer and the output layer, the sigmoid activation function is used. In the model training, the sum-of-squared-error is used as the loss function

$$ R(\boldsymbol{\theta}) = \sum_{i=1}^{I} \sum_{k=1}^{K_i} \left( p_{i,k} - g\left(\boldsymbol{\beta}_{i,1}, \ldots, \boldsymbol{\beta}_{i,J}, t_k\right) \right)^2 \tag{7} $$

where $\boldsymbol{\theta}$ is a vector of NN weights or model parameters, $K_i$ is the selected number of time instances of unit $i$ for training, and $p_{i,k}$ is the occurred life percentage at the time $t_k$ for unit $i$.

The NN model is fitted by using the well-known backpropagation algorithm, a generic gradient descent-based approach in NN training. In the training process, all the inputs are standardized to allow one to choose a meaningful range for the random starting weights. Since the loss function is nonconvex and has many local minima, the trained model is quite dependent on the choice of starting weights. Therefore, a large number of random starting weights are tried, and the one with the lowest loss is selected as the final solution.
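The training procedure described above (standardized inputs, tanh hidden units, sigmoid output, the sum-of-squared-error loss in (7), and multiple random restarts) can be sketched with plain NumPy. This is a minimal illustration, not the implementation used in the case study: all function names, the Gaussian initialization scale, the learning rate, and the number of epochs are assumptions.

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance scaling of each input column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def init_params(sizes, rng):
    """Random starting weights (small Gaussian) and zero biases per layer."""
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Activations of every layer: tanh hidden units, sigmoid output."""
    acts = [X]
    for layer, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        is_output = layer == len(params) - 1
        acts.append(1.0 / (1.0 + np.exp(-z)) if is_output else np.tanh(z))
    return acts

def sse(params, X, p):
    """Sum-of-squared-error loss R(theta) from (7)."""
    return float(np.sum((forward(params, X)[-1][:, 0] - p) ** 2))

def train(X, p, sizes, rng, lr=0.3, epochs=2000):
    """Batch gradient descent with backpropagation (step size lr / n)."""
    params = init_params(sizes, rng)
    n = len(X)
    for _ in range(epochs):
        acts = forward(params, X)
        out = acts[-1]
        # Gradient of the loss w.r.t. the output pre-activation.
        delta = 2.0 * (out - p[:, None]) * out * (1.0 - out)
        for layer in reversed(range(len(params))):
            W, b = params[layer]
            grad_W, grad_b = acts[layer].T @ delta, delta.sum(axis=0)
            if layer > 0:
                # Propagate through tanh, whose derivative is 1 - tanh(z)^2.
                delta = (delta @ W.T) * (1.0 - acts[layer] ** 2)
            params[layer] = (W - lr / n * grad_W, b - lr / n * grad_b)
    return params

def multi_start(X, p, sizes, n_starts=5, seed=0):
    """Train from several random starts and keep the lowest-loss model."""
    rng = np.random.default_rng(seed)
    fits = [train(X, p, sizes, rng) for _ in range(n_starts)]
    return min(fits, key=lambda prm: sse(prm, X, p))
```

Here `sizes` lists the layer widths from input to output, e.g. `[16, 6, 3, 1]` for 16 inputs, two hidden layers, and one output.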

C. Online Bayesian Updating and RUL Prediction

In the online monitoring and prediction, the degradation parameters of an in-service unit need to be estimated in real time. With the hyperparameters obtained in Section II-A, it is natural to use the online Bayesian updating scheme to calculate the posterior distribution of all the degradation parameters. Suppose the available sensor data of a unit up to the current time step $k$ is $\mathbf{s}_{j,1:k}$, $j = 1, \ldots, J$; then the posterior target distributions are $p(\boldsymbol{\beta}_j \mid \mathbf{s}_{j,1:k})$, $j = 1, \ldots, J$. Due to the usage of conjugate priors, all the posterior distributions can be calculated analytically as follows.




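Proposition 1: Suppose the priors $\sigma_j^2 \sim IG(\alpha_1^{(j)}, \alpha_2^{(j)})$ and $\boldsymbol{\beta}_j \mid \sigma_j^2 \sim N(\boldsymbol{\mu}_0^{(j)}, \sigma_j^2 \boldsymbol{\Sigma}_0^{(j)})$; then the posterior distributions can be derived as

$$ \left(\sigma_j^2 \mid \mathbf{s}_{j,1:k}\right) \sim IG\left(\alpha_1^{(j)} + \frac{k}{2},\; \alpha_2^{(j)} + \frac{H_{j,k}}{2}\right) \tag{8} $$

$$ \left(\boldsymbol{\beta}_j \mid \sigma_j^2, \mathbf{s}_{j,1:k}\right) \sim N\left(\boldsymbol{\mu}_k^{(j)},\; \sigma_j^2 \boldsymbol{\Sigma}_k^{(j)}\right) \tag{9} $$

$$ \left(\boldsymbol{\beta}_j \mid \mathbf{s}_{j,1:k}\right) \sim MT\left(\boldsymbol{\mu}_k^{(j)},\; \frac{2\alpha_2^{(j)} + H_{j,k}}{2\alpha_1^{(j)} + k} \boldsymbol{\Sigma}_k^{(j)},\; v_{j,k}\right) \tag{10} $$

where

$$ \begin{aligned} v_{j,k} &= 2\alpha_1^{(j)} + k \\ \boldsymbol{\Sigma}_k^{(j)} &= \left(\mathbf{X}_{j,k}^T \mathbf{X}_{j,k} + \left(\boldsymbol{\Sigma}_0^{(j)}\right)^{-1}\right)^{-1} \\ \mathbf{T}_{j,k} &= \left(\boldsymbol{\Sigma}_0^{(j)}\right)^{-1} \boldsymbol{\mu}_0^{(j)} + \mathbf{X}_{j,k}^T \mathbf{s}_{j,1:k} \\ \boldsymbol{\mu}_k^{(j)} &= \boldsymbol{\Sigma}_k^{(j)} \mathbf{T}_{j,k} \\ H_{j,k} &= \left\| \mathbf{s}_{j,1:k} \right\|^2 + \left(\boldsymbol{\mu}_0^{(j)}\right)^T \left(\boldsymbol{\Sigma}_0^{(j)}\right)^{-1} \boldsymbol{\mu}_0^{(j)} - \mathbf{T}_{j,k}^T \boldsymbol{\Sigma}_k^{(j)} \mathbf{T}_{j,k}. \end{aligned} $$

$\mathbf{X}_{j,k}$ is the design matrix for the $j$th degradation signal up to time step $k$, $MT$ denotes the multivariate $t$ distribution, and $v_{j,k}$ is the degree of freedom of the multivariate $t$ distribution. The derivation can be found in [8].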
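The closed-form update in Proposition 1 translates directly into code. The sketch below computes the quantities in (8)-(10) for one sensor; the function name `posterior_update` and the dictionary keys are hypothetical, and explicit matrix inverses are used only to mirror the formulas.

```python
import numpy as np

def posterior_update(X, s, mu0, Sigma0, a1, a2):
    """Posterior quantities of Proposition 1 for one degradation signal.

    X: (k, q + 1) design matrix up to step k; s: (k,) observations.
    mu0, Sigma0, a1, a2: prior hyperparameters for this sensor.
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    k = len(s)
    P0 = np.linalg.inv(Sigma0)                  # prior precision (Sigma_0)^{-1}
    Sigma_k = np.linalg.inv(X.T @ X + P0)
    T = P0 @ mu0 + X.T @ s
    mu_k = Sigma_k @ T
    H = float(s @ s + mu0 @ P0 @ mu0 - T @ Sigma_k @ T)
    dof = 2.0 * a1 + k
    return {
        "mu": mu_k,                              # posterior mean of beta, (9)-(10)
        "Sigma": Sigma_k,                        # scale matrix in (9)
        "ig_shape": a1 + k / 2.0,                # inverse-Gamma shape in (8)
        "ig_scale": a2 + H / 2.0,                # inverse-Gamma scale in (8)
        "dof": dof,                              # v_{j,k}
        "t_scale": (2.0 * a2 + H) / dof * Sigma_k,  # scale matrix in (10)
    }
```

Because the update only needs the design matrix and observations up to step k, it can be re-evaluated each time a new measurement arrives.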

From (9) and (10), we can see that given the same $\boldsymbol{\mu}_0^{(j)}$ and $\boldsymbol{\Sigma}_0^{(j)}$, whether $\sigma_j^2$ is random or not does not change the posterior mean of $\boldsymbol{\beta}_j$. If $\sigma_j^2$ is deterministic, i.e., all the units share the same noise variance, which is often assumed in existing work and estimated from historical data [10], [28], then (9) can be used to update the posterior distribution of $\boldsymbol{\beta}_j$. On the other hand, if $\sigma_j^2$ is random and unknown, (10) would be more accurate in capturing the uncertainty of $\boldsymbol{\beta}_j$, though its posterior mean is unchanged compared with (9). In practical applications, however, treating $\sigma_j^2$ as deterministic or random will influence the hyperparameter estimation, which will result in a different $\boldsymbol{\mu}_k^{(j)}$.

In the RUL prediction, the posterior mean or the maximum a posteriori probability (MAP) estimate of $\boldsymbol{\beta}_j$ can be plugged into the NN model to get the point estimate of $p_t$. Once $p_t$ is estimated, we can easily obtain the RUL as

$$ \mathrm{RUL}(t) = \frac{1 - p_t}{p_t}\, t. \tag{11} $$

Within the Bayesian framework, we can also conveniently obtain the posterior predictive distribution of $p_t$ or the RUL. Specifically, the Monte Carlo approach can be used to generate random samples from the posterior distribution of $\boldsymbol{\beta}_j$, and then these samples are substituted into the NN model to get the predictive samples of the RUL.
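A hedged sketch of this Monte Carlo step is given below. It draws from the multivariate t posterior in (10) using the standard normal/chi-square construction and converts the predicted life percentage to RUL via (11). The names `sample_mvt` and `predictive_rul` are hypothetical, `g` stands for any trained life-percentage predictor taking a feature matrix whose last column is time, and clipping of `p` away from zero is an added safeguard rather than part of the method.

```python
import numpy as np

def sample_mvt(mu, scale, dof, n, rng):
    """Draw n samples from a multivariate t with location mu and scale matrix."""
    mu = np.asarray(mu, dtype=float)
    z = rng.multivariate_normal(np.zeros(len(mu)), scale, size=n)
    u = rng.chisquare(dof, size=n) / dof
    return mu + z / np.sqrt(u)[:, None]

def predictive_rul(posteriors, g, t, n=2000, seed=0):
    """Monte Carlo samples of the life percentage p_t and of the RUL at time t.

    posteriors: list of (mu_k, t_scale, dof) tuples, one per sensor.
    g: callable mapping an (n, d) feature matrix to n life percentages.
    """
    rng = np.random.default_rng(seed)
    draws = [sample_mvt(m, S, v, n, rng) for (m, S, v) in posteriors]
    feats = np.hstack(draws + [np.full((n, 1), float(t))])
    p = np.clip(g(feats), 1e-6, 1.0)   # guard against division by zero
    rul = (1.0 - p) / p * t            # equation (11)
    return p, rul
```

Summary statistics of the returned samples (mean, quantiles) then give point and interval estimates of the RUL.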

III. APPLICATION ON C-MAPSS DATA SET

A. Overview of the System and Data Set

In this section, the proposed JPM is illustrated and evaluated through the benchmark C-MAPSS data set [25]. C-MAPSS is a tool developed by NASA for simulating a realistic large commercial turbofan engine that is monitored by multiple sensors. The C-MAPSS data set generated in [25] has been widely used as a benchmark system with multiple degradation signals in the prognostics and health management (PHM) field. Fig. 4 provides a schematic of a commercial aircraft gas turbine engine that was simulated using C-MAPSS.

Fig. 4. Simplified engine diagram simulated in C-MAPSS [25].

TABLE I

DESCRIPTION OF 21 C-MAPSS OUTPUTS

C-MAPSS simulates an engine model of the 90 000-lb thrust class with altitudes ranging from sea level to 40 000 ft, Mach numbers from 0 to 0.90, and sea-level temperatures from −60 °F to 103 °F. Users can adjust the conditions of aircraft altitude, Mach number, and throttle resolver angle to simulate different environmental conditions [25]. There are 14 inputs to simulate various degradation scenarios. The outputs include various sensor response surfaces and operability margins. A total of 21 variables out of 58 different outputs available from the model are used for analysis, as shown in Table I.

To consider unit-to-unit variability, an unknown variance for the initial wear level and a random noise were introduced. A failure threshold for a hidden HI that is not accessible to users is predefined, beyond which the unit is considered failed. A total of four data sets with the corresponding failure modes and operational conditions were generated. In this article, we only consider two of the four data sets, FD001 and FD003, which are commonly used in performance evaluation and comparison. The FD001 data set has a single failure mode (HPC degradation) and a single operating condition, while the FD003 data set has one operating condition but two fault modes (HPC and fan degradation). For FD002 and FD004, there are six operating conditions mixed together, all of which affect the sensor values and the degradation process. It is inappropriate to estimate RUL based only on the 21 sensor data. Therefore, these two data sets are not considered here.

For each data set considered in this article, there are 100 training units and 100 testing units. In the training data set, the fault grows in magnitude until system failure. In the test data set, the time series ends sometime prior to system failure. A file of the actual remaining lifetime of the 100 testing units is also included for each data set. Sensor readings from the 21 outputs are collected at each observation epoch for each unit. The prognostic model is developed based on the available degradation patterns of the 100 training units, and the testing data set is used for performance evaluation.

B. Variable Selection and Data Preprocessing

Among these 21 outputs, 14 outputs are highly related to the degradation process with an increasing or decreasing trend, while the other outputs are almost unchanged. Therefore, only these 14 degradation signals are included for further selection. The correlation analysis shows that there exist high correlations among these outputs (up to 0.96 for certain pairs). Signals with low correlation exhibit different signal patterns and involve different characteristics of the same unit. Therefore, the outputs are selected in such a way that the data show an obvious degradation trend and the pair-wise correlations of the selected outputs are as low as possible. To select the outputs based on the correlation, the hierarchical clustering algorithm is used, where (1 − correlation coefficient) is used as the distance or dissimilarity measure. The clustering dendrogram is shown in Fig. 5, where five clusters are obtained with a correlation threshold of 0.75. For each cluster, we randomly select an output, and the final outputs selected for prediction are Nc, T24, BPR, htBleed, and T30.
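The correlation-based grouping described above can be sketched with SciPy's hierarchical clustering routines. This is an illustrative sketch under stated assumptions: the linkage rule is not given in the text, so average linkage is assumed, and the function name `cluster_signals` is hypothetical. A correlation threshold of 0.75 corresponds to cutting the dendrogram at distance 1 − 0.75 = 0.25.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_signals(data, corr_threshold=0.75, method="average"):
    """Group signals (columns of data) by hierarchical clustering.

    Distance between two signals is 1 - correlation coefficient.
    Returns one integer cluster label per column.
    """
    corr = np.corrcoef(data, rowvar=False)
    dist = 1.0 - corr
    dist = (dist + dist.T) / 2.0       # remove tiny floating-point asymmetry
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangular) distance vector.
    Z = linkage(squareform(dist, checks=False), method=method)
    return fcluster(Z, t=1.0 - corr_threshold, criterion="distance")
```

One signal per label can then be picked, for instance at random as in the text, to obtain the final set of inputs.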

The typical degradation forms of the selected sensor signals are shown in Fig. 6. All of these signals show an exponential functional form, which has been widely used to model cumulative damage processes [24], [28], [29]. Therefore, we use the exponential function to describe the degradation process of the turbofan engine. Following Liu et al. [1], we first perform a log-transformation of the data and then apply linear models to the log-transformed data. Specifically, the quadratic polynomial function is assumed for the log-transformed data of each selected variable.

C. Results

Fig. 5. Hierarchical clustering of the 14 outputs based on correlation.

Fig. 6. Typical raw measurements and fit values using exponential function for the five selected signals.

As discussed in Section II-B, the proposed NN model uses fit parameters and time t as inputs and the occurred life percentage as output. There are five selected signals, each having three parameters. Therefore, there are a total of 16 input neurons in the NN model (15 degradation parameters plus the time t). To reduce the computational cost and to balance the sample size among different units, ten equally spaced time epochs across the whole life span are selected for each unit in the training process. Fivefold cross validation is used on the training data set with 100 units for model structure selection. The model with the minimum validation error is eventually selected. It has three layers with six neurons in the first hidden layer and three neurons in the second hidden layer. After the structure is determined, we use the entire training data set to train the model parameters. The MATLAB NN training package is used for training. In the training process, we divide the entire training data set into two parts, a training part (70%) and a validating part (30%). The training iteration terminates when the validating error starts to increase. As the NN training is inherently stochastic, 1 × 10^4 sets of initial weights and biases are randomly generated for the iteration to start with, and the trained model with the lowest validation error is used. Once the model is trained, it is applied to the 100 testing units for performance evaluation.
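To make the construction of the training samples concrete, the sketch below assembles the 16-column input matrix and the life-percentage targets from ten equally spaced epochs per unit. The function name `build_training_set` is hypothetical, and placing the first epoch at t = 1 and the last at the failure time is an assumption, since the text does not say whether the end points are included.

```python
import numpy as np

def build_training_set(unit_params, lifetimes, n_epochs=10):
    """Stack NN inputs [parameters, t] and targets p = t / T for all units.

    unit_params: (I, 15) fitted parameters (5 signals x 3 coefficients).
    lifetimes:   (I,) total life T_i of each training unit.
    """
    rows, targets = [], []
    for params, life in zip(unit_params, lifetimes):
        # Equally spaced epochs over the whole life span (end points assumed).
        for t in np.linspace(1.0, life, n_epochs):
            rows.append(np.append(params, t))
            targets.append(t / life)   # occurred life percentage
    return np.array(rows), np.array(targets)
```

With 100 training units this yields 1000 samples of dimension 16, which would then be standardized before training.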

The estimated hyperparameters based on (5) and (6) for the five selected signals are shown in Table II. As we can see from the inverse Gamma parameters $\alpha_1$ and $\alpha_2$, BPR has the largest and T24 the smallest mean and variance of the noise variance $\sigma^2$ among these five signals. Fig. 7 shows the Bayesian updating process for T30 and htBleed of a randomly selected engine at different time steps. Unsurprisingly, the posterior distributions become closer and closer to the true value in terms of the mean and variance when more observations are collected. Note that the posterior distributions for the noise variance $\sigma^2$ also have a similar trend. However, since $\sigma^2$ is not used as input in the NN model, we do not show it here.

TABLE II

ESTIMATED HYPERPARAMETERS FOR THE FIVE SELECTED SIGNALS

Fig. 7. Bayesian updating of the model parameters $\boldsymbol{\beta}$ of T30 (top row) and htBleed (bottom row) for a randomly selected unit at different time steps. $\beta_i$ is the $i$th component of $\boldsymbol{\beta}$.

In the RUL prediction, two approaches could be used. The first one is to use the posterior means as the point estimates (MAP) of $\boldsymbol{\beta}$ and then substitute them into the NN model to get the point estimate of the life percentage $p_t$. The advantage of this approach is its simplicity and low computational cost. An alternative approach is to calculate the posterior predictive distribution of $p_t$ with the following density function:

$$ f\left(p_{t_k} \mid \mathbf{s}_{1,1:k}, \mathbf{s}_{2,1:k}, \ldots, \mathbf{s}_{J,1:k}\right) = f\left(g(t_k, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J) \mid \mathbf{s}_{1,1:k}, \mathbf{s}_{2,1:k}, \ldots, \mathbf{s}_{J,1:k}\right) \tag{12} $$

where $g(\cdot)$ is the life percentage prediction function based on the proposed NN model. The advantage of this approach is that it makes full use of the uncertainty of $\boldsymbol{\beta}_j$, $j = 1, \ldots, J$, to produce a predictive distribution instead of a point estimate for the life percentage. Due to the high complexity of the function $g(\cdot)$, the posterior predictive density function is not tractable analytically. Naturally, the Monte Carlo approach could be used, where random samples are drawn from the posterior distributions of $\boldsymbol{\beta}_j$ first, and then these samples are plugged into the NN model to obtain life percentage samples that approximately follow the posterior predictive distribution. In this approach, we can also use the mean of the posterior predictive distribution as a point estimate for $p_t$. In our application study, there is no significant difference between the point estimates obtained with the two approaches.

Fig. 8. Posterior predictive distribution of occurred life percentage for six randomly selected units from the testing data set. The vertical dashed lines denote the true values.

Fig. 9. Comparison of the proposed method (JPM) with NN1 (structure: 6-4-2-1) and NN2 (structure: 12-8-4-1) using the α-λ performance metric on six randomly selected units.

Fig. 8 shows the posterior predictive distributions of life percentage for six randomly selected engines from the testing data set. The prediction is performed at the time of the last observation. Clearly, the prediction is very accurate, with the mean or MAP close to the true values, even when the prediction is performed at an early stage.

Fig. 9 shows the performance comparison of the JPM with two other commonly used methods, i.e., the NN model with time $t$ and the observation at $t$ as inputs (denoted by NN1), and the NN model with time $t$ and two consecutive observations as inputs (denoted by NN2), using the $\alpha$-$\lambda$ metric [30] on six randomly selected engines. The selected structures of NN1 and NN2 are 6-4-2-1 (i.e., six nodes in the input layer, four nodes in the second layer, and so on) and 12-8-4-1, respectively. In this metric, $\alpha$ specifies the error bound (here, $\alpha = 15\%$) on the estimated residual life percentage, i.e., $(1 - \alpha)(1 - p_t) \le (1 - \hat{p}_t) \le (1 + \alpha)(1 - p_t)$, and $\lambda$ specifies the relative distance in time between the prediction point and the actual failure time. It can be seen that almost all the residual life percentages estimated by the JPM lie within the ±15% error bound, and the prediction becomes more and more accurate as the engine approaches failure. In contrast, the NN1 and NN2 methods have much lower prediction accuracy in most cases, and their prediction accuracy is not stable across different time steps.

Fig. 10. Mean of the absolute percentage errors for the proposed JPM with all the five selected degradation signals and with each individually.
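The $\alpha$ bound check used in the $\alpha$-$\lambda$ metric can be written in a few lines. The sketch below only implements the inequality stated in the text; the function names and the example numbers are illustrative and do not come from the paper.

```python
def within_alpha_bound(p_true, p_hat, alpha=0.15):
    """Check (1 - alpha)(1 - p_t) <= (1 - p_hat_t) <= (1 + alpha)(1 - p_t).

    p_true and p_hat are the true and estimated life percentages in [0, 1],
    so (1 - p) is the residual life percentage.
    """
    residual_true = 1.0 - p_true
    residual_hat = 1.0 - p_hat
    lower = (1.0 - alpha) * residual_true
    upper = (1.0 + alpha) * residual_true
    return lower <= residual_hat <= upper


def alpha_accuracy(p_true_seq, p_hat_seq, alpha=0.15):
    """Fraction of prediction points whose estimate falls inside the alpha bound."""
    flags = [within_alpha_bound(p, q, alpha) for p, q in zip(p_true_seq, p_hat_seq)]
    return sum(flags) / len(flags)


# Illustrative values: the true residual life is 40%, so the bound is [34%, 46%].
inside = within_alpha_bound(0.60, 0.58)   # estimated residual 42%, inside the bound
outside = within_alpha_bound(0.60, 0.50)  # estimated residual 50%, outside the bound
```

Because the bound is proportional to the true residual life, it narrows as the unit approaches failure, so a fixed absolute error is penalized more heavily late in life.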

To further evaluate the performance of the JPM and compare it with other methods, we also use the mean value of the absolute percentage error used in [1] as the performance metric, which is defined as

$$\mathrm{err} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left|\hat{R}_i - R_i\right|}{T_i} \tag{13}$$

where $\hat{R}_i$ is the predicted residual life at the time of the last measurement for unit $i$ in the testing data set, $R_i$ is the true residual life, $T_i$ is the total life from the beginning to the failure time, and $N$ is the total number of testing units. Fig. 10 shows the mean absolute percentage error of prediction at different levels of actual remaining lifetime for the proposed JPM with all the five selected signals and with only one of them, respectively. The level label “all” refers to all the testing units, while “T-L” means the testing units with residual life less than or equal to L. As we can see, the proposed JPM that fuses all the five selected signals has a much lower prediction error than any of the variants that use only one degradation signal.
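The error metric in (13) and the “T-L” grouping can be computed as in the following sketch. The function names and the numbers in the example are illustrative only and are not results from the paper.

```python
def mean_abs_percentage_error(r_hat, r_true, total_life):
    """Eq. (13): average of |R_hat_i - R_i| / T_i over the N testing units."""
    n = len(r_true)
    if n == 0 or len(r_hat) != n or len(total_life) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(rh - r) / t for rh, r, t in zip(r_hat, r_true, total_life)) / n


def error_at_level(r_hat, r_true, total_life, level):
    """Eq. (13) restricted to units whose true residual life is <= level ("T-L")."""
    kept = [(rh, r, t) for rh, r, t in zip(r_hat, r_true, total_life) if r <= level]
    if not kept:
        raise ValueError("no testing unit has residual life <= level")
    r_hat_k, r_true_k, total_k = zip(*kept)
    return mean_abs_percentage_error(r_hat_k, r_true_k, total_k)


# Two hypothetical units: errors of 10/200 = 0.05 and 2/100 = 0.02.
err_all = mean_abs_percentage_error([30, 12], [20, 10], [200, 100])  # about 0.035
err_t10 = error_at_level([30, 12], [20, 10], [200, 100], level=10)   # about 0.02
```

Normalizing by the total life $T_i$ rather than by the residual life $R_i$ keeps the metric bounded for units that are close to failure, where $R_i$ approaches zero.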

Fig. 11 shows the comparison of the JPM with NN1, NN2, HI_2013 [1], HI_kernel [11], and the LSTM method [31] in terms of the mean absolute percentage error on the testing data of the FD001 data set. LSTM is a special kind of RNN that is well suited to classification and prediction based on time series data. Since it has recently been used for prognostics, we include it here as a state-of-the-art method for comparison. We set the size of the sliding time window to 50, which means that 50 consecutive measurements of the five selected signals are used as one input. The tuned LSTM network has five hidden layers with 1024, 512, 256, 128, and 64 nodes, respectively. The activation function of the middle layers was set to “relu,” while that of the output layer was set to sigmoid. Because of the large data size, we use minibatch training with a batch size of 1024. Dropout (with a dropout rate of 0.2) and fivefold cross validation were used to avoid overfitting.

Fig. 11. Comparison of the proposed JPM with other approaches using the FD001 data set.

Fig. 12. Comparison of the proposed JPM with other approaches using the FD003 data set.
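The sliding-window input construction described for the LSTM baseline can be illustrated as follows. This sketch covers only how windows of 50 consecutive measurements of five signals might be assembled for one unit; the function name is hypothetical, and how the authors paired windows with targets is not specified here.

```python
import numpy as np


def sliding_windows(signals, window=50):
    """Stack overlapping windows from one unit's multisensor record.

    signals: array of shape (T, n_signals), one row per observation epoch.
    Returns an array of shape (T - window + 1, window, n_signals), or an
    empty array with the same trailing shape if the record is too short.
    """
    signals = np.asarray(signals)
    n_windows = signals.shape[0] - window + 1
    if n_windows <= 0:
        return np.empty((0, window, signals.shape[1]), dtype=signals.dtype)
    return np.stack([signals[i:i + window] for i in range(n_windows)])


# A hypothetical unit with 60 observation epochs and 5 selected signals.
record = np.arange(60 * 5, dtype=float).reshape(60, 5)
inputs = sliding_windows(record, window=50)  # shape (11, 50, 5)
```

With this construction, a unit needs at least 50 observations before the first prediction can be made.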

As shown in Fig. 11, a decreasing trend in prediction error is observed for almost all these methods, and the proposed JPM outperforms all the other methods significantly in most cases, especially at the early stage. When the units approach failure, more observations are available to infer the degradation trajectory, thus resulting in more accurate predictions. As expected, NN2 has higher prediction accuracy than NN1. Due to unsupervised learning, all the HI-based approaches perform poorly at the early prediction stages. The performance of the LSTM approach is also unsatisfactory at the early stage. As we explained in Section I, the degradation of all engines is not noticeable in the early life cycles: the input features do not change much, while the output decreases linearly. Therefore, it is hard for LSTM to achieve high performance across all the life cycles. Owing to the Bayesian framework and the inclusion of the time epoch $t$ as an input, the proposed JPM can effectively overcome this issue and achieves excellent performance in both the early and late life cycles.

To further validate the effectiveness of our proposed method, we applied the JPM to another engine data set, FD003. Unlike the first data set, FD001, which has only a single failure mode, the FD003 data set has two fault modes. Few existing works have investigated the HI-based approaches using the FD003 data set, and therefore, we only compare the JPM with NN1, NN2, and LSTM here. Fig. 12 shows the comparison results, which also demonstrate the effectiveness of the proposed method.




IV. CONCLUSION

The availability of multisensor data provides a great opportunity to better monitor the degradation process and improve prognostic accuracy. The existing linear data fusion techniques are incapable of capturing the possible hidden nonlinear relationship, while the traditional NN-based prediction methods fail to fully utilize the past observations before the prediction epoch. To tackle these problems, a JPM was proposed in this article for multisensor data fusion and RUL prediction. In this approach, a Bayesian linear model was used to model each sensor signal, which partially captures the degradation process, and an ANN was proposed to model the potentially nonlinear relationship between the residual life and the multisensor degradation data. At the offline stage, an empirical two-stage process was used to estimate the hyperparameters of the prior distributions. At the online stage, a Bayesian updating scheme was used to update the posterior distribution of the model parameters of an in-service unit, and the updated parameters were used as inputs to the NN for residual life prediction. The developed method was demonstrated and validated using the C-MAPSS data set. The performance and comparison results show that the JPM has much higher prediction accuracy than the existing approaches at most of the prediction epochs.

The advantages of the proposed JPM are twofold: 1) by using the Bayesian filtered parameters and the epoch $t$ as inputs to the NN model, the JPM can fully utilize the multisensor data before the prediction epoch to capture the degradation trajectory and 2) compared with RNN, LSTM, or deep CNN approaches, the proposed JPM has a simple network structure and performs comparatively well in the early life cycles, where there is no clear degradation trend. It is worth noting that the proposed JPM is developed for degradation processes under a single operating condition. When multiple operating conditions are mixed together, however, the multisensor data may not exhibit clear degradation trends, making the JPM inapplicable. The extension to multiple operating conditions will be investigated in the future.

REFERENCES

[1] K. Liu, N. Z. Gebraeel, and J. Shi, “A data-level fusion model for developing composite health indices for degradation modeling and prognostic analysis,” IEEE Trans. Autom. Sci. Eng., vol. 10, no. 3, pp. 652–664, Jul. 2013.

[2] M. S. Kan, A. C. C. Tan, and J. Mathew, “A review on prognostic techniques for non-stationary and non-linear rotating systems,” Mech. Syst. Signal Process., vols. 62–63, pp. 1–20, Oct. 2015.

[3] M. Pecht, “A prognostics and health management for information and electronics-rich systems,” in Engineering Asset Management and Infrastructure Sustainability. London, U.K.: Springer, 2012, pp. 317–323.

[4] Y. Peng, M. Dong, and M. J. Zuo, “Current status of machine prognostics in condition-based maintenance: A review,” Int. J. Adv. Manuf. Technol., vol. 50, nos. 1–4, pp. 297–313, Jan. 2010.

[5] Z.-S. Ye and M. Xie, “Stochastic modelling and analysis of degradation for highly reliable products,” Appl. Stochastic Models Bus. Ind., vol. 31, no. 1, pp. 16–32, Oct. 2015.

[6] Y. Wen, J. Wu, and Y. Yuan, “Multiple-phase modeling of degradation signal for condition monitoring and remaining useful life prediction,” IEEE Trans. Rel., vol. 66, no. 3, pp. 924–938, Sep. 2017.

[7] Y. Wen, J. Wu, D. Das, and T.-L. Tseng, “Degradation modeling and RUL prediction using Wiener process subject to multiple change points and unit heterogeneity,” Rel. Eng. Syst. Saf., vol. 176, pp. 113–124, Aug. 2018.

[8] Y. Wen, J. Wu, Q. Zhou, and T.-L. Tseng, “Multiple-change-point modeling and exact Bayesian inference of degradation signal for prognostic improvement,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 2, pp. 613–628, Apr. 2019.

[9] P. Lim, C. K. Goh, K. C. Tan, and P. Dutta, “Multimodal degradation prognostics based on switching Kalman filter ensemble,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 1, pp. 136–148, Jan. 2015.

[10] K. Liu and S. Huang, “Integration of data fusion methodology and degradation modeling process to improve prognostics,” IEEE Trans. Autom. Sci. Eng., vol. 13, no. 1, pp. 344–354, Jan. 2016.

[11] C. Song, K. Liu, and X. Zhang, “Integration of data-level fusion model and kernel methods for degradation modeling and prognostic analysis,” IEEE Trans. Rel., vol. 67, no. 2, pp. 640–650, Jun. 2018.

[12] Z. Tian, L. Wong, and N. Safaei, “A neural network approach for remaining useful life prediction utilizing both failure and suspension histories,” Mech. Syst. Signal Process., vol. 24, no. 5, pp. 1542–1555, Jul. 2010.

[13] C. Zhang, P. Lim, A. K. Qin, and K. C. Tan, “Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2306–2318, Oct. 2016.

[14] J. Li, X. Mei, D. Prokhorov, and D. Tao, “Deep neural network for structural prediction and lane detection in traffic scene,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 690–703, Mar. 2016.

[15] N. Gebraeel, M. Lawley, R. Liu, and V. Parmeshwaran, “Residual life predictions from vibration-based degradation signals: A neural network approach,” IEEE Trans. Ind. Electron., vol. 51, no. 3, pp. 694–700, Jun. 2004.

[16] M. Xia, T. Li, L. Liu, L. Xu, S. Gao, and C. W. de Silva, “Remaining useful life prediction of rotating machinery using hierarchical deep neural network,” in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2017, pp. 2778–2783.

[17] Z. Tian, “An artificial neural network approach for remaining useful life prediction of equipments subject to condition monitoring,” in Proc. 8th Int. Conf. Rel., Maintainability Saf., Jul. 2009, pp. 227–237.

[18] A. K. Mahamad, S. Saon, and T. Hiyama, “Predicting remaining useful life of rotating machinery based artificial neural network,” Comput. Math. Appl., vol. 60, no. 4, pp. 1078–1087, Aug. 2010.

[19] F. O. Heimes, “Recurrent neural networks for remaining useful life estimation,” in Proc. Int. Conf. Prognostics Health Manage., Oct. 2008, pp. 1–6.

[20] Y. Wu, M. Yuan, S. Dong, L. Lin, and Y. Liu, “Remaining useful life estimation of engineered systems using vanilla LSTM neural networks,” Neurocomputing, vol. 275, pp. 167–179, Jan. 2018.

[21] G. S. Babu, P. Zhao, and X.-L. Li, “Deep convolutional neural network based regression approach for estimation of remaining useful life,” in Proc. Int. Conf. Database Syst. Adv. Appl. Cham, Switzerland: Springer, 2016, pp. 214–228.

[22] X. Li, Q. Ding, and J.-Q. Sun, “Remaining useful life estimation in prognostics using deep convolution neural networks,” Rel. Eng. Syst. Saf., vol. 172, pp. 1–11, Apr. 2018.

[23] S.-J. Wu, N. Gebraeel, M. A. Lawley, and Y. Yih, “A neural network integrated decision support system for condition-based optimal predictive maintenance policy,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 37, no. 2, pp. 226–236, Mar. 2007.

[24] N. Z. Gebraeel and M. A. Lawley, “A neural network degradation model for computing and updating residual life distributions,” IEEE Trans. Autom. Sci. Eng., vol. 5, no. 1, pp. 154–163, Jan. 2008.

[25] A. Saxena, K. Goebel, D. Simon, and N. Eklund, “Damage propagation modeling for aircraft engine run-to-failure simulation,” in Proc. Int. Conf. Prognostics Health Manage., Oct. 2008, pp. 1–9.

[26] C. J. Lu and W. Q. Meeker, “Using degradation measures to estimate a time-to-failure distribution,” Technometrics, vol. 35, no. 2, pp. 161–174, May 1993.

[27] J. Wu, Y. Yuan, and X. Li, “Size distribution estimation of three-dimensional particle clusters in metal-matrix nanocomposites considering sampling bias,” J. Manuf. Sci. Eng., vol. 139, no. 8, May 2017, Art. no. 081017.

[28] N. Gebraeel, “Sensory-updated residual life distributions for components with exponential degradation patterns,” IEEE Trans. Autom. Sci. Eng., vol. 3, no. 4, pp. 382–393, Oct. 2006.

[29] N. Gebraeel, “Prognostics-based identification of the top-k units in a fleet,” IEEE Trans. Autom. Sci. Eng., vol. 7, no. 1, pp. 37–48, Jan. 2010.

[30] A. Saxena et al., “Metrics for evaluating performance of prognostic techniques,” in Proc. Int. Conf. Prognostics Health Manage., Oct. 2008, pp. 1–17.

[31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.




Yuanyuan Gao received the B.S. degree in aircraft design and engineering from Beihang University, Beijing, China, in 2018. She is currently pursuing the Ph.D. degree in industrial engineering and management with Peking University, Beijing.

Her research interests are focused on data mining, advanced data analytics, quality, and reliability engineering.

Yuxin Wen received the B.S. degree in medical informatics engineering from Sichuan University, Chengdu, China, in 2011, and the M.S. degree in biomedical engineering from Zhejiang University, Zhejiang, China, in 2014. She is currently pursuing the Ph.D. degree in electrical and computer engineering with The University of Texas at El Paso, El Paso, TX, USA.

Her research interests are focused on statistical modeling, prognostics, and reliability analysis.

Jianguo Wu (Member, IEEE) received the B.S. degree in mechanical engineering from Tsinghua University, Beijing, China, in 2009, the M.S. degree in mechanical engineering from Purdue University, West Lafayette, IN, USA, in 2011, and the M.S. degree in statistics and the Ph.D. degree in industrial and systems engineering from the University of Wisconsin–Madison, Madison, WI, USA, in 2014 and 2015, respectively.

He was an Assistant Professor with the Department of Industrial, Manufacturing and Systems Engineering, The University of Texas at El Paso, El Paso, TX, USA, from 2015 to 2017. He is currently an Assistant Professor with the Department of Industrial Engineering and Management, Peking University, Beijing, China. His research interests are focused on data-driven modeling, monitoring, and analysis of advanced manufacturing processes and complex systems for quality control and reliability improvement.

Dr. Wu is a member of the Institute for Operations Research and Management Science (INFORMS), the Institute of Industrial and Systems Engineers (IISE), and the Society of Manufacturing Engineers (SME).

