IFAC MIM 2013

http://www.flll.jku.at/staff/francisco

Francisco Serdio, Edwin Lughofer, Kurt Pichler, Thomas Buchegger, Hajrudin Efendic

Condition Monitoring at Rolling Mills with Data-Driven Residual-Based Fault Detection

Francisco Serdio Fernández

Department of Knowledge-Based Mathematical Systems

Johannes Kepler University

Linz - Austria



Index

• Residual-Based Approach

• Framework
» Data Cleaning
» System Identification
» Model Training
» Model Testing

• Reference Methods
» Principal Component Analysis – PCA
» Multi Scale Principal Component Analysis – MSPCA



Index (cont'd)

• Current Challenges
» Global approaches
» Fixed thresholds

• Artificial Faults
» Constant failure
» Drift failure

• Results
» ROC curves
» Detection rates

• Conclusions

• Outlook



Basic Idea of Residual-Based Approach

[Figure: joint channel space (smooth dependency); cases labeled "Fault" and "No fault!, but non-smooth pattern of signal"]

Increasing the dimensionality of the joint channel space decreases the likelihood that a fault affects all channels with the same intensity and in the same direction!



Framework



Framework (cont'd)

• Off-line stage
» Data cleaning
– Produces a new dataset to be used in the following step
» System identification
– Iterative process that identifies which channels explain the others
» Model training
– Produces a model for each previously identified system

• On-line stage
» Model testing
– Determines when there is a fault in the running system



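• Remove constant channels
» Test: is the channel constant?

• Remove binary channels
» Test: is the channel binary?

• Remove duplicated channels
» Test: is the channel duplicated, i.e. R² greater than 0.95 with another channel?

• Remove outliers
» Test: is the sample an outlier, judged by an outlier degree based on pairwise distances in the training data?

• Downsample the data set
» Downsampling keeps the shape of each channel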

Framework – Data cleaning
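The cleaning rules above can be sketched as follows. This is a minimal illustration, not the authors' code: treating a channel with exactly two distinct values as binary, measuring duplication by the squared Pearson correlation (the R² of a one-input linear fit), scoring outliers by the mean distance to the k nearest neighbours, the median-based cut-off, and block averaging for downsampling are all assumptions. All function names are hypothetical.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation of two equally long, non-constant channels."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def clean_channels(channels, r2_limit=0.95):
    """channels: dict name -> list of samples. Returns the channels that are kept."""
    kept = {}
    for name, values in channels.items():
        levels = len(set(values))
        if levels == 1:          # constant channel
            continue
        if levels == 2:          # binary channel (assumption: exactly two distinct values)
            continue
        # duplicated channel: R^2 of a one-input linear fit (= squared Pearson
        # correlation) against any channel already kept exceeds the limit
        if any(pearson(values, other) ** 2 > r2_limit for other in kept.values()):
            continue
        kept[name] = values
    return kept

def outlier_degree(samples, k=3):
    """Mean Euclidean distance of each sample (one value per channel) to its k nearest neighbours."""
    degrees = []
    for i, s in enumerate(samples):
        d = sorted(math.dist(s, t) for j, t in enumerate(samples) if j != i)
        kk = min(k, len(d))
        degrees.append(sum(d[:kk]) / kk if kk else 0.0)
    return degrees

def remove_outliers(samples, k=3, factor=3.0):
    """Drop samples whose outlier degree exceeds factor * median degree (hypothetical rule)."""
    deg = outlier_degree(samples, k)
    limit = factor * statistics.median(deg)
    return [s for s, g in zip(samples, deg) if g <= limit]

def downsample(values, factor):
    """Average non-overlapping blocks of `factor` samples, so the channel keeps its shape."""
    return [sum(values[i:i + factor]) / len(values[i:i + factor])
            for i in range(0, len(values), factor)]
```

For example, `clean_channels({"a": [1, 2, 3, 4, 5], "dup": [2, 4, 6, 8, 10]})` keeps only `"a"`, because `"dup"` is a perfect linear copy of it.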



• Identify channel dependencies
» Forward selection with orthogonalization
– Ranks the channels by their importance for explaining the target (most important first)
» GA-based feature selection (included in Box-Cox)
– Outputs individuals of 1's and 0's indicating whether each variable is included or not

• Determine the optimal number of dimensions in the ranking scheme
» Find a knee in the cumulative quality sum curve
– Determined automatically by means of the gradient
– Keeps the inputs modelling the useful information
– Discards the inputs modelling the noise

Framework – System Identification
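The slides name forward selection with orthogonalization without further detail. A minimal sketch, assuming the classic orthogonal-least-squares variant: at each step, pick the candidate channel whose orthogonalized version explains the largest share of the target's variance (its error reduction ratio), then Gram-Schmidt-orthogonalize the remaining candidates against it. The function name and the returned score are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def forward_selection_ranking(candidates, target, eps=1e-12):
    """candidates: dict name -> samples. Returns [(name, err), ...], most important first,
    where err is the fraction of the centred target energy explained by the candidate
    after removing everything already explained by earlier selections."""
    y = _center(target)
    yy = _dot(y, y)
    remaining = {n: _center(v) for n, v in candidates.items()}
    ranking = []
    while remaining:
        best_name, best_err = None, -1.0
        for name, w in remaining.items():
            ww = _dot(w, w)
            if ww < eps:          # nothing left after orthogonalization
                continue
            err = _dot(w, y) ** 2 / (ww * yy)   # error reduction ratio
            if err > best_err:
                best_name, best_err = name, err
        if best_name is None:
            break
        ranking.append((best_name, best_err))
        q = remaining.pop(best_name)
        qq = _dot(q, q)
        # Gram-Schmidt step: remove the selected direction from all remaining candidates
        remaining = {n: [wi - (_dot(w, q) / qq) * qi for wi, qi in zip(w, q)]
                     for n, w in remaining.items()}
    return ranking
```

Because the selected directions are mutually orthogonal, the ratios add up to the R² of the model using all ranked channels, which makes their running sum a natural cumulative quality curve.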



• Determine optimal number of dimensions

Framework – System Identification (cont'd)
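A hypothetical reading of "find the knee by means of the gradient": accumulate the per-input quality gains of the ranking (for example error reduction ratios) and stop before the first input whose gain, i.e. the gradient of the cumulative curve, falls below a fraction `min_gain` of the total. The stopping rule and the default threshold are assumptions, not the authors' exact criterion.

```python
from itertools import accumulate

def knee_by_gradient(gains, min_gain=0.01):
    """gains: quality added by each ranked input, most important first.
    Returns how many inputs to keep."""
    cum = list(accumulate(gains))      # cumulative quality sum curve
    total = cum[-1]
    prev = 0.0
    for k, c in enumerate(cum):
        gradient = c - prev            # slope of the curve at input k (unit step)
        if gradient < min_gain * total:
            return max(k, 1)           # inputs from here on mostly model noise
        prev = c
    return len(cum)
```

With gains `[0.6, 0.25, 0.1, 0.005, 0.003, 0.002]`, the fourth gain is the first below 1% of the total, so three inputs are kept.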



• Models applied, with stepwise increasing degree of non-linearity
» Ridge Regression (linear)

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition. Springer, New York, Berlin, Heidelberg, 2009.

– Global multiple linear regression (MLR) with Tikhonov regularization

» Genetic Box-Cox (slightly non-linear)

R.M. Sakia. The Box-Cox transformation technique: a review. The Statistician, 41:168-178, 1992.

– Combines the original Box-Cox transformation with a GA
– Transforms the inputs to introduce slight non-linearities
– Applies linear regression over the transformed inputs
– Learns the transformations using a GA

» SparseFIS (highly non-linear)

E. Lughofer and S. Kindermann. SparseFIS: Data-driven learning of fuzzy systems with sparsity constraints. IEEE Transactions on Fuzzy Systems, 18(2):396-411, 2010.

– Top-down fuzzy modeling approach applying numerical optimization with sparsity constraints, out-weighting unimportant rules and parameters
– Employs iterative vector quantization (VQ), projected gradient descent and semi-smooth Newton

Framework – Model Training
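The two simpler model types can be sketched in plain Python. Ridge (Tikhonov-regularized) regression has the closed form w = (XᵀX + αI)⁻¹Xᵀy; here the intercept is left unpenalized, which is a common choice but an assumption. The Box-Cox variant applies one standard Box-Cox transform per input and then fits the same linear model. The per-input `lambdas` would be learnt by the GA in the slides; here they are simply passed in. All names are hypothetical.

```python
import math

def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive inputs")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def _solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def ridge_fit(rows, y, alpha):
    """Minimise ||y - b - Xw||^2 + alpha * ||w||^2; returns [b, w1, ..., wp]."""
    x = [[1.0] + list(r) for r in rows]          # prepend intercept column
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    for j in range(1, p):                        # do not penalise the intercept
        xtx[j][j] += alpha
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(p)]
    return _solve(xtx, xty)

def ridge_predict(w, row):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))

def box_cox_ridge_fit(rows, y, lambdas, alpha):
    """Transform input j with lambdas[j], then fit a ridge model on the transformed inputs."""
    t_rows = [[box_cox(v, lam) for v, lam in zip(r, lambdas)] for r in rows]
    return ridge_fit(t_rows, y, alpha)
```

With `alpha=0` the ridge fit reduces to ordinary least squares; increasing `alpha` shrinks the slopes toward zero.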



Example of Input Transformations
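For reference, the standard Box-Cox transformation reviewed by Sakia (1992), written for a positive input x with parameter λ, together with the linear model over transformed inputs that the Genetic Box-Cox slide describes. Reading the per-input λ_j as the quantities searched by the GA is our interpretation of the slides.

```latex
x^{(\lambda)} =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \ln x, & \lambda = 0,
\end{cases}
\qquad x > 0,
\qquad
\hat{y} = w_0 + \sum_{j=1}^{p} w_j \, x_j^{(\lambda_j)}
```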



Overview of Training Methods

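Method            | Type                | Training effort | System identification | Model training
------------------|---------------------|-----------------|-----------------------|----------------------------------
Linear Regression | Linear              | Low             | Forward selection     | 10-fold CV with MSE
Box-Cox           | Slightly non-linear | Medium          | Genetic algorithm     | 10-fold CV with MSE
SparseFIS         | Highly non-linear   | High            | Forward selection     | 10-fold CV with MSE + grid search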
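The "10-fold CV with MSE" entries can be illustrated with a small grid search over a regularization parameter. As a stand-in model this sketch uses a single-input ridge fit without intercept, w = Σxy / (Σx² + α); the model, the contiguous (unshuffled) folds and the function names are assumptions chosen for brevity, not the authors' setup.

```python
def kfold_indices(n, k=10):
    """Split range(n) into k contiguous folds of nearly equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_ridge_1d(x, y, alpha):
    """Single input, no intercept: w = sum(x*y) / (sum(x^2) + alpha)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)

def cv_mse(x, y, alpha, k=10):
    """Mean squared error of held-out predictions over k folds."""
    errors = []
    for test_idx in kfold_indices(len(x), k):
        test = set(test_idx)
        xtr = [x[i] for i in range(len(x)) if i not in test]
        ytr = [y[i] for i in range(len(x)) if i not in test]
        w = fit_ridge_1d(xtr, ytr, alpha)
        errors.extend((y[i] - w * x[i]) ** 2 for i in test_idx)
    return sum(errors) / len(errors)

def grid_search(x, y, alphas, k=10):
    """Return the alpha with the lowest k-fold CV MSE."""
    return min(alphas, key=lambda a: cv_mse(x, y, a, k))
```

Contiguous folds keep neighbouring time samples together, which avoids overly optimistic errors on autocorrelated process data; shuffled folds would be the usual choice for independent samples.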



• Computation of residuals

» The residuals are the differences between the observed values and the predicted ones

• Computation of error bars

» Two types: global and local
– Global: based on the CV model error; a single value is provided for all points in the testing data set
– Local: based on adaptive confidence intervals that follow the variation of the data distribution over the input space; a different value is provided for each point in the testing data set

• Combination of residuals and error bars
» The error bars are used to normalize the residuals
– The residuals are then expressed in error-bar units

On-line Analysis of Residual Signals
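A minimal sketch of the residual normalization step. The residual is observed minus predicted, as stated above. For the global error bar we assume `n_sigma` times the CV root-mean-squared error, since the slides only say it is "based on CV model error". Local error bars are not derived here; any per-sample values can be passed in instead. Function names are hypothetical.

```python
import math

def residuals(observed, predicted):
    """Residual = observed value minus model prediction."""
    return [o - p for o, p in zip(observed, predicted)]

def global_error_bar(cv_mse, n_sigma=2.0):
    """Hypothetical global error bar: n_sigma times the CV RMSE,
    one value shared by every test sample."""
    return n_sigma * math.sqrt(cv_mse)

def normalize(res, error_bars):
    """Express residuals in error-bar units. error_bars is either a single
    number (global) or one value per sample (local)."""
    if isinstance(error_bars, (int, float)):
        error_bars = [float(error_bars)] * len(res)
    return [r / e for r, e in zip(res, error_bars)]
```

A normalized residual of 1.0 then means "exactly one error bar away from the prediction", which makes residuals from different channel models directly comparable.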



• On-line tracking of the residuals

» The mean μ and the standard deviation σ of the residuals are tracked
– A sliding time window is used; values outside the tolerance band trigger a fault alarm and are not used to update the tracking

On-line Analysis of Residual Signals (cont'd)

Current residual at time instance k, generated from the i-th model

Incremental/decremental update of μ and σ over a sliding window of size T
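The tracking scheme can be sketched with running sums: adding a sample is the incremental step, dropping the oldest one once the window exceeds T samples is the decremental step. The tolerance band μ ± n_sigma·σ, the default `n_sigma=3` and the warm-up length are assumptions; the slides do not give the band width. The class name is hypothetical.

```python
import math
from collections import deque

class SlidingResidualMonitor:
    """Tracks mean and std of one residual signal over a sliding window of size T.
    Residuals outside mean +/- n_sigma * std raise an alarm and are NOT added."""

    def __init__(self, window_size, n_sigma=3.0, warmup=10):
        self.T = window_size
        self.n_sigma = n_sigma
        self.warmup = warmup          # samples collected before alarms are possible
        self.window = deque()
        self.s1 = 0.0                 # running sum
        self.s2 = 0.0                 # running sum of squares

    def mean(self):
        return self.s1 / len(self.window)

    def std(self):
        n = len(self.window)
        var = (self.s2 - self.s1 * self.s1 / n) / (n - 1) if n > 1 else 0.0
        return math.sqrt(max(var, 0.0))   # guard against tiny negative rounding

    def update(self, r):
        """Process one residual; return True if it triggers a fault alarm."""
        if len(self.window) >= self.warmup:
            if abs(r - self.mean()) > self.n_sigma * self.std():
                return True           # alarm: tracking is not updated
        self.window.append(r)         # incremental step
        self.s1 += r
        self.s2 += r * r
        if len(self.window) > self.T:
            old = self.window.popleft()   # decremental step
            self.s1 -= old
            self.s2 -= old * old
        return False
```

Excluding alarmed samples keeps a persistent fault from being absorbed into μ and σ, so the band does not widen to accommodate it.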



Dynamic Residual Signals Analysis - Example

[Figure: residual signals for a fault with 50% level and a fault with 10% level]



• Principal Component Analysis – PCA
» State of the art in fault detection

– D. Garcia-Alvarez. Fault detection using principal component analysis (PCA) in a wastewater treatment plant (WWTP). In Proceedings of the 62nd Int. Student's Scientific Conference, 13-17, Saint Petersburg, Russia, 2009.

– P.F. Odgaard, B. Lin, and S.B. Jorgensen. Observer and data-driven-model-based fault detection in power plant coal mills. IEEE Transactions on Energy Conversion, 23(2):659-668, 2008.

» Monitoring can be reduced to two statistics (T² and Q) characterizing two orthogonal subspaces of the original space
– Hotelling's T² represents the major variation in the data
– Q represents the random noise in the data

Reference methods
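To make the two PCA statistics concrete, here is a textbook-style sketch, not the cited authors' implementations. It standardizes the training data, keeps `n_components` principal components, and computes Hotelling's T² in the retained subspace and Q (squared prediction error) in the residual subspace. Control limits, usually fixed once from the training data, are omitted. The function names and the returned dict are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """X: (n_samples, n_channels) training data. Returns the PCA model as a dict."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # sort descending
    P = eigvecs[:, order[:n_components]]        # loadings of retained PCs
    lam = eigvals[order[:n_components]]         # variances of retained PCs
    return {"mean": mean, "std": std, "P": P, "lam": lam}

def t2_and_q(model, x):
    """Hotelling's T^2 (principal subspace) and Q / SPE (residual subspace) for one sample."""
    z = (np.asarray(x, dtype=float) - model["mean"]) / model["std"]
    t = model["P"].T @ z                        # scores
    t2 = float(np.sum(t ** 2 / model["lam"]))
    e = z - model["P"] @ t                      # part not explained by the retained PCs
    q = float(e @ e)
    return t2, q
```

A sample that respects the learnt correlation structure but has a large amplitude raises T², whereas a sample that breaks the correlation between channels raises Q.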



• Multi Scale Principal Component Analysis – MSPCA
» State of the art in process monitoring

– B.R. Bakshi. Multiscale PCA with application to multivariate statistical process monitoring. AIChE Journal, 44:1596-1610, 1998.

» It uses wavelets to reconstruct the original signal
– The reconstruction attempts to remove useless information, mainly noise, from the signal

» Monitoring uses the same statistics as in PCA
– Hotelling's T² represents the major variation in the data
– Q represents the random noise in the data

Reference method (cont’d)
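The idea of "reconstructing the signal with wavelets to remove noise" can be illustrated with a single-level Haar decomposition whose detail coefficients are hard-thresholded before reconstruction. This is only a minimal illustration of wavelet denoising; it is not Bakshi's full MSPCA, which decomposes each variable over several scales and applies PCA at each scale. The function name and threshold are hypothetical.

```python
def haar_denoise(signal, threshold):
    """One-level (unnormalised) Haar transform, hard thresholding of the
    detail coefficients, and reconstruction. Requires an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]   # local averages
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]   # local differences
    detail = [d if abs(d) > threshold else 0.0 for d in detail] # drop small details
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])                              # exact inverse
    return out
```

With a zero threshold the reconstruction returns the original signal; with a large one, each pair of samples collapses to its average, which shows how reconstruction can also distort a channel when useful detail is discarded along with the noise.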



Current Challenges

• Global approaches

» PCA and MSPCA use the dataset as a whole

» When channels are added to or removed from the system, the method must be retrained
– Low cascadability

• Fixed thresholds

» PCA and MSPCA use a fixed threshold based on the training data
– It does not take into account differences between the training and test datasets
– When training and test data differ considerably, the approach becomes useless

» It is a rigid approach
– The threshold remains unchanged during the on-line operation of the system



• Artificial faults were introduced in the data
» Regions where channel values are zero were ignored

• Different fault types with different intensities
» Fault types
– Constant failure: a jump in the original signal
– Drift failure: a progressive increase in the original signal; different slopes give different shapes, from exponential to logarithmic
» Fault intensities (% added to the original signal)
– 5%, 10%, 20%, 50%, 100%

• The introduction of faults was shuffled 10 times to avoid unlucky situations (due to bad coverage of faulty channels)

Artificial Faults
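A sketch of how such faults could be injected into a recorded channel. Several assumptions are made here: the intensity is applied relative to each sample's value (x + intensity·x); the drift shape is a power law whose exponent `shape` > 1 gives an exponential-like and `shape` < 1 a logarithmic-like progression; and the shuffling is a fresh random choice of faulty channels per run. All names are hypothetical.

```python
import random

def inject_constant(signal, start, end, intensity):
    """Constant (jump) fault: add intensity * x (e.g. 0.1 for 10%) to samples start..end-1."""
    out = list(signal)
    for k in range(start, end):
        out[k] = signal[k] + intensity * signal[k]
    return out

def inject_drift(signal, start, end, intensity, shape=1.0):
    """Drift fault: the added fraction grows from ~0 to `intensity` over start..end-1.
    shape > 1: slow start, fast end (exponential-like); shape < 1: logarithmic-like."""
    out = list(signal)
    length = end - start
    for k in range(start, end):
        ramp = ((k - start + 1) / length) ** shape
        out[k] = signal[k] + intensity * ramp * signal[k]
    return out

def fault_runs(channel_names, n_faulty, n_runs=10, seed=0):
    """Draw a different random set of faulty channels for each of n_runs repetitions."""
    rng = random.Random(seed)
    return [rng.sample(channel_names, n_faulty) for _ in range(n_runs)]
```

Repeating the experiment over several random channel selections averages out runs in which the faults happen to land only on easy or only on unmodelled channels.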



Artificial Faults Examples



Results

• ROC curves
» Used for sensitivity analysis, plotting true positives vs. false positives (detection vs. overdetection)

» They depict the following useful information
– How much the detection rate influences the overdetection rate
– How sensitive the method is to its parameters
– Which method is best

– A higher AUC (Area Under the Curve) points to a better method, as higher detection rates (y-axis) can be achieved with lower false alarm rates (x-axis).
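A minimal sketch of how ROC points and the AUC can be computed. Here the curve is obtained by sweeping an alarm threshold over per-sample scores (for instance the magnitude of a normalized residual, which is an assumption), and the area is taken with the trapezoidal rule. Function names are hypothetical.

```python
def roc_points(scores, labels):
    """labels: 1 = faulty, 0 = fault-free. Returns (false positive rate, true positive rate)
    points obtained by raising an alarm whenever score >= t, for every distinct score t."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule (points ordered by FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

An AUC of 1.0 means that every faulty sample scores higher than every fault-free one; 0.5 corresponds to random guessing.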



Results – Multi Scale PCA

• Proves to be useless for our problem

» The wavelet reconstruction is not able to reconstruct the signals properly
– Poor channel reconstruction: only around 55% to 65% of the channels are reconstructed with an accuracy of 90% or more, for all the datasets
– Noise is introduced during the channel reconstruction, even in the channels reconstructed with good quality

» Unacceptable overdetection rates in all the datasets
– The method is not able to operate below a 10% overdetection rate, which makes it useless in our problem



Results – Multi Scale PCA (cont'd)



Results – ROC Curves – Scenario 1









Results – Detection Rates – Scenario 1









Statistical preference of methods

• Two statistical tests, based on
» (i) rankings and (ii) absolute detection rates
– Plus denotes significant superiority over the other methods
– Minus denotes inferiority to the other methods
– 0 indicates no difference
– na indicates not applicable



• MSPCA is not applicable to our problem

• PCA is either not applicable or outperformed by our residual-based approach

• In the pessimistic (real-world) case, Box-Cox showed the best performance, thus favoring slight non-linearities in the models

• A significant performance boost over the pessimistic case could be recognized for all models
» Fault misses can largely be explained by the absence of a (good) model for the channel where a fault occurs!

Conclusions



Outlook

• Deal with the non-stable behaviour of the residuals (enhanced pattern analysis, model update schemes)

• Deal with data from different products (operator's feedback probably required)



Thanks a lot for your attention!


















