
Fault Detection With An Adaptive Distance For The k-Nearest Neighbors Rule

Ghislain Verdier, Ariane Ferreira

Ecole Nationale Superieure des Mines de Saint-Etienne, Centre de Microelectronique de Provence, Gardanne, France

([email protected],[email protected])

ABSTRACT

In recent years, fault detection has become a crucial issue for many industrial fields, notably semiconductor manufacturing, where process control engineers constantly try to improve equipment productivity by detecting abnormal behavior as quickly as possible. Due to the number of variables and the correlations between them in this type of application, statistical methods dealing with fault detection need to be multivariate. Usually, the multivariate control chart procedures used in industry derive from the Hotelling $T^2$. However, this rule can only be used when the observations are generated by a Gaussian distribution, an assumption rarely satisfied in practice. An alternative is to apply nonparametric control charts, for which no assumption on the distribution is needed. A nonparametric rule, the k-Nearest Neighbors Detection rule, is studied in this paper. The approach consists in evaluating the distance of an observation to its nearest neighbors and declaring a fault if this distance is too large. In this paper, a new adaptive Mahalanobis distance is proposed. It takes into account the local correlation structure of the data and thus improves the number of faults detected for a fixed false alarm rate, compared to a classic distance such as the Euclidean distance.

Keywords: Adaptive Mahalanobis distance, Correlated variables, Hotelling $T^2$, Multivariate methods, Statistical fault detection, k-Nearest Neighbors rule.

1. INTRODUCTION

Process control is crucial in many industries, especially in semiconductor manufacturing. A major step of process control is Fault Detection and Classification (FDC). The aim of FDC is to construct a decision rule able to detect as quickly as possible an abnormal evolution of the system (for example a machine) in order to prevent more critical problems in the future. Statistical methods, like Multivariate Control Charts (MCC), are among the most widely used methods for the construction of such decision rules. Generally, the problem is stated as follows: the statistical law of the observations under control is unknown or partially unknown. The goal is to get a decision rule from a learning sample of data collected during normal operating mode. The decision rule must be able to detect whether a new observation comes from the normal or the fault mode. The most common approach supposes that the learning sample observations (the observations without fault) are generated from a multidimensional Gaussian distribution with unknown mean and variance. In this case, the Hotelling $T^2$ rule is the most appropriate solution to detect a fault characterized by a shift in the mean of the observations (see [7]). However, the statistical law of the observations under control is often non-Gaussian. Therefore, the Hotelling $T^2$ is no longer adapted and leads to a lot of false alarms and non-detections. An alternative is to apply nonparametric detection rules. These approaches can be applied without any assumption on the statistical law generating the observations under control, since the detection rules are implemented only with the learning sample. Among these methods, we can cite the rules studied by Devroye and Wise [4] and Baillo et al. [2] based on a standard level set estimator, or by Baillo and Cuevas [1] in which a kernel estimator is used. In the literature, several papers deal with fault detection using one-class Support Vector Machines approaches (see for example Manevitz and Yousef [8]).


More recently, He and Wang [5] proposed to adapt the k-nearest neighbors classification rule to the problem of Fault Detection (FD). The approach is to evaluate the distance of an observation to the normal operating region: the cumulative distance of this observation to its $k$ nearest neighbors in the learning sample is calculated. If this distance is too high, the observation is considered out-of-control. He and Wang [5] apply this decision rule with the Euclidean distance. However, this distance is not always well adapted to a problem since it does not take into account the correlations between variables. The aim of this paper is to propose a new distance for the k-Nearest Neighbors Detection rule (k-NND rule). It is an adaptive Mahalanobis distance based on the correlation structure of the nearest neighbors of each observation under monitoring.

The paper is organized as follows: in section II, the problem of fault detection is presented and the Hotelling $T^2$ rule is recalled. In section III, the k-Nearest Neighbors Detection rule is detailed and the new adaptive distance is introduced. The Hotelling $T^2$ and the k-NND rule (with Euclidean and adaptive distances) are studied on simulation trials in section IV, and finally, section V gives conclusions on the proposed method.

2. THE FAULT DETECTION PROBLEM AND THE HOTELLING $T^2$

2.1. The context

In the sequel, assume that $X_1, \dots, X_N$ are the measurements of a system under control, constituting a learning sample. A measurement is the set of values taken by the process variables, supposed to be an $\mathbb{R}^d$ random vector. The variables $(X_i)_{i=1,\dots,N}$ are independent and identically distributed according to an unknown law $\mathcal{L}_0$, with a probability measure denoted by $P_0$. A new measurement comes, a vector $X_{N+1}$, and the system is said to be out-of-control if this new observation is generated by a law $\mathcal{L}_1$ ($\neq \mathcal{L}_0$).


Since $\mathcal{L}_0$ and $\mathcal{L}_1$ are unknown, we decide that a change has occurred if the new observation is outside a tolerance region $S_0$ defined by a probability of false alarm $\alpha$ such that

$$P_0\left[X_{N+1} \notin S_0\right] = \alpha.$$

$S_0$ is constructed from the learning sample $(X_i)_{i=1,\dots,N}$.

2.2. Gaussian observations and Hotelling $T^2$

Suppose that the observations under control are generated by a multidimensional Gaussian distribution with an unknown mean $\mu$ and an unknown variance-covariance matrix $\Sigma$. The tolerance region can be defined with the Hotelling $T^2$ detection rule introduced in [6]. The mean and variance-covariance matrix are respectively estimated by:

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i \quad \text{and} \quad S = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^t,$$

where $Y^t$ is the usual notation for the transpose of a matrix $Y$. The Hotelling $T^2$ test statistic for the observation $X_{N+1}$ is then

$$T^2(X_{N+1}) = (X_{N+1} - \bar{X})^t S^{-1} (X_{N+1} - \bar{X}), \qquad (1)$$

and it is compared to an upper control limit defined by:

$$T^2_{UCL} = \frac{(N-1)(N+1)\,d}{N(N-d)}\, F_\alpha(d, N-d),$$

where $F_\alpha(d, N-d)$ is the $(1-\alpha)$-quantile of the Fisher distribution with parameters $d$ and $N-d$ (see Montgomery [7]). When $T^2(X_{N+1}) \geq T^2_{UCL}$, the system is declared out-of-control. By construction, the probability that $T^2(X_{N+1})$ is greater than $T^2_{UCL}$, whereas $X_{N+1}$ has been generated by $\mathcal{L}_0$, tends to the false alarm probability $\alpha$ (fixed by the user) when $N$ is large.

Unfortunately, the distribution $\mathcal{L}_0$ is often non-Gaussian, notably in the semiconductor industry where the distributions under control of most equipments can be multimodal with several functioning points. In this case, the Hotelling $T^2$, which assumes a Gaussian distribution, is not adapted: it leads to too many false alarms and non-detections.

In figure 1, $N = 400$ observations are generated by a Gaussian distribution. The tolerance region obtained by the Hotelling $T^2$ is defined by an ellipse (any observation inside the ellipse is declared "in control") well suited to the observations of the learning sample. On the contrary, in figure 2 the variables $X_i$ are not generated by a Gaussian distribution. Therefore, the tolerance region obtained with the $T^2$ is clearly not adapted to the behavior of the system under control: for example, a point with coordinates $x = (6, 0)$, clearly atypical on the graph, would be declared "in control".

It is therefore important to develop detection methods that apply in the case of non-Gaussian distributions. If there is no a priori information on the statistical law of the data under control, the best approach is to apply nonparametric methods, based only on the observations of the learning sample. The approach of the k-NND rule discussed in the sequel is one of them.

Fig. 1: N=400 observations generated by a Gaussian distribution and the tolerance region of Hotelling $T^2$.

Fig. 2: N=400 observations generated by a non-Gaussian distribution and the tolerance region of Hotelling $T^2$.
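Before moving to the nonparametric approach, here is a minimal numerical sketch of the $T^2$ rule above in Python, assuming NumPy and SciPy; the function and variable names are ours, not the paper's.

```python
import numpy as np
from scipy.stats import f

def hotelling_t2_rule(X, x_new, alpha=0.01):
    """Sketch of the Hotelling T^2 rule: returns (T^2(x_new), UCL, alarm)."""
    X = np.asarray(X)                        # learning sample, shape (N, d)
    N, d = X.shape
    x_bar = X.mean(axis=0)                   # estimated mean
    S = np.cov(X, rowvar=False, ddof=1)      # estimated covariance, normalized by N - 1
    diff = np.asarray(x_new) - x_bar
    t2 = diff @ np.linalg.solve(S, diff)     # test statistic of equation (1)
    # upper control limit built on the (1 - alpha)-quantile of the Fisher distribution
    ucl = (N - 1) * (N + 1) * d / (N * (N - d)) * f.ppf(1 - alpha, d, N - d)
    return t2, ucl, t2 >= ucl
```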


3. THE k-NEAREST NEIGHBORS RULE

3.1. Presentation

Initially, the k-NND rule is a nonparametric supervised classification method. Supervised classification consists in predicting the unknown class or label of an observation (1 or 0, healthy or sick, etc. for a binary classification) given a learning sample of labeled observations (see [3] for an overview). The k-nearest neighbors classification rule attributes to an observation the label which has the majority among the $k$ nearest neighbors of this observation in the learning sample. The idea is that if an observation is close to a group with almost all the same label, this observation must belong to the same class.

He and Wang [5] proposed to adapt this rule to the problem of fault detection. In this case, however, there is only one class represented in the learning sample, the class of observations under control. The principle of the detection method is as follows: an observation under control will generally take its values in a near neighborhood of the learning sample. Then a new observation $X_{N+1}$ is declared out-of-control if it is too far from the data under control. In order to assess the distance between $X_{N+1}$ and the observations under control $(X_i)_{i=1,\dots,N}$, a cumulative distance is calculated between $X_{N+1}$ and its $k$ nearest neighbors located in the learning sample:

$$D_k^2(X_{N+1}) = \sum_{j=1}^{k} D^2(X_{N+1}, X_{(j)}),$$

where $k$ is a positive integer fixed by the experimenter, $D$ is a distance (for example the Euclidean distance) and $X_{(1)}, \dots, X_{(k)}$ are the $k$ nearest neighbors of $X_{N+1}$.

If $X_{N+1}$ is generated by $\mathcal{L}_0$, it is probably close to several observations of the learning sample, so the cumulative distance to its $k$ nearest neighbors is small. If this cumulative distance is too high, $X_{N+1}$ is declared out-of-control. Therefore, it is necessary to determine a control limit for the test statistic $D_k^2(\cdot)$ given a false alarm rate $\alpha$. Contrary to the Hotelling $T^2$, the distribution of the test statistic cannot be rigorously expressed from a statistical point of view. The threshold is then chosen empirically from the learning sample. The cumulative distance is evaluated on each sample in the training data set. The set $(D_k^2(X_i))_{i=1,\dots,N}$ obtained is an empirical distribution of the distance $D_k^2(\cdot)$ given $X_1, \dots, X_N$. The control limit, denoted $h$, is defined as the $(1-\alpha)$ empirical quantile of $(D_k^2(X_i))_{i=1,\dots,N}$, i.e. $(1-\alpha) \times 100\%$ of the values $(D_k^2(X_i))_{i=1,\dots,N}$ are lower than $h$. By construction, when $N$ is large, the probability of false alarm satisfies:

$$P_0\left[D_k^2(X_{N+1}) > h \mid X_1, \dots, X_N\right] \approx \alpha.$$

The Algorithm: Following He and Wang [5], the algorithm of the detection rule is divided into two parts: the first one is a preliminary step concerning the choice of the threshold and the second one is the monitoring procedure.

Firstly, fix a positive integer $k$ and choose a distance $D$ on $\mathbb{R}^d$ (for example, the Euclidean distance).

Part 1 - Threshold choice

1- For all $i = 1, \dots, N$, find the $k$ nearest neighbors of $X_i$ in the learning sample $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_N$ and calculate the cumulative distance:

$$d_i = D_k^2(X_i) = \sum_{j=1}^{k} D^2(X_i, X_{(j)}).$$

2- The threshold is chosen as the $(1-\alpha)$ empirical quantile of the distance distribution:

$$h = d_{([N(1-\alpha)])},$$

where $[N(1-\alpha)]$ is the integer part of $N(1-\alpha)$ and $d_{(1)}, \dots, d_{(N)}$ are the order statistics of the sample.

Part 2 - Monitoring

1- Find the $k$ nearest neighbors of $X_{N+1}$ in the learning sample $X_1, \dots, X_N$ and calculate the cumulative distance:

$$d_{N+1} = D_k^2(X_{N+1}) = \sum_{j=1}^{k} D^2(X_{N+1}, X_{(j)}).$$

2- Apply the decision rule: if $d_{N+1} \geq h$ then $X_{N+1}$ is declared out-of-control and an alarm is triggered. If $d_{N+1} < h$, the system is under control and the previous two steps are repeated for the next observation $X_{N+2}$.
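A minimal sketch of this two-part procedure in Python with NumPy follows; it is our own rendering with brute-force neighbor search, and He and Wang's original implementation may differ.

```python
import numpy as np

def knnd_threshold(X, k, alpha):
    """Part 1: threshold h as the empirical (1 - alpha)-quantile of the
    cumulative k-nearest-neighbor distances over the learning sample."""
    X = np.asarray(X)                                         # shape (N, d)
    N = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # pairwise squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                              # a point is not its own neighbor
    d_i = np.sort(d2, axis=1)[:, :k].sum(axis=1)              # cumulative distances D_k^2(X_i)
    return np.sort(d_i)[int(N * (1 - alpha)) - 1]             # order statistic d_([N(1-alpha)])

def knnd_alarm(X, x_new, k, h):
    """Part 2: alarm (True) if the cumulative distance of x_new exceeds h."""
    d2 = ((np.asarray(X) - np.asarray(x_new)) ** 2).sum(axis=1)
    return np.sort(d2)[:k].sum() >= h
```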

This detection rule performs better than the Hotelling $T^2$ for non-Gaussian observations (see section IV). For a fixed false alarm rate, the number of detections of abnormal data is greater for the nonparametric rule than for the parametric one.

3.2. The new adaptive distance

So far we have not discussed the choice of the distance in detail. He and Wang [5] use the classical Euclidean distance defined by

$$D^2(Y_i, Y_j) = (Y_i - Y_j)^t (Y_i - Y_j),$$

a distance for which all components of the vectors are treated equally. However, it seems to us that it is necessary to use a distance more suited to the data, a statistical distance, which can take into account the covariances or correlations between variables. In statistics, the correlation coefficient between two variables (a quantity taking values in $[-1, 1]$) is used to study the linear relationship between these variables: if the correlation is high (as is the case in the data simulated in Figure 1), the variable $X_{i2}$ tends to increase when $X_{i1}$ increases (on the contrary, when the correlation between two variables is negative, one variable decreases when the other increases). Therefore, in Figure 1, a point with coordinates $x_1 = (1, -1)$ is statistically closer to the mean of the distribution (here, $\mu = (3, 1)$) than a point with coordinates $x_2 = (5, -1)$, while the Euclidean distance between these two points and $\mu$ would be the same.
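To make this claim concrete, here is a small numerical check; since the paper does not give the covariance matrix behind Figure 1, the matrix below is an assumed one with a high positive correlation, chosen only for illustration.

```python
import numpy as np

mu = np.array([3.0, 1.0])
sigma = np.array([[1.0, 0.9],          # assumed covariance matrix (high positive
                  [0.9, 1.0]])         # correlation), for illustration only
for x in (np.array([1.0, -1.0]), np.array([5.0, -1.0])):
    diff = x - mu
    eucl = diff @ diff                             # squared Euclidean distance to mu
    maha = diff @ np.linalg.solve(sigma, diff)     # squared Mahalanobis distance to mu
    print(x, eucl, round(maha, 2))
# Both points are at squared Euclidean distance 8.0 from mu, but their squared
# Mahalanobis distances differ sharply (about 4.21 versus 80.0): x1 = (1, -1)
# deviates along the correlation direction, x2 = (5, -1) against it.
```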

In the Hotelling $T^2$ rule, when the distribution of the observations is Gaussian, the correlation between two components is taken into account in the test statistic. Indeed,


in the Hotelling $T^2$, the Mahalanobis distance, which is based on the variance-covariance matrix of the variables (or its estimation), is calculated between the variable of interest and the mean of the distribution (or its estimation), as in equation (1). Therefore, in the example of Figure 1, the point $x_1 = (1, -1)$ is in the tolerance region (in the control ellipse) whereas a point $x_2 = (5, -1)$ is out-of-control.

For Gaussian distributions, the Mahalanobis distance is then the most efficient distance for the k-NND rule; the results obtained are close to those of the Hotelling $T^2$. For non-Gaussian distributions, we want to use the k-NND rule with a distance that takes into account the patterns of the data, notably the possible correlations, in order to get better results than a k-NND rule applied with the Euclidean distance. The Mahalanobis distance is no longer relevant without the Gaussian assumption. For example in Figure 2, the observations (clearly non-Gaussian) are separated into two parts: a first part with a high negative correlation and another with a high positive correlation. The estimated variance-covariance matrix $S$ calculated on the totality of the observations indicates that the correlation between the two components $X_{i1}$ and $X_{i2}$ is close to zero (that is the reason why the ellipse obtained with the Hotelling $T^2$ is parallel to the $X_{i1}$-axis). Ideally, each part should be studied separately in order to avoid a smoothing effect when the covariance is estimated on the whole sample. This is the idea developed in the new approach proposed in this paper. We want to study local correlation structures of the learning sample. For an observation $X_{N+1}$ under monitoring, the method consists in evaluating the correlation structure of the $K$ nearest observations of $X_{N+1}$ in the learning sample, and then applying the k-NND rule with a Mahalanobis distance built on the covariance matrix estimated from these $K$ observations. It is therefore an adaptive distance since, for each new observation under monitoring, a different Mahalanobis distance is used.

The Algorithm: As in section 3.1, the algorithm of the k-NND rule with the adaptive distance is divided into two parts: the first one for the threshold choice and the second one for the monitoring procedure.

Firstly, fix positive integers $K$ and $k$, with $k \leq K$.

Part 1 - Threshold choice

1- For all $i = 1, \dots, N$, find the $K$ nearest neighbors (with respect to the Euclidean distance) of $X_i$ in the learning sample $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_N$ and denote these neighbors $X_{(1)}, \dots, X_{(K)}$.

2- Estimate the covariance structure $S_K(X_i)$ on $X_{(1)}, \dots, X_{(K)}$ by:

$$S_K(X_i) = \frac{1}{K-1} \sum_{j=1}^{K} (X_{(j)} - \bar{X}_K)(X_{(j)} - \bar{X}_K)^t, \qquad (2)$$

where $\bar{X}_K$ is the empirical mean of the $K$ observations $X_{(1)}, \dots, X_{(K)}$.

3- Find the $k$ nearest neighbors (with respect to the Mahalanobis distance built on $S_K(X_i)$) of $X_i$ in the learning sample and calculate the cumulative distance of $X_i$ to its $k$ nearest neighbors:

$$d_i = \sum_{j=1}^{k} (X_i - X_{(j)})^t \, S_K(X_i)^{-1} (X_i - X_{(j)}), \qquad (3)$$

where $X_{(1)}, \dots, X_{(k)}$ are the $k$ neighbors.

4- The threshold is chosen as the $(1-\alpha)$ empirical quantile of the distance distribution:

$$h = d_{([N(1-\alpha)])},$$

where $[N(1-\alpha)]$ is the integer part of $N(1-\alpha)$ and $d_{(1)}, \dots, d_{(N)}$ are the order statistics of the sample.

Part 2 - Monitoring

1- Find the $K$ nearest neighbors (with respect to the Euclidean distance) of $X_{N+1}$ in the learning sample and denote these neighbors $X_{(1)}, \dots, X_{(K)}$.

2- Estimate the covariance structure $S_K(X_{N+1})$ on $X_{(1)}, \dots, X_{(K)}$ as in equation (2).

3- Find the $k$ nearest neighbors (with respect to the Mahalanobis distance built on $S_K(X_{N+1})$) of $X_{N+1}$ in the learning sample and calculate the cumulative distance $d_{N+1}$ of $X_{N+1}$ to its $k$ nearest neighbors as in equation (3).

4- Apply the decision rule: if $d_{N+1} \geq h$ then $X_{N+1}$ is declared out-of-control. If $d_{N+1} < h$, the system is under control and the previous steps are repeated for the next observation $X_{N+2}$.
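A minimal sketch of the monitoring computation with the adaptive distance, in Python with NumPy (our rendering; the threshold part of the algorithm applies the same function to each $X_i$ of the learning sample, excluding $X_i$ itself):

```python
import numpy as np

def adaptive_knnd_distance(X, x_new, k, K):
    """Cumulative adaptive Mahalanobis distance of x_new to its k nearest neighbors."""
    X = np.asarray(X)                                   # learning sample, shape (N, d)
    x_new = np.asarray(x_new)
    # 1- K nearest neighbors of x_new with respect to the Euclidean distance
    eucl = ((X - x_new) ** 2).sum(axis=1)
    nn_K = X[np.argsort(eucl)[:K]]
    # 2- local covariance structure S_K estimated on these K neighbors, as in equation (2)
    S_K_inv = np.linalg.inv(np.cov(nn_K, rowvar=False, ddof=1))
    # 3- squared Mahalanobis distances built on S_K, then the cumulative
    #    distance to the k nearest neighbors, as in equation (3)
    diffs = X - x_new
    maha = np.einsum('ij,jk,ik->i', diffs, S_K_inv, diffs)
    return np.sort(maha)[:k].sum()

# 4- decision rule: alarm when adaptive_knnd_distance(X, x_new, k, K) >= h,
# with h the empirical (1 - alpha)-quantile computed on the learning sample.
```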

In the method proposed above, there are two parameters to choose: $k$ and $K$. The most important is probably the second one, since it determines the number of observations used for the covariance structure estimation. There is no real rule to choose this parameter, and the choice must be made on a case by case basis, depending on the number of observations in the learning sample and the general appearance of these data. The parameter $K$ must be large enough to ensure a good estimation, but not too large since the estimation must be local. Note that when $K = N$, the covariance is estimated on the whole sample, and the adaptive distance simply becomes the classic Mahalanobis distance.

Computation time: The computations required for the adaptive distance are obviously a little longer than those of the classical approach, since it is necessary to perform an additional step: the determination of the $K$ neighbors and the estimation of a variance-covariance matrix on these $K$ observations. But the procedure is still fast enough to be applied "on-line".

Broadly speaking, the k-NND approach is easily applicable as long as the number of variables is reasonable (for example, $d < 6$). When the number of variables is too high, a dimension reduction method, e.g. PCA (Principal Component Analysis), is recommended.


N=250    | Hotelling T2 | Euclidean | Mahalanobis | Adaptive distance
         |              | distance  | distance    | K=20   K=50   K=80   K=110  K=140  K=170
Mean     | 27.28        | 49.34     | 39.67       | 47.56  53.67  55.91  55.72  54.14  49.53
Variance | 41.89        | 873.74    | 629.06      | 542.74 637.42 670.08 665.22 695.89 722.67

N=500    | Hotelling T2 | Euclidean | Mahalanobis | Adaptive distance
         |              | distance  | distance    | K=30   K=70   K=100  K=130  K=160  K=190
Mean     | 27.23        | 73.95     | 67.20       | 67.82  73.30  74.96  75.59  75.90  75.72
Variance | 23.10        | 383.30    | 384.66      | 286.54 255.71 245.31 246.53 248.58 261.58

N=1000   | Hotelling T2 | Euclidean | Mahalanobis | Adaptive distance
         |              | distance  | distance    | K=50   K=140  K=230  K=320  K=410  K=500
Mean     | 29.40        | 82.40     | 78.84       | 77.11  81.02  82.32  82.56  81.89  81.28
Variance | 14.94        | 122.37    | 161.44      | 99.90  99.45  88.64  91.63  106.75 81.28

Tab. 1: Mean and variance of the number of detections for the Hotelling $T^2$ and k-NND rules ($k = 5$), for N = 250, N = 500 and N = 1000.

4. EXAMPLES AND APPLICATIONS

In this section, simulation trials are performed to compare the new adaptive distance with the Euclidean and Mahalanobis distances for the k-NND rule. We consider a learning sample of $N$ observations generated as shown in Figure 2: $N/2$ observations are generated by a Gaussian distribution with mean $\mu_1$ and covariance matrix

$$\Sigma_1 = \begin{pmatrix} 0.7 & 0.5 \\ 0.5 & 0.7 \end{pmatrix},$$

and $N/2$ observations are generated by a Gaussian distribution with mean $\mu_2$ and covariance matrix

$$\Sigma_2 = \begin{pmatrix} 0.75 & -0.5 \\ -0.5 & 0.75 \end{pmatrix}.$$

In order to compare the different fault detection methods, a false alarm rate is fixed ($\alpha = 1\%$) and the decision rules are all implemented with respect to this rate. Each rule is then evaluated on the number of faults detected. 100 out-of-control observations are simulated outside the theoretical control limits (see Figure 3). Obviously, if a simulated fault consists of a large shift, the k-NND rule detects the change whatever the distance used. That is the reason why the fault trajectories are simulated near the theoretical tolerance region; the change is then more difficult to detect. The number of detections is compared for the following rules:

- the Hotelling $T^2$
- the k-NND rule ($k = 5$) with the Euclidean distance
- the k-NND rule ($k = 5$) with the Mahalanobis distance

Fig. 3: 100 out-of-control observations simulated outside the theoretical control limits.


- the k-NND rule ($k = 5$) with the new adaptive distance (several values of $K$).

The previous simulation is repeated. Table 1 reports the mean and variance of the number of detections over 200 repetitions and for different values of the learning sample size $N$. As expected, the Hotelling $T^2$ rule gives the worst results, since fewer than one out-of-control trajectory in three is detected. In the same way, the Mahalanobis distance is not very efficient. More interestingly, when the parameter $K$ is well chosen, the new adaptive distance performs better than the Euclidean distance, notably when the learning sample size is quite small. In addition, the variance of the number of detections is smaller with the adaptive distance, which reflects a certain stability of the method. We can expect to improve these results by working on the joint selection of $k$ and $K$.

5. CONCLUSIONS AND FUTURE WORK

We have proposed a new adaptive distance for the k-Nearest Neighbors Detection rule (k-NND rule) introduced by He and Wang [5]. The k-NND rule is a nonparametric method for fault detection. Based on the learning sample only, this type of method is of great interest in industry, where the Gaussian assumption is seldom satisfied. The new distance proposed in this paper is an adaptive Mahalanobis distance based on local correlation structure estimations. The first simulation results are convincing, since the k-NND rule performs better when it is applied with the new adaptive distance than when the Euclidean distance is used. It is now important to study in more detail the choice of the parameters $k$ and $K$ used in the adaptive distance. This choice depends mainly on the learning sample: the sample size $N$ and the possible local correlation structures.

In this paper, the problem is only to decide whether one observation $X_{N+1}$ is in control or out-of-control. But as for the Hotelling $T^2$, the k-NND rule can be combined with an Exponentially Weighted Moving-Average (EWMA) procedure in order to be more efficient for detecting, as rapidly as possible, a change in a dynamic process.

Future work will be to implement this kind of approach on a semiconductor manufacturing process and to study the theoretical properties of these rules.

REFERENCES

[1] A. Baillo and A. Cuevas, "Parametric versus nonparametric tolerance regions in detection problems," Computational Statistics, vol. 21 (3-4), pp. 523-536, 2006.

[2] A. Baillo, A. Cuevas, and A. Justel, "Set estimation and nonparametric detection," Canadian Journal of Statistics, vol. 28, pp. 765-782, 2000.

[3] L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition. Springer, 1996.

[4] L. Devroye and G. L. Wise, "Detection of abnormal behavior via nonparametric estimation of the support," SIAM Journal on Applied Mathematics, vol. 38 (3), pp. 480-488, 1980.

[5] Q. P. He and J. Wang, "Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes," IEEE Trans. Semiconduct. Manufact., vol. 20 (4), pp. 345-354, Nov. 2007.

[6] H. Hotelling, "Multivariate quality control, illustrated by the air testing of sample bombsights," in Techniques of Statistical Analysis, C. Eisenhart, M. W. Hastay, and W. A. Wallis, Eds. New York: McGraw-Hill, 1947, pp. 111-184.

[7] D. Montgomery, Introduction to Statistical Quality Control. Wiley, New York, 1996.

[8] L. M. Manevitz and M. Yousef, "One-class SVMs for document classification," Journal of Machine Learning Research, vol. 2, pp. 139-154, 2001.
