Machine-Learning Methods for Classification of Semiconductor Defects

Shing Chiang Tan

Multimedia University, Malaysia

At the Graduate School of Information, Production and Systems, Waseda University, 12 Nov. 2012

Acknowledgement

This work is the outcome of a research collaboration with Prof. Watada of Waseda University, and Prof. Marzuki Khalid and Dr. Zuwarie of Universiti Teknologi Malaysia. All researchers are grateful to Intel Malaysia for providing the real data without which this work could never have commenced.

Outline

• Introduction: Semiconductor Manufacturing & Wafer Defect Detection

• Imbalanced Data, Problem, Issues and Classification Metrics

• Machine-Learning Methods and Results

• Summary

A. Introduction: Semiconductor Manufacturing

Operation: how is a wafer produced and tested?

Fabrication → Sort → Assembly → Class test

Quality, Cost Savings, Yield

Voluminous Data from Production Process

Correlation Analysis
• conducted manually
• time consuming
• complicated

Alternative: Machine-Learning Methods
• learning information from data
• automatic defect detection

(Between-Class) Imbalanced Data

Small number of records (defective cases), the minority concept

vs.

Large number of records (non-defective cases), the majority concept

Outcome: unfavorable accuracies

Nature of the Imbalanced Data

Nature of the Problem Dataspace
• Intrinsic (directly related)
• Extrinsic (indirectly related), e.g., time interval for data acquisition

Data Quantity
• Imbalance due to rare instances (absolute rarity)
• Relative imbalance

Imbalanced Data Complexity

• Overlapping data from different classes
• Lack of representative data: rare instances are limited
• Small disjuncts of data: either noise/outliers or useful information

Figure adapted from He and Garcia (2009)

Data-Driven Wafer Defect Detection: Encrypted Dataset from Intel

An overview of an encrypted semiconductor dataset:

Dataset | Attribute 1 (categorical) | Attribute 2 (numerical) | Attribute 3 (numerical) | Output (numerical)
Training Set | 1A to 215A | 174610–261970 | 174650–278140 | 0/1
Test Set | 1A to 215A | 180800–299370 | 176670–301770 | 0/1
Interpolate Rate of Test Data (%) | 100 | 44.66 | 93.06 | 100
Extrapolate Rate of Test Data (%) | 0 | 55.34 | 6.94 | 0

Encrypted Dataset Distribution

Data Classification: Confusion Matrix

 | Predicted Positive (Label=0) | Predicted Negative (Label=1)
Actual Positive (Label=0) | True Positive (TP) | False Negative (FN)
Actual Negative (Label=1) | False Positive (FP) | True Negative (TN)

Classification metrics and their definitions:

Correctly classified rate (CCR):
$$\mathrm{CCR} = \frac{TP + TN}{TP + FP + FN + TN}$$

True positive rate (TPR):
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

True negative rate (TNR):
$$\mathrm{TNR} = \frac{TN}{TN + FP}$$

Geometric mean (G-mean):
$$\text{G-mean} = \sqrt{\mathrm{TPR} \times \mathrm{TNR}}$$
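The four metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the Intel dataset. It assumes both classes contain at least one record, so that no denominator is zero.

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Return CCR, TPR, TNR and G-mean as fractions in [0, 1]."""
    ccr = (tp + tn) / (tp + fp + fn + tn)   # correctly classified rate
    tpr = tp / (tp + fn)                    # true positive rate
    tnr = tn / (tn + fp)                    # true negative rate
    g_mean = math.sqrt(tpr * tnr)           # geometric mean of TPR and TNR
    return {"CCR": ccr, "TPR": tpr, "TNR": tnr, "G-mean": g_mean}

# Hypothetical imbalanced case: 900 actual positives, 100 actual negatives
metrics = classification_metrics(tp=855, fn=45, fp=57, tn=43)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these made-up counts the CCR is high while the TNR is low, and the G-mean drops accordingly. This is why the G-mean is a useful companion to the CCR on imbalanced data.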


Imbalanced Data Classification

Algorithm-Level Approach
• Modified learning algorithms
• Kernel-based learning

Data-Level Approach
• Sampling methods: over- and under-sampling
• Synthetic sampling (SMOTE)

Cost-Sensitive Approach
• Bootstrap sampling
• Cost-sensitive function in learning

Note: SMOTE stands for Synthetic Minority Oversampling TEchnique
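As an illustration of the data-level approach, the sketch below shows the core idea of SMOTE: each synthetic minority record is placed on the line segment between a minority record and one of its nearest minority neighbours. The function name, the parameters (`k`, `seed`) and the sample points are assumptions made for this example. It is a minimal sketch, not a full SMOTE implementation.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical minority-class records with two numerical attributes
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority records, it always lies inside the region already spanned by the minority class.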


Imbalanced Data Classification: Additional Method

• Testing stage of classification

Imbalanced Learning: Algorithm-Level

Method A: Evolutionary FAM and Evolutionary FAMDDA
• Fuzzy ARTMAP (FAM) or Fuzzy ARTMAP with Dynamic Decay Adjustment (FAMDDA)
• Hybrid Genetic Algorithms

Method B: EPNet
• Multilayer Perceptron (MLP)
• Evolutionary Programming

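Fuzzy ARTMAP (FAM) (Carpenter et al., 1992)

[Figure: FAM architecture. ARTa receives the input vector, a, complement-coded as A = (a, 1 – a). ARTb receives the target vector, b, complement-coded as B = (b, 1 – b). The two modules, with weights wa and wb, are linked by a map field with weights wab. ARTa, ARTb and the map field each have their own vigilance parameter.]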

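Flow Chart of FAM Learning Process

• Initialize the FAM weights.
• Present an input pattern, a, and apply complement coding.
• Category selection: compute the choice function, T, and select the winner.
• Test: apply the ART vigilance test, then the map field vigilance test.
• Search: if either test fails, select another winner and repeat the tests.
• Resonance: if both tests pass, learning takes place.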
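The learning process can be sketched in code for a single training pattern. The sketch uses the standard fuzzy ART operations: complement coding, the choice function T = |A ∧ w| / (α + |w|), the vigilance test |A ∧ w| / |A| ≥ ρ, and the update w ← β(A ∧ w) + (1 – β)w. It is a simplification under stated assumptions: the target is a plain class label, so ARTb and the map field reduce to a label comparison with match tracking. The function names and parameter defaults are illustrative, not the settings used in this work.

```python
def complement_code(a):
    """A = (a, 1 - a) for an input vector a with components in [0, 1]."""
    return list(a) + [1.0 - x for x in a]

def fuzzy_and(x, w):
    """Component-wise minimum (fuzzy AND)."""
    return [min(xi, wi) for xi, wi in zip(x, w)]

def train_pattern(weights, labels, a, target,
                  rho=0.0, alpha=0.001, beta=1.0, eps=1e-6):
    """Present one pattern; return the index of the node that learns it."""
    A = complement_code(a)

    def choice(j):  # choice function T_j
        return sum(fuzzy_and(A, weights[j])) / (alpha + sum(weights[j]))

    # Category selection and search: try nodes in order of decreasing T_j
    for j in sorted(range(len(weights)), key=choice, reverse=True):
        match = sum(fuzzy_and(A, weights[j])) / sum(A)
        if match < rho:
            continue              # ART vigilance test fails
        if labels[j] != target:
            rho = match + eps     # map field test fails: match tracking
            continue
        # Resonance: move the winner's weights towards A AND w
        overlap = fuzzy_and(A, weights[j])
        weights[j] = [beta * m + (1.0 - beta) * w
                      for m, w in zip(overlap, weights[j])]
        return j
    # No existing node passed both tests: commit a new node
    weights.append(A)
    labels.append(target)
    return len(weights) - 1

weights, labels = [], []
first = train_pattern(weights, labels, [0.2, 0.3], target=0)
second = train_pattern(weights, labels, [0.25, 0.35], target=0)
third = train_pattern(weights, labels, [0.9, 0.9], target=1)
print(first, second, third, labels)
```

In this toy run the second pattern is absorbed by the first node, while the third pattern fails the label check against that node and is committed as a new node.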

Fuzzy ARTMAP with Dynamic Decay Adjustment Algorithm (FAMDDA)

• Cover: include a new training pattern in an existing prototype of FAM.
• Commit: introduce a new prototype.
• Shrink: if a new pattern is incorrectly classified by an existing prototype of a different class, the width of this prototype is reduced to resolve the conflict.
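The cover, commit and shrink steps can be illustrated with the generic Dynamic Decay Adjustment procedure for Gaussian prototypes. This is a sketch of the general DDA idea, not the FAMDDA variant used in this work: the Gaussian activation, the two thresholds, the initial width and the sample data are all assumptions made for the example. It also assumes that no two records of different classes coincide.

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2  # illustrative thresholds (assumption)

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def activation(proto, x):
    """Gaussian activation of a prototype at point x."""
    return math.exp(-sq_dist(proto["center"], x) / proto["sigma"] ** 2)

def max_width(center, x):
    """Largest width for which the activation at x does not exceed THETA_MINUS."""
    return math.sqrt(-sq_dist(center, x) / math.log(THETA_MINUS))

def dda_epoch(prototypes, data, sigma_init=1.0):
    for p in prototypes:
        p["weight"] = 0
    for x, label in data:
        covering = [p for p in prototypes
                    if p["label"] == label and activation(p, x) >= THETA_PLUS]
        if covering:
            # Cover: credit the best matching prototype of the same class
            max(covering, key=lambda p: activation(p, x))["weight"] += 1
        else:
            # Commit: new prototype at x, no wider than conflicts allow
            sigma = sigma_init
            for p in prototypes:
                if p["label"] != label:
                    sigma = min(sigma, max_width(p["center"], x))
            prototypes.append({"center": x, "label": label,
                               "sigma": sigma, "weight": 1})
        # Shrink: narrow every prototype of a different class
        for p in prototypes:
            if p["label"] != label:
                p["sigma"] = min(p["sigma"], max_width(p["center"], x))
    return prototypes

# Hypothetical two-class data
data = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"),
        ((1.0, 1.0), "B"), ((0.9, 1.0), "B"), ((0.5, 0.5), "B")]
prototypes = dda_epoch([], data)
for p in prototypes:
    print(p)
```

In this run the last record, which belongs to class B but lies close to the class A prototype, triggers the shrink step and reduces that prototype's width below its initial value.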

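Flow Chart of FAMDDA Learning Process

• Initialize the FAMDDA weights.
• Present an input pattern, a, and apply complement coding.
• Category selection: compute the choice function, T, and select the winner.
• Test: apply the ART vigilance test, then the map field vigilance test.
• Search: if either test fails, select another winner and repeat the tests.
• Resonance: if both tests pass, learning takes place with adjustment of the prototypes’ width.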

Evolutionary FAMDDA/FAM with Hybrid Genetic Algorithms (GAs) (Baskar et al., 2001)

• Phase I, GA search: search for near-optimum feasible solutions with a GA.
• Phase II, local search: fine-tune the feasible solution selected in Phase I, using a direct search algorithm to reduce the size of the search region.

[Figure: network environment and evolutionary environment (GA search, local search)]
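The two-phase idea can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the objective function stands in for the error of a trained network, the GA uses truncation selection with arithmetic crossover and Gaussian mutation, and the local search is a simple coordinate pattern search that halves its step when no move helps. It is not the hybrid GA of Baskar et al. (2001).

```python
import random

def objective(x):
    # Hypothetical stand-in for a network's error; minimum at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def ga_search(f, dim, bounds, pop_size=20, generations=30, seed=0):
    """Phase I: real-coded GA that returns a near-optimum solution."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p, q)]  # crossover
            i = rng.randrange(dim)
            mutated = child[i] + rng.gauss(0, 0.1 * (hi - lo))   # mutation
            child[i] = min(hi, max(lo, mutated))
            children.append(child)
        pop = parents + children
    return min(pop, key=f)

def local_search(f, x, step=0.5, tol=1e-6):
    """Phase II: direct search that shrinks the search region step by step."""
    x = list(x)
    best = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2   # no better neighbour: reduce the search region
    return x

phase1 = ga_search(objective, dim=2, bounds=(-5.0, 5.0))
phase2 = local_search(objective, phase1)
print(objective(phase1), objective(phase2))
```

The local search only accepts moves that lower the objective, so the Phase II result is never worse than the Phase I solution it starts from.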

EPNet (Yao, 1999)

• An evolving feedforward neural network that adapts through both learning and evolution
• Evolves the architecture and connection weights using evolutionary programming
• Uses 5 mutation operators

The Construction of EPNet

Figure adapted from Yao (1999).

Algorithm-Level Classification: Results

• Performance comparison with other classification methods

Model | CCR | TPR | TNR | G-mean
KNN | 91 | 95 | 43 | 64
SVM-RBF | 75 | 74 | 83 | 79
EPNet | 80.34 | 80.18 | 82.30 | 81.84
FAM-HGA | 87.16 | 87.79 | 79.42 | 83.50
FAMDDA-HGA | 88.69 | 89.75 | 75.73 | 82.44

Additional Method: Testing Stage of Classification

Rule-Based Classifiers
• FAMDDA-FIM and FAM-FIM
• Rectangular Basis Function Network (RecBFN)
• NEFCLASS

Note: FIM stands for Fuzzy Inference Mechanism

FAMDDA-FIM and FAM-FIM

• Knowledge base formation from learning with FAMDDA/FAM
• Knowledge base extraction from a trained network: $\mathbf{w}_j^{a} \rightarrow c_j,\ j = 1, 2, \ldots, L$
• Reasoning process

Reasoning Process in a Trained FAMDDA/FAM

• Matching degree: calculate the firing strength of the antecedent of each rule, which is associated directly with its consequent class.
• Aggregating the firing strength of all rules: the activation levels of the rules from different classes are aggregated by an additive combination.
• Determining the output: by a weighted sum of all rules’ firing strengths.
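The three reasoning steps can be sketched as follows. The details are assumptions made for illustration rather than the exact FIM used in this work: the matching degree is taken to be the fuzzy ART match |A ∧ w| / |A| on complement-coded inputs, the output is obtained by normalizing the class totals, and the rule base is hypothetical.

```python
def complement_code(a):
    return list(a) + [1.0 - x for x in a]

def firing_strength(A, w):
    """Matching degree of input A against a rule's antecedent weights w
    (assumed here to be the fuzzy ART match |A AND w| / |A|)."""
    return sum(min(ai, wi) for ai, wi in zip(A, w)) / sum(A)

def infer(rules, a):
    """Return the predicted class and the normalized score of each class."""
    A = complement_code(a)
    totals = {}
    for w, cls in rules:
        # Additive combination of firing strengths within each class
        totals[cls] = totals.get(cls, 0.0) + firing_strength(A, w)
    norm = sum(totals.values())
    scores = {cls: total / norm for cls, total in totals.items()}
    return max(scores, key=scores.get), scores

# Hypothetical rule base: (antecedent weights, consequent class)
rules = [([0.2, 0.3, 0.75, 0.65], 0),
         ([0.9, 0.9, 0.1, 0.1], 1),
         ([0.8, 0.7, 0.1, 0.2], 1)]
predicted, scores = infer(rules, [0.22, 0.3])
print(predicted, scores)
```

Because the combination is additive, several weakly firing rules of one class can together outweigh a single strongly firing rule of another class.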

Rectangular Basis Function Network (RecBFN) (Huber and Berthold, 1995)

• A method to learn hyper-rectangles (rules) directly from data.
• Applies a constructive learning algorithm (the Dynamic Decay Adjustment algorithm).
• Hyper-rectangles are translated directly into rules.
• The output layer computes a weighted sum of the activations of the RecBF units.

Figure adapted from Huber and Berthold (1995)
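The translation from hyper-rectangles to rules, and the weighted sum in the output layer, can be sketched as follows. This is a simplification under stated assumptions: each unit is a single crisp rectangle with activation 0 or 1, and the rectangles, weights and class names are hypothetical.

```python
def rectangle_to_rule(lower, upper, label, names=None):
    """Translate one hyper-rectangle into a readable if-then rule."""
    names = names or [f"x{i + 1}" for i in range(len(lower))]
    parts = [f"{lo} <= {n} <= {hi}" for n, lo, hi in zip(names, lower, upper)]
    return "IF " + " AND ".join(parts) + f" THEN class = {label}"

def covers(lower, upper, x):
    """Crisp activation: True when x lies inside the hyper-rectangle."""
    return all(lo <= xi <= hi for lo, xi, hi in zip(lower, x, upper))

def class_scores(units, x):
    """Output layer: weighted sum of unit activations for each class."""
    scores = {}
    for unit in units:
        active = 1.0 if covers(unit["lower"], unit["upper"], x) else 0.0
        scores[unit["label"]] = scores.get(unit["label"], 0.0) + unit["weight"] * active
    return scores

# Hypothetical trained units
units = [
    {"lower": [0.0, 0.2], "upper": [0.5, 0.9], "label": "defect", "weight": 3.0},
    {"lower": [0.4, 0.0], "upper": [1.0, 0.6], "label": "normal", "weight": 5.0},
]
rule = rectangle_to_rule(units[0]["lower"], units[0]["upper"], units[0]["label"])
scores = class_scores(units, [0.45, 0.5])
print(rule)
print(scores)
```

The sample point lies inside both rectangles, so the class scores are decided by the unit weights.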

Neuro Fuzzy CLASSification (NEFCLASS) (Nauck and Kruse, 1997)

• A neuro-fuzzy classifier: a 3-layer (input/rule/output) fuzzy perceptron plus a backpropagation algorithm.
• Learns the shape of the membership functions (fuzzy sets).
• Can be trained with prior knowledge or from scratch with data.

NEFCLASS rule: if x1 is µ1 and x2 is µ2 and … and xn is µn, then the pattern (x1, x2, …, xn) belongs to ci
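A rule of this form can be evaluated as sketched below. The choices here are assumptions made for illustration: triangular membership functions, the minimum as the "and" operator, and the maximum to combine rules of the same class. The fuzzy sets, rules and class names are hypothetical, and no membership learning is shown.

```python
def triangular(a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Hypothetical fuzzy sets over a feature scaled to [0, 1]
small = triangular(-0.5, 0.0, 0.5)
large = triangular(0.5, 1.0, 1.5)

# Each rule: (one fuzzy set per input, consequent class)
rules = [((small, small), "c1"),
         ((large, large), "c2")]

def classify(rules, x):
    """Return the winning class and the activation of each class."""
    activation = {}
    for antecedent, cls in rules:
        # "x1 is mu1 and x2 is mu2": minimum of the membership degrees
        degree = min(mu(xi) for mu, xi in zip(antecedent, x))
        activation[cls] = max(activation.get(cls, 0.0), degree)
    return max(activation, key=activation.get), activation

winner, activation = classify(rules, (0.2, 0.1))
print(winner, activation)
```

The pattern is assigned to the class whose rules fire most strongly, which here is c1.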

Testing Stage of Classification: Results

Model | CCR | TPR | TNR | G-mean
KNN | 91 | 95 | 43 | 64
SVM-RBF | 75 | 74 | 83 | 79
RecBFN | 8.90 | 86.83 | 93.85 | 1.30
NEFCLASS | 5.14 | 4.12 | 0.81 | 44.60
FAM-FIM | 89.08 | 90.13 | 76.20 | 82.87
FAMDDA-FIM | 87.18 | 87.78 | 79.78 | 83.69

[Figure: chart of CCR, TPR, TNR and G-mean for the models above, on a 0 to 100 scale]

Summary

• Current work: machine-learning methods based on artificial neural networks, rule-based systems and evolutionary algorithms.

• Further improvements in classification performance may come from other machine-learning methods (algorithm-level approach).

• Future work: boosting algorithm + machine-learning model

References

• Baskar, S., Subbaraj, P., and Rao, M. V. C. (2001). Performance of hybrid real coded genetic algorithms. International Journal of Computational Engineering Science, 2, 583–601.

• Berthold, M. R., and Diamond, J. (1998). Constructive training of probabilistic neural networks. Neurocomputing, 19, 167–183.

• Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analogue multidimensional maps. IEEE Transactions on Neural Networks, 3, 698–713.

• He, H., and Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.

• Huber, K.-P., and Berthold, M. R. (1995). Building precise classifiers with automatic rule extraction. Proceedings of the IEEE International Conference on Neural Networks, 3, 1263–1268.

• Nauck, D., and Kruse, R. (1997). A neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets and Systems, 89, 277–288.

• Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423–1447.
