Machine-Learning Methods for Classification of Semiconductor Defects
Shing Chiang Tan
Multimedia University, Malaysia
At Graduate School of Information, Production and Systems, Waseda University, 12 Nov. 2012
Acknowledgement
This work is the outcome of a research collaboration with Prof. Watada of Waseda University and with Prof. Marzuki Khalid and Dr. Zuwarie of Universiti Teknologi Malaysia. All researchers are grateful to Intel Malaysia for providing the real data, without which this work would never have commenced.
Outline
• Introduction: Semiconductor Manufacturing & Wafer Defect Detection
• Imbalanced Data, Problem, Issues and Classification Metrics
• Machine-Learning Methods and Results
• Summary
A. Introduction: Semiconductor Manufacturing
Operation – How is a wafer produced and tested?
Fabrication → Sort → Assembly → Class test
Quality, Cost Savings, Yield
Voluminous Data from Production Process
Correlation Analysis
• conducted manually
• time consuming
• complicated
Alternative: Machine-Learning Methods
• learning information from data
• automatic defect detection
(Between-Class) Imbalanced Data
Small number of records (defective cases), minority concept
vs.
Large number of records (non-defective cases), majority concept
Unfavorable accuracies
Nature of the Imbalanced Data
Nature of the Problem Dataspace
• Intrinsic (directly related)
• Extrinsic (indirectly related), e.g., time interval for data acquisition
Data Quantity
• Imbalance due to rare instances/absolute rarity
• Relative imbalance
Imbalanced Data Complexity
• Overlapping data from different classes
• Lack of representative data: rare instances are limited
• Small disjuncts of data – either noise/outliers or useful information
Figure adapted from He and Garcia (2009)
Data-Driven Wafer Defect Detection: Encrypted Dataset from Intel
An overview of an encrypted semiconductor dataset

Dataset                             Attribute 1     Attribute 2      Attribute 3      Output
                                    (categorical)   (numerical)      (numerical)      (numerical)
Training Set                        1A to 215A      174610–261970    174650–278140    0/1
Test Set                            1A to 215A      180800–299370    176670–301770    0/1
Interpolate Rate of Test Data (%)   100             44.66            93.06            100
Extrapolate Rate of Test Data (%)   0               55.34            6.94             0
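The slides do not define the interpolate and extrapolate rates. A minimal sketch, assuming the interpolate rate is the percentage of test values that fall inside the range seen in training and the extrapolate rate is the remainder, could look as follows. The function name and the sample values are hypothetical and are not taken from the Intel dataset.

```python
def interpolation_rate(train_values, test_values):
    """Percentage of test values that lie inside the training range.

    Assumption: "inside" means min(train) <= value <= max(train).
    """
    lo, hi = min(train_values), max(train_values)
    inside = sum(1 for v in test_values if lo <= v <= hi)
    return 100.0 * inside / len(test_values)


# Hypothetical numerical attribute (illustrative values only).
train = [174610, 200000, 261970]
test = [180800, 250000, 270000, 299370]

interp = interpolation_rate(train, test)
extrap = 100.0 - interp  # the two rates sum to 100 by construction
print(f"interpolate: {interp:.2f}%  extrapolate: {extrap:.2f}%")
```

Under this reading, a high extrapolate rate means the classifier is asked to predict on attribute values it never saw during training.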
Encrypted Dataset Distribution
Data Classification: Confusion Matrix
Confusion matrix:

                             Predicted Positive (Label=0)   Predicted Negative (Label=1)
Actual Positive (Label=0)    True Positive (TP)             False Negative (FN)
Actual Negative (Label=1)    False Positive (FP)            True Negative (TN)

Classification Metric              Definition
Correctly classified rate (CCR)    $\mathrm{CCR} = \dfrac{TP + TN}{TP + FP + FN + TN}$
True positive rate (TPR)           $\mathrm{TPR} = \dfrac{TP}{TP + FN}$
True negative rate (TNR)           $\mathrm{TNR} = \dfrac{TN}{TN + FP}$
Geometric mean (G-mean)            $\text{G-mean} = \sqrt{\mathrm{TPR} \times \mathrm{TNR}}$
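The four metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the function name and the counts are hypothetical, chosen to show how a high CCR can coexist with a low TNR on imbalanced data.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Return CCR, TPR, TNR and G-mean from confusion-matrix counts."""
    ccr = (tp + tn) / (tp + fn + fp + tn)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    g_mean = math.sqrt(tpr * tnr)
    return {"ccr": ccr, "tpr": tpr, "tnr": tnr, "g_mean": g_mean}


# Hypothetical counts: 1000 actual positives, 100 actual negatives.
m = classification_metrics(tp=950, fn=50, fp=57, tn=43)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}%")
```

Because G-mean multiplies the two per-class rates, it drops sharply when either class is classified poorly, which CCR alone does not reveal.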
Machine-Learning Methods
Imbalanced Data Classification
• Algorithm-Level Approach
  • Modified learning algorithms
  • Kernel-based learning
• Data-Level Approach
  • Sampling Methods: Over-, under-sampling
  • Synthetic Sampling (SMOTE)
• Cost Sensitive Approach
  • Bootstrap sampling
  • Cost-sensitive function in learning
Note: SMOTE – Synthetic Minority Oversampling TEchnique
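As an illustration of the data-level approach, the following is a minimal SMOTE-style sketch in plain Python, not the implementation used in this work. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name, the value of k and the sample points are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


# Hypothetical minority-class samples with two numerical attributes.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote(minority, n_synthetic=6)
print(synthetic)
```

Random over-sampling would simply duplicate minority records; interpolating instead produces new points inside the region the minority class already occupies.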
• Additional Method: Testing stage of classification
Imbalanced Learning: Algorithm-Level
Method A: Evolutionary FAM and Evolutionary FAMDDA
• Fuzzy ARTMAP (FAM) or Fuzzy ARTMAP with Dynamic Decay Adjustment (FAMDDA)
• Hybrid Genetic Algorithms
Method B: EPNet
• Multilayer Perceptron (MLP)
• Evolutionary Programming
Fuzzy ARTMAP (FAM) (Carpenter et al., 1992)
[Figure: FAM architecture. Input vector a → complement coding, A = (a, 1 – a) → ARTa (weights wa, ARTa vigilance). Target vector b → complement coding, B = (b, 1 – b) → ARTb (weights wb, ARTb vigilance). ARTa and ARTb are linked through the map field (weights wab, map field vigilance).]
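Complement coding, A = (a, 1 – a), can be sketched in a few lines. The function name is hypothetical, and inputs are assumed to be scaled to [0, 1] beforehand.

```python
def complement_code(a):
    """Return A = (a, 1 - a) for an input vector a with entries in [0, 1]."""
    if any(not 0.0 <= x <= 1.0 for x in a):
        raise ValueError("inputs must be scaled to [0, 1] before coding")
    return list(a) + [1.0 - x for x in a]


A = complement_code([0.2, 0.7])
print(A)
# The city-block norm of A equals the input dimension (up to rounding),
# so every coded pattern has the same norm.
print(sum(A))
```

This fixed norm is what complement coding contributes: it normalises every input without discarding amplitude information.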
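Flow Chart of FAM Learning Process
1. Initialization of FAM weights
2. Input patterns, a
3. Complement coding
4. Choice function, T: select the winner
5. ART vigilance test (a): on fail, search for another winner; on pass, continue
6. Map field vigilance test (ab): on fail, search for another winner; on pass, continue
7. Learning (resonance)
Process: category selection, test, search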
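The learning loop above can be sketched as follows. This is a simplified illustration, not the implementation used in this work: it uses a single ART module whose categories carry class labels, and models the map field test as a label comparison followed by match tracking. The choice function, vigilance test and learning rule follow the standard fuzzy ART forms of Carpenter et al. (1992). The parameter values (alpha, beta, rho_a, eps) and all names are assumptions.

```python
def complement_code(a):
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x, w):
    return [min(xi, wi) for xi, wi in zip(x, w)]


def train_pattern(A, label, weights, labels,
                  rho_a=0.0, alpha=0.001, beta=1.0, eps=1e-6):
    """Present one complement-coded pattern A; return the learning category."""
    rho = rho_a  # baseline vigilance, raised by match tracking
    # Category selection: visit nodes in descending order of choice value T_j
    order = sorted(
        range(len(weights)),
        key=lambda j: sum(fuzzy_and(A, weights[j])) / (alpha + sum(weights[j])),
        reverse=True,
    )
    for j in order:
        overlap = fuzzy_and(A, weights[j])
        match = sum(overlap) / sum(A)
        if match < rho:
            continue  # ART vigilance test failed: search on
        if labels[j] != label:
            rho = match + eps  # map field test failed: match tracking
            continue
        # Both tests passed (resonance): move the weights towards A AND w_j
        weights[j] = [beta * o + (1 - beta) * w
                      for o, w in zip(overlap, weights[j])]
        return j
    # No existing category qualified: commit a new one
    weights.append(list(A))
    labels.append(label)
    return len(weights) - 1


weights, labels = [], []
for a, y in [([0.1, 0.1], 0), ([0.9, 0.9], 1), ([0.2, 0.2], 0)]:
    train_pattern(complement_code(a), y, weights, labels)
print(labels, weights)
```

With beta = 1 (fast learning), each category's weight vector encodes the smallest hyperbox enclosing the patterns it has learned, which is what later allows categories to be read as rules.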
Fuzzy ARTMAP with Dynamic Decay Adjustment Algorithm (FAMDDA)
• Cover: include a new training pattern in an existing prototype of FAM.
• Commit: introduce a new prototype.
• Shrink: if a new pattern is incorrectly classified by an existing prototype of a different class, the width of this prototype is reduced to resolve the conflict.
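The cover, commit and shrink steps can be illustrated on Gaussian prototypes, in the spirit of the Dynamic Decay Adjustment algorithm of Berthold and Diamond (1998). This is a hedged sketch and not FAMDDA itself: the thresholds THETA_PLUS and THETA_MINUS, the initial width and the data are hypothetical.

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2  # hypothetical DDA thresholds
INITIAL_SIGMA = 1.0                 # hypothetical initial prototype width


def dist2(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def activation(p, x):
    return math.exp(-dist2(x, p["center"]) / p["sigma"] ** 2)


def shrunk_sigma(d2):
    """Width at which a Gaussian has activation THETA_MINUS at squared distance d2."""
    return math.sqrt(-d2 / math.log(THETA_MINUS))


def dda_present(prototypes, x, label):
    covering = [p for p in prototypes
                if p["label"] == label and activation(p, x) >= THETA_PLUS]
    if covering:
        # Cover: the pattern is absorbed by the best same-class prototype
        max(covering, key=lambda p: activation(p, x))["weight"] += 1
    else:
        # Commit: new prototype, narrowed so it avoids other-class centres
        sigma = INITIAL_SIGMA
        for p in prototypes:
            d2 = dist2(x, p["center"])
            if p["label"] != label and d2 > 0:
                sigma = min(sigma, shrunk_sigma(d2))
        prototypes.append({"center": tuple(x), "sigma": sigma,
                           "label": label, "weight": 1})
    # Shrink: conflicting prototypes of other classes lose width
    for p in prototypes:
        d2 = dist2(x, p["center"])
        if p["label"] != label and d2 > 0 and activation(p, x) > THETA_MINUS:
            p["sigma"] = shrunk_sigma(d2)


prototypes = []
for x, y in [((0.0, 0.0), 0), ((0.5, 0.0), 1), ((0.1, 0.0), 0)]:
    dda_present(prototypes, x, y)
print(prototypes)
```

In this run the third pattern is covered by the first prototype and forces the class-1 prototype to shrink, so only two prototypes are needed for three patterns.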
Flow Chart of FAMDDA Learning Process
1. Initialization of FAMDDA weights
2. Input patterns, a
3. Complement coding
4. Choice function, T: select the winner
5. ART vigilance test (a): on fail, search for another winner; on pass, continue
6. Map field vigilance test (ab): on fail, search for another winner; on pass, continue
7. Learning with prototypes' width adjustment (resonance)
Process: category selection, test, search
Evolutionary FAMDDA/FAM with Hybrid Genetic Algorithms (GAs) (Baskar et al., 2001)
Phase I: GA Search
Search for near-optimum feasible solutions with GA.
Phase II: Local Search
Fine-tune the feasible solution selected in Phase I. A direct search algorithm is used to reduce the size of the search region.
[Figure: the network environment interacts with the evolutionary environment, which comprises GA search followed by local search.]
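The two-phase idea can be sketched on a toy problem. This is a generic illustration under stated assumptions, not the hybrid real-coded GA of Baskar et al. (2001) and not the network-evolution code of this work: the objective is a hypothetical stand-in for classification error, and the population size, mutation scale and step sizes are arbitrary.

```python
import random


def objective(x):
    """Hypothetical stand-in for the classifier error to be minimised."""
    return sum((xi - 0.3) ** 2 for xi in x)


def ga_search(dim=2, pop_size=20, generations=30, seed=0):
    """Phase I: real-coded GA looking for a near-optimum solution."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=objective)
        parents = pop[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            t = rng.random()
            child = [t * a + (1 - t) * b for a, b in zip(p1, p2)]  # crossover
            i = rng.randrange(dim)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.05)))  # mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=objective)


def local_search(x, step=0.05, shrink=0.5, min_step=1e-6):
    """Phase II: coordinate-wise direct search with a shrinking step."""
    best = list(x)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] += delta
                if objective(cand) < objective(best):
                    best, improved = cand, True
        if not improved:
            step *= shrink  # narrow the search region
    return best


coarse = ga_search()
refined = local_search(coarse)
print(objective(coarse), objective(refined))
```

The local search only accepts improving moves, so the refined solution is never worse than the GA result it starts from.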
EPNet (Yao, 1999)
• An evolving feedforward neural network that adapts through both learning and evolution
• Evolves architecture and connection weights using evolutionary programming
• Uses 5 mutation operators
The Construction of EPNet
Figure adapted from Yao (1999).
Algorithm-Level Classification: Results
• Performance comparison with other classification methods (values in %)

Model         CCR      TPR      TNR      G-mean
KNN           91       95       43       64
SVM-RBF       75       74       83       79
EPNet         80.34    80.18    82.30    81.84
FAM-HGA       87.16    87.79    79.42    83.50
FAMDDA-HGA    88.69    89.75    75.73    82.44
Additional Method: Testing Stage of Classification
Rule-Based Classifiers
• FAMDDA-FIM and FAM-FIM
• Rectangular Basis Function Network (RecBFN)
• NEFCLASS
Note: FIM – Fuzzy Inference Mechanism
FAMDDA-FIM and FAM-FIM
• Knowledge base formation from learning with FAMDDA/FAM
• Knowledge base extraction from a trained network: $\mathbf{w}_j^a \rightarrow c_j,\quad j = 1, 2, \ldots, L$
• Reasoning process
Reasoning Process in a Trained FAMDDA/FAM
1. Matching degree. Calculate the firing strength of the antecedent of each rule; it is associated directly with the rule's consequent class.
2. Aggregating the firing strengths of all rules. The activation levels of the rules from different classes are aggregated by an additive combination.
3. Determining the output. The output is given by a weighted sum of all rules' firing strengths.
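The three reasoning steps can be sketched as below. The slides do not give the exact formulas, so this is an assumption-laden illustration: the matching degree is taken to be the fuzzy subsethood |A ∧ w| / |w|, aggregation is a per-class sum, and the output is the class with the largest normalised sum. The rules and the input are hypothetical.

```python
def matching_degree(A, w):
    """Assumed firing strength: |A AND w| / |w| (fuzzy AND = element-wise min)."""
    return sum(min(a, wi) for a, wi in zip(A, w)) / sum(w)


def infer(A, rules):
    """rules: list of (weight vector w_j, consequent class c_j)."""
    # Steps 1 and 2: fire every rule, then add strengths class by class
    totals = {}
    for w, cls in rules:
        totals[cls] = totals.get(cls, 0.0) + matching_degree(A, w)
    # Step 3: normalise the aggregated strengths and pick the strongest class
    grand_total = sum(totals.values())
    scores = {cls: s / grand_total for cls, s in totals.items()}
    return max(scores, key=scores.get), scores


# Hypothetical rules extracted from a trained network (complement-coded).
rules = [([0.1, 0.1, 0.8, 0.8], 0), ([0.9, 0.9, 0.1, 0.1], 1)]
A = [0.15, 0.15, 0.85, 0.85]  # complement-coded test pattern
predicted, scores = infer(A, rules)
print(predicted, scores)
```

Unlike winner-take-all prediction, every rule contributes to the decision, so several weakly matching rules of one class can outvote a single strong rule of another.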
Rectangular Basis Function Network
(RecBFN) (Huber and Berthold, 1995)
A method to learn hyper-rectangles (rules) directly from data.
Applies a constructive learning algorithm (Dynamic Decay Adjustment algorithm)
Hyper-rectangles translated directly as rules.
• The output layer computes a weighted sum of the activations of the RecBF units.
Figure adapted from Berthold and Huber (1995)
Neuro Fuzzy CLASSification (NEFCLASS)
(Nauck and Kruse, 1997)
• A neuro-fuzzy classifier: a 3-layer (input/rule/output) fuzzy perceptron trained with a backpropagation algorithm.
• Learns the shape of the membership functions (fuzzy sets).
• Can be trained with prior knowledge or from scratch with data.
NEFCLASS rule form: If x1 is µ1 and x2 is µ2 and … and xn is µn, then the pattern (x1, x2, …, xn) belongs to class ci.
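A rule of this form can be evaluated as in the following sketch. It assumes triangular membership functions and the minimum as the conjunction operator; the fuzzy sets, rules and class names are hypothetical, and the training of the membership functions is not shown.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets shared by both inputs.
SMALL = (-0.5, 0.0, 0.5)
LARGE = (0.5, 1.0, 1.5)

# Each rule: (one fuzzy set per input, consequent class).
rules = [
    ([SMALL, SMALL], "c1"),  # if x1 is small and x2 is small then c1
    ([LARGE, LARGE], "c2"),  # if x1 is large and x2 is large then c2
]


def classify(pattern, rules):
    """Return the class of the rule with the highest activation."""
    best_class, best_activation = None, -1.0
    for fuzzy_sets, cls in rules:
        # Rule activation: minimum membership over all antecedent terms
        act = min(triangular(x, *fs) for x, fs in zip(pattern, fuzzy_sets))
        if act > best_activation:
            best_class, best_activation = cls, act
    return best_class, best_activation


print(classify((0.2, 0.1), rules))
```

Because each antecedent term is a named fuzzy set, the rule base stays readable after training, which is the main appeal of a neuro-fuzzy classifier over a plain MLP.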
Testing Stage of Classification: Results
Model         CCR      TPR      TNR      G-mean
KNN           91       95       43       64
SVM-RBF       75       74       83       79
RecBFN        8.90     86.83    93.85    1.30
NEFCLASS      5.14     4.12     0.81     44.60
FAM-FIM       89.08    90.13    76.20    82.87
FAMDDA-FIM    87.18    87.78    79.78    83.69
[Figure: chart of CCR, TPR, TNR and G-Mean on a 0–100 scale]
Summary
• Current work: several machine-learning methods based on artificial neural networks, rule-based systems and evolutionary algorithms.
• Further improvements in classification performance with other machine-learning methods (algorithm-level approach).
• Future work: boosting algorithm + machine-learning model
References
• Baskar, S., Subbaraj, P., and Rao, M. V. C. (2001). Performance of hybrid real coded genetic algorithms. International Journal of Computational Engineering Science, 2, 583–601.
• Berthold, M. R., and Diamond, J. (1998). Constructive training of probabilistic neural networks. Neurocomputing, 19, 167–183.
• Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3, 698–713.
• He, H., and Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.
• Huber, K.-P., and Berthold, M. R. (1995). Building precise classifiers with automatic rule extraction. Proceedings of the IEEE International Conference on Neural Networks, 3, 1263–1268.
• Nauck, D., and Kruse, R. (1997). A neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets and Systems, 89, 277–288.
• Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423–1447.