1
Statistical Tools for Multivariate Six Sigma
Dr. Neil W. Polhemus
CTO & Director of Development
StatPoint, Inc.
Revised talk: www.statgraphics.com\documents.htm
2
The Challenge
The quality of an item or service usually depends on more than one characteristic.
When the characteristics are not independent, considering each characteristic separately can give a misleading estimate of overall performance.
3
The Solution
Proper analysis of data from such processes requires the use of multivariate statistical techniques.
4
Important Tools
- Statistical Process Control
  - Multivariate capability analysis
  - Multivariate control charts
- Statistical Model Building*
  - Data mining: dimensionality reduction
  - DOE: multivariate optimization
* Regression and classification.
5
Example #1
Textile fiber
Characteristic #1: tensile strength (115.0 ± 1.0)
Characteristic #2: diameter (1.05 ± 0.01)
6
Individuals Charts
[Figure: X chart for strength, plotted by observation; CTR = 114.98, UCL = 115.69, LCL = 114.27]
[Figure: X chart for diameter, plotted by observation; CTR = 1.05, UCL = 1.06, LCL = 1.04]
7
Capability Analysis (each separately)
Process Capability for strength (LSL = 114.0, Nominal = 115.0, USL = 116.0)
Normal: Mean = 114.978, Std. Dev. = 0.238937
Cp = 1.41, Pp = 1.40, Cpk = 1.38, Ppk = 1.36, K = -0.02
DPM = 30.76
[Figure: histogram of strength with fitted normal curve and specification limits]
Process Capability for diameter (LSL = 1.04, Nominal = 1.05, USL = 1.06)
Normal: Mean = 1.04991, Std. Dev. = 0.00244799
Cp = 1.41, Pp = 1.36, Cpk = 1.39, Ppk = 1.35, K = -0.01
DPM = 44.59
[Figure: histogram of diameter with fitted normal curve and specification limits]
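Under a normal model, the long-term indices and the expected DPM follow directly from the mean, the standard deviation, and the specification limits. The sketch below assumes the process is normal with the overall mean and standard deviation shown, and it reproduces Pp, Ppk, K, and DPM. Cp and Cpk are left out because they appear to use a short-term (within) standard deviation that the slide does not report. The function names are illustrative.

```python
import math

def normal_tail(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def capability(mean, sd, lsl, usl, nominal):
    """Long-term (Pp-style) indices and DPM, assuming a normal distribution."""
    pp = (usl - lsl) / (6.0 * sd)
    ppk = min(mean - lsl, usl - mean) / (3.0 * sd)
    k = (mean - nominal) / ((usl - lsl) / 2.0)  # offset from nominal, as a fraction of the half-width
    # Expected defects per million: probability mass below LSL plus above USL.
    dpm = 1e6 * (normal_tail((mean - lsl) / sd) + normal_tail((usl - mean) / sd))
    return pp, ppk, k, dpm

strength = capability(114.978, 0.238937, 114.0, 116.0, 115.0)
diameter = capability(1.04991, 0.00244799, 1.04, 1.06, 1.05)
for name, (pp, ppk, k, dpm) in [("strength", strength), ("diameter", diameter)]:
    print(f"{name}: Pp={pp:.2f} Ppk={ppk:.2f} K={k:.2f} DPM={dpm:.2f}")
```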
8
Scatterplot
[Figure: plot of diameter vs. strength; correlation = 0.89]
9
Multivariate Normal Distribution
[Figure: multivariate normal distribution over strength and diameter]
10
Control Ellipse
[Figure: control ellipse for diameter vs. strength]
11
Multivariate Capability
[Figure: multivariate capability plot of diameter vs. strength; DPM = 70.4091]
Variable   Observed Beyond Spec.   Estimated DPM
strength   0.0%                    30.7572
diameter   0.0%                    44.5939
Joint      0.0%                    70.4091
Determines the joint probability of being within the specification limits on all characteristics.
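Strength and diameter are correlated, so the joint DPM is smaller than the sum of the two marginal DPMs (30.76 + 44.59 = 75.35). The sketch below shows one way to compute such a joint figure. It assumes a bivariate normal distribution with the means, standard deviations, and correlation (0.89) shown earlier, and it integrates the probability of landing inside the specification rectangle with Simpson's rule. Because the inputs are rounded, expect only approximate agreement with the 70.4091 reported. This is an illustration, not necessarily how the software computes it.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def joint_dpm(means, sds, lsls, usls, rho, n=2000):
    """Defects per million outside a rectangular spec region, assuming a bivariate normal.

    P(inside) = integral over a1 < z1 < b1 of phi(z1) * P(a2 < Z2 < b2 | Z1 = z1),
    evaluated with Simpson's rule (n must be even).
    """
    a1 = (lsls[0] - means[0]) / sds[0]
    b1 = (usls[0] - means[0]) / sds[0]
    a2 = (lsls[1] - means[1]) / sds[1]
    b2 = (usls[1] - means[1]) / sds[1]
    s = math.sqrt(1.0 - rho * rho)  # conditional sd of Z2 given Z1
    h = (b1 - a1) / n
    total = 0.0
    for i in range(n + 1):
        z1 = a1 + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        # Given Z1 = z1, Z2 is normal with mean rho*z1 and sd s.
        inner = std_normal_cdf((b2 - rho * z1) / s) - std_normal_cdf((a2 - rho * z1) / s)
        total += w * std_normal_pdf(z1) * inner
    p_inside = total * h / 3.0
    return 1e6 * (1.0 - p_inside)

means, sds = (114.978, 1.04991), (0.238937, 0.00244799)
lsls, usls = (114.0, 1.04), (116.0, 1.06)
dpm_indep = joint_dpm(means, sds, lsls, usls, rho=0.0)
dpm_corr = joint_dpm(means, sds, lsls, usls, rho=0.89)
print(f"independent: {dpm_indep:.2f}   rho = 0.89: {dpm_corr:.2f}")
```

Setting rho = 0 gives roughly the sum of the marginal DPMs. The positive correlation makes the two out-of-spec events overlap, which lowers the joint figure.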
12
Mult. Capability Indices
Defined to give the same DPM as in the univariate case.
Capability Indices
Index   Estimate
MCP     1.27
MCR     78.81
DPM     70.4091
Z       3.81
SQL     5.31
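The numbers in this table are consistent with the following procedure: convert the joint DPM into a one-sided normal Z value, then scale it. The sketch below rests on that assumption, which is inferred from the numbers rather than from a published definition:

- Z = Φ⁻¹(1 − DPM/10⁶)
- MCP = Z/3
- MCR = 100/MCP
- SQL = Z + 1.5, using the conventional Six Sigma 1.5-sigma shift

```python
from statistics import NormalDist

def indices_from_dpm(dpm):
    """Capability-style indices implied by a DPM value (assumed definitions, see lead-in)."""
    z = NormalDist().inv_cdf(1.0 - dpm / 1e6)  # one-sided normal quantile
    mcp = z / 3.0      # analogous to a one-sided Cpk
    mcr = 100.0 / mcp  # capability ratio, in percent
    sql = z + 1.5      # sigma quality level with the 1.5-sigma shift
    return {"Z": z, "MCP": mcp, "MCR": mcr, "SQL": sql}

r = indices_from_dpm(70.4091)
print({name: round(value, 2) for name, value in r.items()})
```

Rounded to two decimals, the results agree with the table above.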
13
More than 2 Variables
[Figure: control ellipsoid for X1, X2, and X3]
14
Hotelling’s T-Squared
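Measures the distance of each point from the centroid of the data (or of the assumed distribution):
$$T_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^\top \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})$$
where $\bar{\mathbf{x}}$ is the sample mean vector and $\mathbf{S}$ is the sample covariance matrix.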
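In the two-variable textile example, S is a 2 × 2 matrix, so its inverse can be written out directly. The sketch below assumes the sample mean and the sample covariance (divisor n − 1) are used. The data points are hypothetical. The last point is above average on strength but below average on diameter, which goes against the positive correlation. As a result it gets the largest T², even though its deviation on each variable is no larger than those of other points.

```python
def hotelling_t2_2d(points):
    """T^2 of each (x, y) point from the sample centroid, using the sample covariance S."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    t2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # S^-1 = [[syy, -sxy], [-sxy, sxx]] / det, so d' S^-1 d expands to:
        t2.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return t2

# Hypothetical (strength, diameter) readings, for illustration only.
data = [(114.8, 1.048), (115.1, 1.051), (115.0, 1.050), (114.7, 1.047),
        (115.3, 1.053), (114.9, 1.049), (115.2, 1.047)]
t2 = hotelling_t2_2d(data)
print([round(v, 3) for v in t2])
```

When S is the sample covariance of the same data, the T² values always sum to p(n − 1), which is 2 × 6 = 12 here. That makes a quick sanity check.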
15
T-Squared Chart
[Figure: multivariate control chart of T-squared by observation; UCL = 11.25]
16
T-Squared Decomposition
Relative Contribution to T-Squared Signal
Observation   T-Squared   X1        X2         X3
17            13.8371     4.54101   0.340022   8.35196
The StatAdvisor: This table decomposes the out-of-control signals on the T-Squared chart. It calculates the relative importance of each variable to the signal by subtracting the value of T-Squared calculated without that variable from the full T-Squared value. Examine each row closely to determine which variable (or variables) is likely causing the signal.
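The decomposition described above maps directly onto code. Compute T² with all variables, compute it again with each variable dropped, and take the difference. Below is a pure-Python sketch for any number of variables. It assumes T² is computed from the sample mean and sample covariance of the data, and it uses a hypothetical three-variable data set. The helper names are illustrative. Each difference is non-negative, because dropping a variable can only shrink the Mahalanobis distance.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def t_squared(rows, cols):
    """T^2 of every row, using only the variables listed in cols."""
    n, p = len(rows), len(cols)
    xs = [[row[j] for j in cols] for row in rows]
    mean = [sum(x[j] for x in xs) / n for j in range(p)]
    dev = [[x[j] - mean[j] for j in range(p)] for x in xs]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)] for i in range(p)]
    # (x - xbar)' S^-1 (x - xbar), computed as d . solve(S, d)
    return [sum(di * yi for di, yi in zip(d, solve(cov, d))) for d in dev]

def decompose(rows, obs):
    """Full T^2 of row obs, and for each variable j: full T^2 minus T^2 without j."""
    p = len(rows[0])
    full = t_squared(rows, list(range(p)))[obs]
    contrib = [full - t_squared(rows, [k for k in range(p) if k != j])[obs] for j in range(p)]
    return full, contrib

# Hypothetical (X1, X2, X3) data; the last row has a large X3 relative to its X1 and X2.
rows = [[10, 10, 10], [11, 11, 10.5], [9, 9.5, 9], [10.5, 10, 11],
        [12, 11.5, 12], [8.5, 9, 8.5], [10, 11, 10], [11, 10, 13]]
full, contrib = decompose(rows, obs=7)
print(f"T^2 = {full:.3f}; contributions (X1, X2, X3) = {[round(c, 3) for c in contrib]}")
```

As in the table, the contributions need not add up to the full T², because the variables are correlated.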
17
Statistical Model Building
- Defining relationships (regression and ANOVA)
- Classifying items
- Detecting unusual events
- Optimizing processes
When the response variables are correlated, it is important to consider the responses together.
When the number of variables is large, the dimensionality of the problem often makes it difficult to determine the underlying relationships.
18
Example #2
19
Matrix Plot
[Figure: matrix plot of MPG City, MPG Highway, Engine Size, Horsepower, Length, Passengers, U Turn Space, Weight, Wheelbase, and Width]
20
Multiple Regression
MPG City = 29.6315 + 0.28816*Engine Size - 0.00688362*Horsepower - 0.0365723*Length - 0.297446*Passengers - 0.139763*U Turn Space - 0.00984486*Weight + 0.280224*Wheelbase + 0.111526*Width
Parameter      Estimate      Standard Error   T Statistic   P-Value
CONSTANT       29.6315       12.9763          2.28351       0.0249
Engine Size    0.28816       0.722918         0.398607      0.6912
Horsepower     -0.00688362   0.0134153        -0.513119     0.6092
Length         -0.0365723    0.0447211        -0.817786     0.4158
Passengers     -0.297446     0.54754          -0.543241     0.5884
U Turn Space   -0.139763     0.17926          -0.779668     0.4378
Weight         -0.00984486   0.00192619       -5.11104      0.0000
Wheelbase      0.280224      0.124837         2.24472       0.0274
Width          0.111526      0.218893         0.5095        0.6117
21
Reduced Models
MPG City = 29.9911 - 0.0103886*Weight + 0.233751*Wheelbase (R² = 73.0%)
MPG City = 64.1402 - 0.054462*Horsepower - 1.56144*Passengers - 0.374767*Width (R² = 64.3%)
22
Dimensionality Reduction
Construction of linear combinations of the variables can often provide important insights.
Principal components analysis (PCA) and principal components regression (PCR): PCA constructs linear combinations of the predictor variables X that capture the greatest variance; PCR then uses those components to predict the responses.
Partial least squares (PLS): finds components that account for variation in both the X's and the Y's simultaneously, by maximizing the covariance between them.
23
Principal Components Analysis
$$C_1 = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p$$
Component Number   Eigenvalue   Percent of Variance   Cumulative Percentage
1                  5.8263       72.829                72.829
2                  1.09626      13.703                86.532
3                  0.339796     4.247                 90.779
4                  0.270321     3.379                 94.158
5                  0.179286     2.241                 96.400
6                  0.12342      1.543                 97.942
7                  0.109412     1.368                 99.310
8                  0.0552072    0.690                 100.000
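Each percentage column is an eigenvalue divided by the sum of all the eigenvalues. These eigenvalues sum to about 8, the number of predictors, which suggests the components were extracted from the correlation matrix (standardized variables). The short sketch below recomputes the table from the eigenvalues alone:

```python
eigenvalues = [5.8263, 1.09626, 0.339796, 0.270321, 0.179286, 0.12342, 0.109412, 0.0552072]

total = sum(eigenvalues)  # ~8.0: eigenvalues of a correlation matrix sum to the number of variables
percents, cumulative = [], []
running = 0.0
for k, lam in enumerate(eigenvalues, start=1):
    pct = 100.0 * lam / total
    running += pct
    percents.append(pct)
    cumulative.append(running)
    print(f"{k}  {lam:9.6f}  {pct:7.3f}  {running:8.3f}")

# Kaiser criterion (a common rule of thumb): keep components with eigenvalue > 1.
kept = sum(1 for lam in eigenvalues if lam > 1.0)
print("components with eigenvalue > 1:", kept)
```

Here the Kaiser rule keeps two components. The scree plot gives a visual check on the same choice.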
24
Scree Plot
[Figure: scree plot of eigenvalue vs. component number]
25
Component Weights
C1 = 0.377*Engine Size + 0.292*Horsepower + 0.239*Passengers + 0.370*Length + 0.375*Wheelbase + 0.389*Width + 0.360*U Turn Space + 0.396*Weight
C2 = -0.205*Engine Size - 0.593*Horsepower + 0.731*Passengers + 0.043*Length + 0.260*Wheelbase - 0.042*Width - 0.026*U Turn Space - 0.030*Weight
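Each weight vector has unit length, and the two vectors are orthogonal (up to rounding). A car's component score is the weighted sum of its standardized values. The sketch below assumes the inputs are z-scores, which is consistent with a correlation-matrix PCA. The example car is hypothetical. C1 gives every size-related variable a positive weight, which is why it reads as overall size. C2 contrasts Passengers (+) with Horsepower (−), which a later slide labels "unsportiness".

```python
import math

# Component weights from the slide, in this variable order.
names = ["Engine Size", "Horsepower", "Passengers", "Length",
         "Wheelbase", "Width", "U Turn Space", "Weight"]
c1 = [0.377, 0.292, 0.239, 0.370, 0.375, 0.389, 0.360, 0.396]
c2 = [-0.205, -0.593, 0.731, 0.043, 0.260, -0.042, -0.026, -0.030]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Both weight vectors should have length ~1 and be ~orthogonal (the weights are rounded).
print(f"|C1| = {math.sqrt(dot(c1, c1)):.3f}, |C2| = {math.sqrt(dot(c2, c2)):.3f}, "
      f"C1.C2 = {dot(c1, c2):.3f}")

# Hypothetical car as z-scores (same order as names): more horsepower and
# fewer passengers than average, otherwise roughly average in size.
z = [0.5, 1.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.2]
score1, score2 = dot(c1, z), dot(c2, z)
print(f"C1 (size) = {score1:.3f}, C2 (unsportiness) = {score2:.3f}")
```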
26
Interpretation
[Figure: biplot of Component 2 vs. Component 1, with vectors for Engine Size, Horsepower, Passengers, Length, Wheelbase, Width, U Turn Space, and Weight]
27
PC Regression
[Figure: estimated response surface of MPG City as a function of C1 and C2]
28
Contour Plot
[Figure: contours of the estimated response surface for MPG City over C1 and C2]
29
PLS Model Selection
[Figure: model comparison plot of percent variation explained in X and Y vs. number of components (1 to 8)]
30
PLS Coefficients
Selecting to extract 3 components:
Standardized Coefficients
Predictor      MPG City     MPG Highway
Constant       0.0          0.0
Engine Size    -0.0375246   0.0659656
Horsepower     -0.329264    -0.39319
Length         0.0802132    0.22243
Passengers     -0.178438    -0.331005
U Turn Space   -0.0484675   -0.00202398
Weight         -0.428481    -0.642872
Wheelbase      -0.0149712   0.0592427
Width          -0.0320902   0.0532588
Unstandardized Coefficients
Predictor      MPG City     MPG Highway
Constant       47.6716      35.6569
Engine Size    -0.203286    0.339043
Horsepower     -0.0353303   -0.0400268
Length         0.0308705    0.0812151
Passengers     -0.965169    -1.69862
U Turn Space   -0.0845038   -0.00334794
Weight         -0.00408204  -0.00581054
Wheelbase      -0.0123371   0.0463168
Width          -0.0477221   0.0751422
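The two coefficient tables are related by the usual rescaling b_j = β_j · s_y / s_xj. Here β_j is a standardized coefficient, b_j is the unstandardized one, and the s values are sample standard deviations. The slide does not show those standard deviations, but a consistency check is still possible. For each predictor, take b_j/β_j for MPG City and divide it by b_j/β_j for MPG Highway. The s_xj term cancels, so every predictor should give the same value, s_y(City)/s_y(Highway). The sketch below runs that check on the values in the tables above:

```python
# (MPG City, MPG Highway) coefficients from the tables above; the constant is omitted.
standardized = {
    "Engine Size": (-0.0375246, 0.0659656), "Horsepower": (-0.329264, -0.39319),
    "Length": (0.0802132, 0.22243), "Passengers": (-0.178438, -0.331005),
    "U Turn Space": (-0.0484675, -0.00202398), "Weight": (-0.428481, -0.642872),
    "Wheelbase": (-0.0149712, 0.0592427), "Width": (-0.0320902, 0.0532588),
}
unstandardized = {
    "Engine Size": (-0.203286, 0.339043), "Horsepower": (-0.0353303, -0.0400268),
    "Length": (0.0308705, 0.0812151), "Passengers": (-0.965169, -1.69862),
    "U Turn Space": (-0.0845038, -0.00334794), "Weight": (-0.00408204, -0.00581054),
    "Wheelbase": (-0.0123371, 0.0463168), "Width": (-0.0477221, 0.0751422),
}

# b / beta = s_y / s_x for each response; dividing City by Highway cancels s_x.
ratios = {}
for name, (beta_city, beta_hwy) in standardized.items():
    b_city, b_hwy = unstandardized[name]
    ratios[name] = (b_city / beta_city) / (b_hwy / beta_hwy)
for name, ratio in ratios.items():
    print(f"{name:13s} {ratio:.4f}")
```

Every predictor gives a ratio of about 1.054, which is consistent with that rescaling.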
31
Interpretation
[Figure: plot of unsportiness vs. size, with points coded by Type: Compact, Large, Midsize, Small, Sporty, Van]
32
Neural Networks
33
Bayesian Classifier
[Figure: network diagram with input layer (2 variables), pattern layer (93 cases), summation layer (6 neurons), and output layer (6 groups)]
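A Bayesian classifier with this layer structure works like a probabilistic neural network. The pattern layer places one Gaussian kernel on each training case. The summation layer combines the kernels for each group. The output layer assigns a new point to the group with the largest value. The minimal sketch below assumes equal prior probabilities, Gaussian kernels with the sigma = 0.3 used on the next slide, and inputs already on the C1/C2 scale. The training points and group labels are hypothetical.

```python
import math
from collections import defaultdict

def pnn_classify(train, x, sigma=0.3):
    """Parzen-window (PNN-style) Bayes classification with equal priors.

    train: list of ((c1, c2), label) pairs; each pair acts as one pattern-layer neuron.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(u, x))
        sums[label] += math.exp(-d2 / (2.0 * sigma * sigma))  # pattern-layer activation
        counts[label] += 1
    # Summation layer: average activation per group (a kernel density estimate).
    scores = {label: sums[label] / counts[label] for label in sums}
    # Output layer: choose the group with the largest estimated density.
    return max(scores, key=scores.get)

# Hypothetical training cases on the (C1, C2) scale.
train = [((-3.0, 0.5), "Small"), ((-2.5, 0.0), "Small"),
         ((3.0, 1.0), "Large"), ((2.5, 1.5), "Large"),
         ((0.0, -3.0), "Sporty"), ((0.5, -2.5), "Sporty")]
print(pnn_classify(train, (-2.8, 0.3)))
```

A smaller sigma gives more irregular class boundaries that follow individual training cases. A larger sigma gives smoother boundaries.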
34
Classification
sigma = 0.3
[Figure: classification plot of C2 vs. C1, with regions for Type: Compact, Large, Midsize, Small, Sporty, Van]
35
Design of Experiments
When more than one characteristic is important, finding the optimal operating conditions usually requires a tradeoff of one characteristic for another.
One approach to finding a single solution is to use desirability functions.
36
Example #3
Myers and Montgomery (2002) describe an experiment on a chemical process (20-run central composite design):
Response variable       Goal
Conversion percentage   maximize
Thermal activity        maintain between 55 and 60

Input factor   Low          High
time           8 minutes    17 minutes
temperature    160 °C       210 °C
catalyst       1.5%         3.5%
37
Optimize Conversion
Goal: maximize conversion
Optimum value = 118.174
Factor        Low     High    Optimum
time          8.0     17.0    17.0
temperature   160.0   210.0   210.0
catalyst      1.5     3.5     3.48086
[Figure: contours of estimated conversion over time and catalyst, temperature = 210.0]
38
Optimize Activity
Goal: maintain activity at 57.5
Optimum value = 57.5
Factor        Low      High     Optimum
time          8.3      16.7     10.297
temperature   209.99   210.01   210.004
catalyst      1.66     3.35     2.31021
[Figure: contours of estimated activity over time and catalyst, temperature = 210.0]
39
Desirability Functions
Maximization
[Figure: desirability function for maximization; desirability d vs. predicted response from Low to High, shown for s = 0.2, 0.4, 1, 2, and 8]
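A common form of the maximization desirability, from Derringer and Suich, has three parts. It is 0 below Low, 1 above High, and ((y − Low)/(High − Low))^s in between. With s = 1 the rise is linear. With s > 1, d stays small until y is close to High. With s < 1, moderate values are rewarded sooner. The sketch below assumes this form matches the curves shown:

```python
def desirability_max(y, low, high, s=1.0):
    """Desirability for a response to be maximized (Derringer and Suich form, assumed)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

# Desirability at the midpoint of a 0-100 range for several shape parameters s.
for s in (0.2, 0.4, 1, 2, 8):
    print(s, round(desirability_max(50, 0, 100, s), 4))
```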
40
Desirability Functions
Hit a target
[Figure: desirability function for hitting a target; d vs. predicted response with Low, Target, and High marked, shown for exponents s (below target) and t (above target) of 0.1, 1, and 5]
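For a target value, the usual two-sided form behaves differently on each side. Below the target it rises as ((y − Low)/(Target − Low))^s. Above the target it falls as ((High − y)/(High − Target))^t. Outside [Low, High] it is 0. The sketch below applies this form to the thermal activity response from Example #3 (Low = 55, Target = 57.5, High = 60). The exponents are illustrative.

```python
def desirability_target(y, low, target, high, s=1.0, t=1.0):
    """Two-sided desirability: 1 at the target, falling to 0 at Low and High."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

# Thermal activity: acceptable between 55 and 60, ideally 57.5.
for y in (54, 55, 56.25, 57.5, 58.75, 60):
    print(y, desirability_target(y, 55, 57.5, 60, s=1, t=2))
```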
41
Combined Desirability
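d_i = desirability of the i-th response (i = 1, ..., m) given the settings of the experimental factors X; I_i = weight (impact) assigned to the i-th response.
$$D(\mathbf{X}) = \left( d_1^{I_1} d_2^{I_2} \cdots d_m^{I_m} \right)^{1 / \sum_{j=1}^{m} I_j}$$
D ranges from 0 (least desirable) to 1 (most desirable).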
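D is a weighted geometric mean, so a single response with d_i = 0 drives D to 0. The sketch below computes D on the log scale and assumes all weights I_i are positive. The desirability values are hypothetical.

```python
import math

def combined_desirability(d, weights):
    """Weighted geometric mean D = (prod d_i^I_i)^(1 / sum I_i); 0 if any d_i is 0."""
    if any(di == 0.0 for di in d):
        return 0.0
    log_sum = sum(w * math.log(di) for di, w in zip(d, weights))
    return math.exp(log_sum / sum(weights))

# Hypothetical desirabilities for conversion and activity, equally weighted.
print(combined_desirability([0.64, 1.0], [1, 1]))
```

Increasing a response's weight pulls D toward that response's d_i.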
42
Desirability Contours
Max D = 0.959 at time = 11.14, temperature = 210.0, and catalyst = 2.20.
[Figure: contours of estimated desirability over time and catalyst, temperature = 210.0]
43
Desirability Surface
[Figure: estimated desirability surface over time and catalyst, temperature = 210.0]
44
References
Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis. Upper Saddle River: Prentice Hall.
Mason, R. L. and Young, J. C. (2002). Multivariate Statistical Process Control with Industrial Applications. Philadelphia: SIAM.
Montgomery, D. C. (2005). Introduction to Statistical Quality Control, 5th edition. New York: John Wiley and Sons.
Myers, R. H. and Montgomery, D. C. (2002). Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd edition. New York: John Wiley and Sons.