Statistical Tools for Multivariate Six Sigma

Post on 06-Feb-2016



1

Statistical Tools for Multivariate Six Sigma

Dr. Neil W. Polhemus
CTO & Director of Development
StatPoint, Inc.

Revised talk: www.statgraphics.com\documents.htm

2

The Challenge

The quality of an item or service usually depends on more than one characteristic.

When the characteristics are not independent, considering each characteristic separately can give a misleading estimate of overall performance.

3

The Solution

Proper analysis of data from such processes requires the use of multivariate statistical techniques.

4

Important Tools

Statistical Process Control
    Multivariate capability analysis
    Multivariate control charts

Statistical Model Building*
    Data Mining - dimensionality reduction
    DOE - multivariate optimization

* Regression and classification.

5

Example #1

Textile fiber

Characteristic #1: tensile strength (115.0 ± 1.0)

Characteristic #2: diameter (1.05 ± 0.01)

6

Individuals Charts

[Figure: X chart for strength, observations 0 to 100. CTR = 114.98, UCL = 115.69, LCL = 114.27]

[Figure: X chart for diameter, observations 0 to 100. CTR = 1.05, UCL = 1.06, LCL = 1.04]

7

Capability Analysis (each separately)

Process Capability for strength
LSL = 114.0, Nominal = 115.0, USL = 116.0
Normal: Mean = 114.978, Std. Dev. = 0.238937
Cp = 1.41, Pp = 1.40, Cpk = 1.38, Ppk = 1.36, K = -0.02
DPM = 30.76

Process Capability for diameter
LSL = 1.04, Nominal = 1.05, USL = 1.06
Normal: Mean = 1.04991, Std. Dev. = 0.00244799
Cp = 1.41, Pp = 1.36, Cpk = 1.39, Ppk = 1.35, K = -0.01
DPM = 44.59

[Figures: frequency histograms of strength and diameter with fitted normal curves and specification limits]
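The long-term indices and DPM values on this slide can be reproduced approximately from the reported means, standard deviations, and specification limits. The sketch below assumes a normal distribution and the usual textbook definitions of Pp, Ppk, and K; the function name `capability` is illustrative, and Cp and Cpk are left out because they need a short-term sigma estimate that the slide does not report.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def capability(mean, sd, lsl, nominal, usl):
    """Long-term capability indices and defects per million (normal model)."""
    pp = (usl - lsl) / (6.0 * sd)
    ppk = min(usl - mean, mean - lsl) / (3.0 * sd)
    k = (mean - nominal) / ((usl - lsl) / 2.0)
    # Fraction below LSL plus fraction above USL.
    frac_out = normal_cdf((lsl - mean) / sd) + normal_cdf(-(usl - mean) / sd)
    return {"Pp": pp, "Ppk": ppk, "K": k, "DPM": 1e6 * frac_out}

strength = capability(114.978, 0.238937, 114.0, 115.0, 116.0)
diameter = capability(1.04991, 0.00244799, 1.04, 1.05, 1.06)

for name, result in (("strength", strength), ("diameter", diameter)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Small differences from the slide values are expected because the inputs are rounded.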

8

Scatterplot

Correlation = 0.89

[Figure: plot of diameter vs strength]

9

Multivariate Normal Distribution

[Figure: multivariate normal distribution plotted over strength and diameter]

10

Control Ellipse

[Figure: control ellipse, diameter vs strength]

11

Multivariate Capability

[Figure: multivariate capability plot over strength and diameter, DPM = 70.4091]

Variable   Observed Beyond Spec.   Estimated DPM
strength   0.0%                    30.7572
diameter   0.0%                    44.5939
Joint      0.0%                    70.4091

Determines the joint probability of being within the specification limits on all characteristics.
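A joint in-spec probability of this kind can be sketched for two correlated normal characteristics by integrating the conditional distribution of one variable given the other. The code below is a minimal illustration, not the software's method: it assumes a bivariate normal model, uses the rounded correlation 0.89 from the scatterplot, and applies Simpson's rule. The names `prob_in_spec` and `dpm_joint` are hypothetical.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_in_spec(a1, b1, a2, b2, rho, n=4000):
    """P(a1 < Z1 < b1 and a2 < Z2 < b2) for a standard bivariate normal.

    Given Z1 = x, Z2 is normal with mean rho*x and variance 1 - rho^2,
    so the inner probability is a difference of two normal CDFs.
    The outer integral over x uses Simpson's rule (n must be even).
    """
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return phi(x) * (cdf((b2 - rho * x) / s) - cdf((a2 - rho * x) / s))

    h = (b1 - a1) / n
    total = f(a1) + f(b1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a1 + i * h)
    return total * h / 3.0

# Reported means and standard deviations; rho is the rounded sample correlation.
mean_s, sd_s = 114.978, 0.238937
mean_d, sd_d = 1.04991, 0.00244799
rho = 0.89

# Specification limits expressed in standard deviation units.
a1, b1 = (114.0 - mean_s) / sd_s, (116.0 - mean_s) / sd_s
a2, b2 = (1.04 - mean_d) / sd_d, (1.06 - mean_d) / sd_d

dpm_strength = 1e6 * (cdf(a1) + cdf(-b1))
dpm_diameter = 1e6 * (cdf(a2) + cdf(-b2))
dpm_joint = 1e6 * (1.0 - prob_in_spec(a1, b1, a2, b2, rho))

print(round(dpm_strength, 2), round(dpm_diameter, 2), round(dpm_joint, 2))
```

Because the characteristics are positively correlated, the joint DPM falls between the larger of the two univariate values and their sum, which is the pattern seen in the table.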

12

Mult. Capability Indices

Defined to give the same DPM as in the univariate case.

Capability Indices
Index   Estimate
MCP     1.27
MCR     78.81
DPM     70.4091
Z       3.81
SQL     5.31
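The index values in this table are numerically consistent with a simple chain of conversions from the joint DPM. The sketch below assumes that Z is the standard normal quantile leaving DPM/1,000,000 in the upper tail, MCP = Z/3, MCR = 100/MCP, and SQL = Z + 1.5; these relations are inferred from the numbers and are not stated on the slide.

```python
from statistics import NormalDist

dpm = 70.4091  # joint DPM from the multivariate capability analysis

# Z value with an upper-tail area equal to the defect fraction.
z = NormalDist().inv_cdf(1.0 - dpm / 1e6)

mcp = z / 3.0      # capability index equivalent to that Z
mcr = 100.0 / mcp  # capability ratio, in percent
sql = z + 1.5      # sigma quality level with the conventional 1.5 sigma shift

print(round(z, 2), round(mcp, 2), round(mcr, 2), round(sql, 2))
```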

13

More than 2 Variables

[Figure: control ellipsoid in X1, X2, and X3]

14

Hotelling’s T-Squared

Measures the distance of each point from the centroid of the data (or the assumed distribution).

$$T_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$$
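For two variables the T-Squared statistic can be computed by hand, since the inverse of a 2x2 covariance matrix has a closed form. The sketch below uses the sample mean and sample covariance (divisor n - 1); the data points are made up for illustration and are not the fiber data from the talk.

```python
def hotelling_t2(points):
    """T-Squared of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample variances and covariance.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    values = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d using the closed-form 2x2 inverse.
        values.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return values

# Hypothetical (strength, diameter) observations.
points = [(114.7, 1.047), (115.1, 1.051), (114.9, 1.049), (115.3, 1.053),
          (115.0, 1.052), (114.8, 1.048), (115.2, 1.050)]
t2_values = hotelling_t2(points)
for point, t2 in zip(points, t2_values):
    print(point, round(t2, 3))
```

A useful check: when the centroid and covariance are estimated from the same data, the T-Squared values sum to (n - 1) times the number of variables.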

15

T-Squared Chart

[Figure: multivariate control chart of T-Squared vs observation (0 to 120), UCL = 11.25]

16

T-Squared Decomposition

Relative Contribution to T-Squared Signal

Observation   T-Squared   X1        X2         X3
17            13.8371     4.54101   0.340022   8.35196

The StatAdvisor
This table decomposes the out-of-control signals on the T-Squared chart. It calculates the relative importance of each variable to the signal by subtracting the value of T-Squared calculated without using that variable from the full T-Squared value. Examine each row closely to determine which variable (or variables) are likely causing that signal.
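The subtraction described by the StatAdvisor can be sketched as follows: compute T-Squared with all variables, then recompute it with one variable dropped from both the deviation vector and the covariance matrix, and take the difference. The covariance matrix `s` and deviation vector `d` below are hypothetical, and the small linear solver is included only to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def t_squared(d, s):
    """Quadratic form d' S^-1 d for deviation vector d and covariance s."""
    return sum(di * yi for di, yi in zip(d, solve(s, d)))

def decompose(d, s):
    """Full T-Squared and, per variable, full minus T-Squared without it."""
    full = t_squared(d, s)
    contributions = []
    for j in range(len(d)):
        keep = [k for k in range(len(d)) if k != j]
        d_sub = [d[k] for k in keep]
        s_sub = [[s[r][c] for c in keep] for r in keep]
        contributions.append(full - t_squared(d_sub, s_sub))
    return full, contributions

# Hypothetical covariance matrix and deviation of one point from the centroid.
s = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
d = [2.0, -0.5, 1.5]
full, contributions = decompose(d, s)
print(round(full, 4), [round(c, 4) for c in contributions])
```

The contributions generally do not add up to the full T-Squared when the variables are correlated, as in the row for observation 17, where they total about 13.23 against a T-Squared of 13.8371.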

17

Statistical Model Building
    Defining relationships (regression and ANOVA)
    Classifying items
    Detecting unusual events
    Optimizing processes

When the response variables are correlated, it is important to consider the responses together.

When the number of variables is large, the dimensionality of the problem often makes it difficult to determine the underlying relationships.

18

Example #2

19

Matrix Plot

[Figure: matrix plot of MPG City, MPG Highway, Engine Size, Horsepower, Length, Passengers, U Turn Space, Weight, Wheelbase, and Width]

20

Multiple Regression

MPG City = 29.6315 + 0.28816*Engine Size - 0.00688362*Horsepower - 0.0365723*Length - 0.297446*Passengers - 0.139763*U Turn Space - 0.00984486*Weight + 0.280224*Wheelbase + 0.111526*Width

Parameter      Estimate      Standard Error   T Statistic   P-Value
CONSTANT       29.6315       12.9763          2.28351       0.0249
Engine Size    0.28816       0.722918         0.398607      0.6912
Horsepower     -0.00688362   0.0134153        -0.513119     0.6092
Length         -0.0365723    0.0447211        -0.817786     0.4158
Passengers     -0.297446     0.54754          -0.543241     0.5884
U Turn Space   -0.139763     0.17926          -0.779668     0.4378
Weight         -0.00984486   0.00192619       -5.11104      0.0000
Wheelbase      0.280224      0.124837         2.24472       0.0274
Width          0.111526      0.218893         0.5095        0.6117
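Each T statistic in the table is the estimate divided by its standard error, which is why only Weight, Wheelbase, and the constant stand out. A short check on three rows, using the values exactly as printed on the slide:

```python
# (estimate, standard error) pairs copied from the regression table.
rows = {
    "CONSTANT": (29.6315, 12.9763),
    "Weight": (-0.00984486, 0.00192619),
    "Wheelbase": (0.280224, 0.124837),
}

# T statistic = estimate / standard error.
t_stats = {name: est / se for name, (est, se) in rows.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}")
```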

21

Reduced Models

MPG City = 29.9911 - 0.0103886*Weight + 0.233751*Wheelbase (R2=73.0%)

MPG City = 64.1402 - 0.054462*Horsepower - 1.56144*Passengers - 0.374767*Width (R2=64.3%)

22

Dimensionality Reduction

Construction of linear combinations of the variables can often provide important insights.

Principal components analysis (PCA) and principal components regression (PCR): constructs linear combinations of the predictor variables X that contain the greatest variance and then uses those to predict the responses.

Partial least squares (PLS): finds components that explain as much of the variation as possible in both the X’s and the Y’s simultaneously.

23

Principal Components Analysis

$$C_1 = a_{11}X_1 + a_{12}X_2 + \dots + a_{1p}X_p$$

Principal Components Analysis

Component Number   Eigenvalue   Percent of Variance   Cumulative Percentage
1                  5.8263       72.829                72.829
2                  1.09626      13.703                86.532
3                  0.339796     4.247                 90.779
4                  0.270321     3.379                 94.158
5                  0.179286     2.241                 96.400
6                  0.12342      1.543                 97.942
7                  0.109412     1.368                 99.310
8                  0.0552072    0.690                 100.000
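The percentage columns follow directly from the eigenvalues: each percentage is the eigenvalue divided by the sum of all eigenvalues. The eigenvalues here sum to about 8, the number of predictors, which is what one would expect if the analysis was run on standardized variables (an inference, not something the slide states).

```python
eigenvalues = [5.8263, 1.09626, 0.339796, 0.270321,
               0.179286, 0.12342, 0.109412, 0.0552072]

total = sum(eigenvalues)
percent = [100.0 * e / total for e in eigenvalues]

# Running total of the percentages.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for number, (e, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"{number}  {e:<10}  {p:7.3f}  {c:8.3f}")
```

The first two components account for about 86.5% of the variance, which supports working with C1 and C2 alone.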

24

Scree Plot

[Figure: scree plot of eigenvalue vs component number]

25

Component Weights

C1 = 0.377*Engine Size + 0.292*Horsepower + 0.239*Passengers + 0.370*Length + 0.375*Wheelbase + 0.389*Width + 0.360*U Turn Space + 0.396*Weight

C2 = -0.205*Engine Size - 0.593*Horsepower + 0.731*Passengers + 0.043*Length + 0.260*Wheelbase - 0.042*Width - 0.026*U Turn Space - 0.030*Weight

26

Interpretation

[Figure: biplot of Component 2 vs Component 1, with rays for Engine Size, Horsepower, Passengers, Length, Wheelbase, Width, U Turn Space, and Weight]

27

PC Regression

[Figure: estimated response surface of MPG City vs C1 and C2]

28

Contour Plot

[Figure: contours of the estimated response surface, MPG City (10.0 to 45.0) vs C1 and C2]

29

PLS Model Selection

[Figure: model comparison plot, percent variation in X and Y vs number of components (1 to 8)]

30

PLS Coefficients

Selecting to extract 3 components:

Standardized Coefficients
               MPG City      MPG Highway
Constant       0.0           0.0
Engine Size    -0.0375246    0.0659656
Horsepower     -0.329264     -0.39319
Length         0.0802132     0.22243
Passengers     -0.178438     -0.331005
U Turn Space   -0.0484675    -0.00202398
Weight         -0.428481     -0.642872
Wheelbase      -0.0149712    0.0592427
Width          -0.0320902    0.0532588

Unstandardized Coefficients
               MPG City      MPG Highway
Constant       47.6716       35.6569
Engine Size    -0.203286     0.339043
Horsepower     -0.0353303    -0.0400268
Length         0.0308705     0.0812151
Passengers     -0.965169     -1.69862
U Turn Space   -0.0845038    -0.00334794
Weight         -0.00408204   -0.00581054
Wheelbase      -0.0123371    0.0463168
Width          -0.0477221    0.0751422

31

Interpretation

[Figure: plot of unsportiness vs size, points coded by Type (Compact, Large, Midsize, Small, Sporty, Van)]

32

Neural Networks

33

Bayesian Classifier

Input layer (2 variables)
Pattern layer (93 cases)
Summation layer (6 neurons)
Output layer (6 groups)

34

Classification

sigma = 0.3

[Figure: classification plot of C2 vs C1, regions coded by Type (Compact, Large, Midsize, Small, Sporty, Van)]

35

Design of Experiments

When more than one characteristic is important, finding the optimal operating conditions usually requires a tradeoff of one characteristic for another.

One approach to finding a single solution is to use desirability functions.

36

Example #3

Myers and Montgomery (2002) describe an experiment on a chemical process (20-run central composite design):

Response variable       Goal
Conversion percentage   Maximize
Thermal activity        Maintain between 55 and 60

Input factor   Low         High
time           8 minutes   17 minutes
temperature    160 °C      210 °C
catalyst       1.5%        3.5%

37

Optimize Conversion

Goal: maximize conversion
Optimum value = 118.174

Factor        Low     High    Optimum
time          8.0     17.0    17.0
temperature   160.0   210.0   210.0
catalyst      1.5     3.5     3.48086

[Figure: contours of the estimated response surface for conversion (70.0 to 100.0) vs time and catalyst, at temperature = 210.0]

38

Optimize Activity

Goal: maintain activity at 57.5
Optimum value = 57.5

Factor        Low      High     Optimum
time          8.3      16.7     10.297
temperature   209.99   210.01   210.004
catalyst      1.66     3.35     2.31021

[Figure: contours of the estimated response surface for activity (55.0 to 60.0) vs time and catalyst, at temperature = 210.0]

39

Desirability Functions

Maximization

[Figure: desirability function for maximization, desirability d (0 to 1) vs predicted response between Low and High, with curves for s = 0.2, 0.4, 1, 2, and 8]
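One common form of a maximization desirability function, due to Derringer and Suich, rises from 0 at Low to 1 at High along a power curve with shape parameter s. The sketch below assumes the curves on this slide follow that form; the function name `desirability_max` is illustrative.

```python
def desirability_max(y, low, high, s=1.0):
    """Desirability of predicted response y when larger is better.

    Returns 0 at or below low, 1 at or above high, and
    ((y - low) / (high - low)) ** s in between.
    """
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

# Effect of the shape parameter at the midpoint of a 0 to 100 scale.
for s in (0.2, 0.4, 1.0, 2.0, 8.0):
    print(s, round(desirability_max(50.0, 0.0, 100.0, s), 4))
```

Values of s below 1 give credit quickly once the response leaves Low, while values above 1 withhold credit until the response is close to High.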

40

Desirability Functions

Hit a target

[Figure: desirability function for hitting a target, desirability d (0 to 1) vs predicted response with Low, Target, and High marked, and curves for s = t = 0.1, s = t = 1, and s = t = 5]
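A two-sided desirability function of the Derringer and Suich type peaks at the target and falls to 0 at Low and High, with separate shape parameters s and t for the two sides. The sketch below assumes that form; the function name `desirability_target` is illustrative, and the limits 55, 57.5, and 60 are borrowed from the thermal activity goal in Example #3.

```python
def desirability_target(y, low, target, high, s=1.0, t=1.0):
    """Desirability of predicted response y when a target value is best.

    Rises from 0 at low to 1 at target with exponent s, then falls
    back to 0 at high with exponent t. Outside [low, high] it is 0.
    """
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

for activity in (55.0, 56.25, 57.5, 58.75, 60.0):
    print(activity, desirability_target(activity, 55.0, 57.5, 60.0))
```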

41

Combined Desirability

di = desirability of i-th response given the settings of the m experimental factors X.

D ranges from 0 (least desirable) to 1 (most desirable).

$$D(X) = \left( d_1^{I_1} d_2^{I_2} \cdots d_m^{I_m} \right)^{1 / \sum_{j=1}^{m} I_j}$$
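The combined desirability is a weighted geometric mean of the individual desirabilities. The sketch below treats the exponents I_j as relative importance weights for each response, which is an assumption since the slide does not define them; the function name `combined_desirability` is illustrative.

```python
import math

def combined_desirability(d, impact):
    """Weighted geometric mean of desirabilities d with weights impact."""
    if len(d) != len(impact):
        raise ValueError("d and impact must have the same length")
    # Any completely undesirable response makes the whole setting undesirable.
    if any(value == 0.0 for value in d):
        return 0.0
    log_sum = sum(w * math.log(value) for value, w in zip(d, impact))
    return math.exp(log_sum / sum(impact))

# Two responses with equal weight.
print(round(combined_desirability([0.81, 1.0], [1, 1]), 6))
# The same responses with the first one weighted three times as heavily.
print(round(combined_desirability([0.81, 1.0], [3, 1]), 6))
```

Using a geometric rather than arithmetic mean means that a setting scoring 0 on any one response gets D = 0, no matter how well it does on the others.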

42

Desirability Contours

Max D = 0.959 at time = 11.14, temperature = 210.0, and catalyst = 2.20.

[Figure: contours of the estimated desirability surface (0.0 to 1.0) vs time and catalyst, at temperature = 210.0]

43

Desirability Surface

[Figure: estimated response surface of desirability (0 to 1) vs time and catalyst, at temperature = 210.0]

44

References

Johnson, R.A. and Wichern, D.W. (2002). Applied Multivariate Statistical Analysis. Upper Saddle River: Prentice Hall.

Mason, R.L. and Young, J.C. (2002). Multivariate Statistical Process Control with Industrial Applications. Philadelphia: SIAM.

Montgomery, D. C. (2005). Introduction to Statistical Quality Control, 5th edition. New York: John Wiley and Sons.

Myers, R. H. and Montgomery, D. C. (2002). Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd edition. New York: John Wiley and Sons.
