Design and Analysis of Experiments
Dr. Tai-Yue Wang Department of Industrial and Information Management
National Cheng Kung University, Tainan, TAIWAN, ROC
Two-Level Factorial Designs
Outline
- Introduction
- The 2^2 Design
- The 2^3 Design
- The General 2^k Design
- A Single Replicate of the 2^k Design
- Additional Examples of Unreplicated 2^k Designs
- 2^k Designs are Optimal Designs
- The Addition of Center Points to the 2^k Design
Introduction
- Special case of the general factorial design: k factors, each with two levels
- Factors may be qualitative or quantitative
- A complete replicate of such a design is called a 2^k factorial design
- Assumptions: the factors are fixed, the design is completely randomized, and the usual normality assumptions hold
- Used as factor screening experiments
- The response between levels is assumed to be approximately linear
The 2^2 Design

Factor    Treatment             Replicate
A   B     combination           I    II   III   Total
-   -     A low, B low         28    25    27      80
+   -     A high, B low        36    32    32     100
-   +     A low, B high        18    19    23      60
+   +     A high, B high       31    30    29      90
The 2^2 Design
- “-” and “+” denote the low and high levels of a factor, respectively
- Low and high are arbitrary terms
- Geometrically, the four runs form the corners of a square
- Factors can be quantitative or qualitative, although their treatment in the final model will be different

Analysis procedure:
- Estimate factor effects
- Formulate model
  - With replication, use the full model
  - With an unreplicated design, use normal probability plots
- Statistical testing (ANOVA)
- Refine the model
- Analyze residuals (graphical)
- Interpret results
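The 2^2 Design

Effect estimates, with n replicates and (1), a, b, ab denoting the treatment totals:

$$A = \bar{y}_{A^+} - \bar{y}_{A^-} = \frac{ab + a}{2n} - \frac{b + (1)}{2n} = \frac{1}{2n}\,[ab + a - b - (1)]$$

$$B = \bar{y}_{B^+} - \bar{y}_{B^-} = \frac{ab + b}{2n} - \frac{a + (1)}{2n} = \frac{1}{2n}\,[ab + b - a - (1)]$$

$$AB = \frac{ab + (1)}{2n} - \frac{a + b}{2n} = \frac{1}{2n}\,[ab + (1) - a - b]$$

Sums of squares:

$$SS_A = \frac{[ab + a - b - (1)]^2}{4n}, \qquad SS_B = \frac{[ab + b - a - (1)]^2}{4n}, \qquad SS_{AB} = \frac{[ab + (1) - a - b]^2}{4n}$$

$$SS_T = \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{...}^2}{4n}$$

$$SS_E = SS_T - SS_A - SS_B - SS_{AB}$$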
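As a rough check of the contrast formulas for the 2^2 design, the sketch below recomputes the effects and sums of squares from the replicate data in the table above. The variable names are illustrative only; the arithmetic is the standard contrast/(2n) and contrast^2/(4n) calculation.

```python
# Replicate observations for each treatment combination (from the data table).
data = {
    "(1)": [28, 25, 27],
    "a": [36, 32, 32],
    "b": [18, 19, 23],
    "ab": [31, 30, 29],
}
n = 3  # replicates per treatment combination

# Treatment totals: (1), a, b, ab.
tot = {name: sum(obs) for name, obs in data.items()}

# Contrasts for A, B and AB.
contrasts = {
    "A": tot["ab"] + tot["a"] - tot["b"] - tot["(1)"],
    "B": tot["ab"] + tot["b"] - tot["a"] - tot["(1)"],
    "AB": tot["ab"] + tot["(1)"] - tot["a"] - tot["b"],
}

# Effect = contrast / (2n); sum of squares = contrast^2 / (4n).
effects = {name: c / (2 * n) for name, c in contrasts.items()}
ss = {name: c ** 2 / (4 * n) for name, c in contrasts.items()}

all_obs = [y for obs in data.values() for y in obs]
ss_total = sum(y * y for y in all_obs) - sum(all_obs) ** 2 / (4 * n)
ss_error = ss_total - sum(ss.values())

print(effects)
print(ss, ss_total, ss_error)
```

The effects (8.333, -5.000, 1.667) and the error sum of squares (31.333) agree with the Minitab output shown later for this example.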
The 2^2 Design: Standard order (Yates’s order)

Effects   (1)    a    b   ab
A          -1   +1   -1   +1
B          -1   -1   +1   +1
AB         +1   -1   -1   +1

- Effects A, B, AB are orthogonal contrasts, each with one degree of freedom
- Thus 2^k designs are orthogonal designs
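The orthogonality claim can be verified numerically. This is a minimal sketch using NumPy: each row holds the signs of one contrast in standard order, and orthogonality means all pairwise dot products are zero.

```python
import numpy as np

# Contrast coefficients in standard order: (1), a, b, ab.
signs = np.array([
    [-1, +1, -1, +1],  # A
    [-1, -1, +1, +1],  # B
    [+1, -1, -1, +1],  # AB
])

# Gram matrix of the contrasts: off-diagonal zeros mean orthogonal contrasts.
gram = signs @ signs.T
print(gram)

# Each contrast also sums to zero.
row_sums = signs.sum(axis=1)
print(row_sums)
```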
The 2^2 Design: ANOVA table
The 2^2 Design: Algebraic signs for calculating effects in the 2^2 design
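The 2^2 Design: Regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

x1 and x2 are coded variables in this case:

$$x_1 = \frac{conc - (conc_{low} + conc_{high})/2}{(conc_{high} - conc_{low})/2}, \qquad x_2 = \frac{catalyst - (catalyst_{low} + catalyst_{high})/2}{(catalyst_{high} - catalyst_{low})/2}$$

where conc and catalyst are the natural variables.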
The 2^2 Design: Regression model

Factorial Fit: Yield versus Conc., Catalyst

Estimated Effects and Coefficients for Yield (coded units)

Term             Effect    Coef   SE Coef      T      P
Constant                 27.500    0.5713  48.14  0.000
Conc.             8.333   4.167    0.5713   7.29  0.000
Catalyst         -5.000  -2.500    0.5713  -4.38  0.002
Conc.*Catalyst    1.667   0.833    0.5713   1.46  0.183

S = 1.97906   PRESS = 70.5
R-Sq = 90.30%   R-Sq(pred) = 78.17%   R-Sq(adj) = 86.66%

Analysis of Variance for Yield (coded units)

Source              DF   Seq SS   Adj SS   Adj MS      F      P
Main Effects         2  283.333  283.333  141.667  36.17  0.000
2-Way Interactions   1    8.333    8.333    8.333   2.13  0.183
Residual Error       8   31.333   31.333    3.917
  Pure Error         8   31.333   31.333    3.917
Total               11  323.000
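The coded-unit coefficients and their standard errors can be reproduced by ordinary least squares. The sketch below uses NumPy rather than Minitab and assumes the twelve observations from the 2^2 data table, entered in standard order with three replicates each.

```python
import numpy as np

# Coded factor levels in standard order, each combination replicated 3 times.
x1 = np.repeat([-1, 1, -1, 1], 3)   # Conc.
x2 = np.repeat([-1, -1, 1, 1], 3)   # Catalyst
y = np.array([28, 25, 27, 36, 32, 32, 18, 19, 23, 31, 30, 29], dtype=float)

# Model matrix: intercept, x1, x2, and the x1*x2 interaction.
X = np.column_stack([np.ones(12), x1, x2, x1 * x2])

# Least squares estimates.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual mean square on 12 - 4 = 8 degrees of freedom.
resid = y - X @ beta
mse = resid @ resid / (len(y) - X.shape[1])

# Standard error of each coefficient: sqrt(MSE * diagonal of (X'X)^-1).
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))

print(beta)
print(mse, se)
```

Each coefficient is half of the corresponding effect estimate, which is why the Coef column in the output is the Effect column divided by two.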
The 2^2 Design: Regression model

Estimated Coefficients for Yield using data in uncoded units (model with interaction)

Term             Coef
Constant         28.3333
Conc.            0.333333
Catalyst         -11.6667
Conc.*Catalyst   0.333333

Estimated Coefficients for Yield using data in uncoded units (regression model without interaction)

Term             Coef
Constant         18.3333
Conc.            0.833333
Catalyst         -5.00000
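The link between coded and uncoded coefficients can be sketched by substituting x = (z - center)/half-range into the coded model and collecting terms. The helper name `uncode` is hypothetical. The factor levels are not stated in this deck, so the values 15 and 25 for concentration and 1 and 2 for catalyst are assumptions, chosen because they are consistent with the uncoded coefficients above.

```python
def uncode(b0, b1, b2, b12, lo1, hi1, lo2, hi2):
    """Convert coded-unit coefficients to natural-unit coefficients.

    Coded model:  y = b0 + b1*x1 + b2*x2 + b12*x1*x2,
    with x_i = (z_i - c_i) / h_i, c_i the level midpoint, h_i the half-range.
    """
    c1, h1 = (lo1 + hi1) / 2, (hi1 - lo1) / 2
    c2, h2 = (lo2 + hi2) / 2, (hi2 - lo2) / 2
    g12 = b12 / (h1 * h2)               # coefficient of z1*z2
    g1 = b1 / h1 - g12 * c2             # coefficient of z1
    g2 = b2 / h2 - g12 * c1             # coefficient of z2
    g0 = b0 - b1 * c1 / h1 - b2 * c2 / h2 + g12 * c1 * c2  # constant
    return g0, g1, g2, g12


# Assumed levels (not given in the deck): conc 15 to 25, catalyst 1 to 2.
with_interaction = uncode(27.5, 4.167, -2.5, 0.833, 15, 25, 1, 2)
without_interaction = uncode(27.5, 4.167, -2.5, 0.0, 15, 25, 1, 2)
print(with_interaction)
print(without_interaction)
```

Under these assumed levels the results match the two uncoded tables to within the rounding of the coded coefficients.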
The 2^2 Design: Response surface
(Note: the catalyst axis is reversed relative to the textbook figure.)
The 2^3 Design
3 factors, each at two levels; eight treatment combinations
The 2^3 Design: Design matrix (or geometric notation)
The 2^3 Design: Algebraic signs
The 2^3 Design -- Properties of the Table
- Except for column I, every column has an equal number of + and - signs
- The sum of the products of signs in any two columns is zero
- Multiplying any column by I leaves that column unchanged (identity element)
- The product of any two columns yields a column in the table:

$$A \times B = AB, \qquad AB \times BC = AB^2C = AC$$

- Orthogonal design
- Orthogonality is an important property shared by all factorial designs
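These properties of the 2^3 sign table can be checked mechanically. The following sketch builds the table in standard order with NumPy and tests each property; the dictionary layout is just one convenient representation.

```python
from itertools import combinations, product

import numpy as np

# Runs in standard (Yates) order: A changes fastest, then B, then C.
runs = np.array([(a, b, c) for c, b, a in product([-1, 1], repeat=3)])
A, B, C = runs.T

table = {
    "I": np.ones(8, dtype=int),
    "A": A, "B": B, "AB": A * B,
    "C": C, "AC": A * C, "BC": B * C, "ABC": A * B * C,
}

# Property 1: every column except I has four + and four - signs.
balanced = all(col.sum() == 0 for name, col in table.items() if name != "I")

# Property 2: the sum of products of signs in any two columns is zero.
orthogonal = all(
    int(table[p] @ table[q]) == 0 for p, q in combinations(table, 2)
)

# Property 3: multiplying by I leaves a column unchanged.
identity_ok = all((table["I"] * col == col).all() for col in table.values())

# Property 4: AB x BC = AB^2C = AC, because B^2 = I.
product_ok = (table["AB"] * table["BC"] == table["AC"]).all()

print(balanced, orthogonal, identity_ok, product_ok)
```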
The 2^3 Design -- example
- Nitride etch process
- Factors: gap, gas flow, and RF power
The 2^3 Design -- example: Full model

Estimated Effects and Coefficients for Etch Rate (coded units)

Term                  Effect     Coef   SE Coef      T      P
Constant                       776.06     11.87  65.41  0.000
Gap                  -101.62   -50.81     11.87  -4.28  0.003
Gas Flow                7.37     3.69     11.87   0.31  0.764
Power                 306.12   153.06     11.87  12.90  0.000
Gap*Gas Flow          -24.88   -12.44     11.87  -1.05  0.325
Gap*Power            -153.63   -76.81     11.87  -6.47  0.000
Gas Flow*Power         -2.12    -1.06     11.87  -0.09  0.931
Gap*Gas Flow*Power      5.62     2.81     11.87   0.24  0.819

S = 47.4612   PRESS = 72082
R-Sq = 96.61%   R-Sq(pred) = 86.44%   R-Sq(adj) = 93.64%

Analysis of Variance for Etch Rate (coded units)

Source              DF  Seq SS  Adj SS  Adj MS      F      P
Main Effects         3  416378  416378  138793  61.62  0.000
2-Way Interactions   3   96896   96896   32299  14.34  0.001
3-Way Interactions   1     127     127     127   0.06  0.819
Residual Error       8   18020   18020    2253
  Pure Error         8   18021   18021    2253
Total               15  531421
The 2^3 Design -- example: Reduced model

Factorial Fit: Etch Rate versus Gap, Power

Estimated Effects and Coefficients for Etch Rate (coded units)

Term         Effect     Coef   SE Coef      T      P
Constant              776.06     10.42  74.46  0.000
Gap         -101.62   -50.81     10.42  -4.88  0.000
Power        306.12   153.06     10.42  14.69  0.000
Gap*Power   -153.63   -76.81     10.42  -7.37  0.000

S = 41.6911   PRESS = 37080.4
R-Sq = 96.08%   R-Sq(pred) = 93.02%   R-Sq(adj) = 95.09%

Analysis of Variance for Etch Rate (coded units)

Source              DF  Seq SS  Adj SS  Adj MS       F      P
Main Effects         2  416161  416161  208080  119.71  0.000
2-Way Interactions   1   94403   94403   94403   54.31  0.000
Residual Error      12   20858   20858    1738
  Pure Error        12   20858   20858    1738
Total               15  531421
The 2^3 Design -- example: Model Summary Statistics for Reduced Model

R^2 and adjusted R^2:

$$R^2 = \frac{SS_{Model}}{SS_T} = \frac{5.106 \times 10^5}{5.314 \times 10^5} = 0.9608$$

$$R^2_{Adj} = 1 - \frac{SS_E/df_E}{SS_T/df_T} = 1 - \frac{20857.75/12}{5.314 \times 10^5/15} = 0.9509$$

R^2 for prediction (based on PRESS):

$$R^2_{Pred} = 1 - \frac{PRESS}{SS_T} = 1 - \frac{37080.44}{5.314 \times 10^5} = 0.9302$$
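The three summary statistics for the reduced model follow directly from the ANOVA quantities. A short sketch of the arithmetic, using the sums of squares, degrees of freedom, and PRESS reported in the Minitab output for the reduced etch-rate model:

```python
# Quantities from the reduced-model output for the etch rate example.
ss_total, df_total = 531421.0, 15
ss_error, df_error = 20857.75, 12
press = 37080.44

# R^2 = SS_Model / SS_T, with SS_Model = SS_T - SS_E.
r2 = (ss_total - ss_error) / ss_total

# Adjusted R^2 penalizes model size through the degrees of freedom.
r2_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# Prediction R^2 replaces SS_E with PRESS.
r2_pred = 1 - press / ss_total

print(round(r2, 4), round(r2_adj, 4), round(r2_pred, 4))
```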
The Regression Model
Cube Plot of Ranges
What do the large ranges when gap and power are at the high level tell you?
The General 2^k Factorial Design
There will be k main effects, and
- $\binom{k}{2}$ two-factor interactions
- $\binom{k}{3}$ three-factor interactions
- ...
- 1 k-factor interaction
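The effect counts for a general 2^k design are binomial coefficients, which makes them easy to tabulate. A minimal sketch; the function name `effect_counts` is illustrative only.

```python
from math import comb


def effect_counts(k):
    """Number of effects of each order (1 = main effects) in a 2^k design."""
    return {order: comb(k, order) for order in range(1, k + 1)}


counts = effect_counts(4)
total = sum(counts.values())
print(counts, total)
```

For k = 4 this gives 4 main effects, 6 two-factor, 4 three-factor, and 1 four-factor interaction, 15 effects in all, which is 2^k - 1.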
The General 2^k Factorial Design: Statistical Analysis
Unreplicated 2^k Factorial Designs
- These are 2^k factorial designs with one observation at each corner of the “cube”
- An unreplicated 2^k factorial design is also sometimes called a “single replicate” of the 2^k
- These designs are very widely used
- Risks: if there is only one observation at each corner, is there a chance of unusual response observations spoiling the results?
- Modeling “noise”?
Unreplicated 2^k Factorial Designs
- If the factors are spaced too closely, the chances increase that the noise will overwhelm the signal in the data
- More aggressive spacing is usually best
Unreplicated 2^k Factorial Designs
- Lack of replication causes potential problems in statistical testing
  - Replication admits an estimate of “pure error” (a better phrase is an internal estimate of error)
  - With no replication, fitting the full model results in zero degrees of freedom for error
- Potential solutions to this problem
  - Pooling high-order interactions to estimate error
  - Normal probability plotting of effects (Daniel, 1959)
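The normal probability plotting idea can be sketched without any plotting library: sort the effect estimates and pair each with a standard normal quantile. Negligible effects should fall roughly on a straight line through zero, and active effects fall far from it. The effect values below are taken from the Minitab output for the filtration-rate example later in this deck (A = temperature, B = pressure, C = Conc., D = stir rate). The plotting position (i - 0.5)/m is one common choice and is an assumption here.

```python
from statistics import NormalDist

# Effect estimates for the 2^4 filtration-rate example.
effects = {
    "A": 21.625, "B": 3.125, "C": 9.875, "D": 14.625,
    "AB": 0.125, "AC": -18.125, "AD": 16.625, "BC": 2.375,
    "BD": -0.375, "CD": -1.125, "ABC": 1.875, "ABD": 4.125,
    "ACD": -1.625, "BCD": -2.625, "ABCD": 1.375,
}

# Sort effects and attach normal scores at probabilities (i - 0.5) / m.
ordered = sorted(effects.items(), key=lambda item: item[1])
m = len(ordered)
points = [
    (name, est, NormalDist().inv_cdf((i - 0.5) / m))
    for i, (name, est) in enumerate(ordered, start=1)
]

for name, est, z in points:
    print(f"{name:>5} {est:8.3f} {z:7.3f}")

# The five largest effects in absolute value stand apart from the rest.
largest = sorted(effects, key=lambda k: abs(effects[k]), reverse=True)[:5]
print(largest)
```

Plotting the estimate against the normal score for each point gives the usual normal probability plot of effects.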
Unreplicated 2^k Factorial Designs -- example
- A 2^4 factorial was used to investigate the effects of four factors on the filtration rate of a resin
- The factors are A = temperature, B = pressure, C = mole ratio (labeled Conc. in the Minitab output), D = stirring rate
- The experiment was performed in a pilot plant
Unreplicated 2^k Factorial Designs -- example -- full model
Unreplicated 2^k Factorial Designs -- example -- reduced model
Factorial Fit: Filtration versus Temperature, Conc., Stir Rate

Estimated Effects and Coefficients for Filtration (coded units)

Term                     Effect     Coef   SE Coef      T      P
Constant                          70.063     1.104  63.44  0.000
Temperature              21.625   10.812     1.104   9.79  0.000
Conc.                     9.875    4.938     1.104   4.47  0.001
Stir Rate                14.625    7.312     1.104   6.62  0.000
Temperature*Conc.       -18.125   -9.062     1.104  -8.21  0.000
Temperature*Stir Rate    16.625    8.313     1.104   7.53  0.000

S = 4.41730   PRESS = 499.52
R-Sq = 96.60%   R-Sq(pred) = 91.28%   R-Sq(adj) = 94.89%

Analysis of Variance for Filtration (coded units)

Source              DF   Seq SS   Adj SS   Adj MS      F      P
Main Effects         3  3116.19  3116.19  1038.73  53.23  0.000
2-Way Interactions   2  2419.62  2419.62  1209.81  62.00  0.000
Residual Error      10   195.12   195.12    19.51
  Lack of Fit        2    15.62    15.62     7.81   0.35  0.716
  Pure Error         8   179.50   179.50    22.44
Total               15  5730.94
Unreplicated 2^k Factorial Designs -- example -- Design projection
- Since factor B is negligible, the experiment can be interpreted as a 2^3 factorial design with factors A, C, D
- The projected design has 2 replicates
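The projection property can be illustrated by dropping the B column from the 2^4 design and counting how often each (A, C, D) combination occurs. This is a small sketch using only the standard library.

```python
from collections import Counter
from itertools import product

# All 16 runs of the 2^4 design as (A, B, C, D) sign tuples.
runs = list(product([-1, 1], repeat=4))

# Project onto factors A, C, D by ignoring B.
projected = Counter((a, c, d) for a, b, c, d in runs)

print(len(projected), set(projected.values()))
```

Each of the 8 combinations of A, C, and D appears exactly twice, so the 16 runs form a replicated 2^3 design with 8 degrees of freedom for pure error.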
Factorial Fit: Filtration versus Temperature, Conc., Stir Rate

Estimated Effects and Coefficients for Filtration (coded units)

Term                           Effect     Coef   SE Coef      T      P
Constant                                70.063     1.184  59.16  0.000
Temperature                    21.625   10.812     1.184   9.13  0.000
Conc.                           9.875    4.938     1.184   4.17  0.003
Stir Rate                      14.625    7.312     1.184   6.18  0.000
Temperature*Conc.             -18.125   -9.062     1.184  -7.65  0.000
Temperature*Stir Rate          16.625    8.313     1.184   7.02  0.000
Conc.*Stir Rate                -1.125   -0.562     1.184  -0.48  0.647
Temperature*Conc.*Stir Rate    -1.625   -0.813     1.184  -0.69  0.512

S = 4.73682   PRESS = 718
R-Sq = 96.87%   R-Sq(pred) = 87.47%   R-Sq(adj) = 94.13%

Analysis of Variance for Filtration (coded units)

Source              DF   Seq SS   Adj SS   Adj MS      F      P
Main Effects         3  3116.19  3116.19  1038.73  46.29  0.000
2-Way Interactions   3  2424.69  2424.69   808.23  36.02  0.000
3-Way Interactions   1    10.56    10.56    10.56   0.47  0.512
Residual Error       8   179.50   179.50    22.44
  Pure Error         8   179.50   179.50    22.44
Total               15  5730.94
Dealing with Outliers
- Replace with an estimate
  - Make the highest-order interaction zero
  - In this case, estimate cd such that ABCD = 0
- Analyze only the data you have
  - Now the design isn’t orthogonal
  - Consequences?
Duplicate Measurements on the Response
- Four wafers are stacked in the furnace
- Four factors: temperature, time, gas flow, and pressure
- Response: thickness
- The four measurements are treated as duplicates, not replicates
- Use the average as the response
Duplicate Measurements on the Response
- Stat > DOE > Factorial > Pre-process Response for Analyze
- Stat > DOE > Factorial > Analyze Factorial Design
Factorial Fit: average versus Temperature, Time, Pressure

Estimated Effects and Coefficients for average (coded units)

Term                    Effect     Coef   SE Coef       T      P
Constant                        399.188     1.049  380.48  0.000
Temperature             43.125   21.562     1.049   20.55  0.000
Time                    18.125    9.062     1.049    8.64  0.000
Pressure               -10.375   -5.187     1.049   -4.94  0.001
Temperature*Time        16.875    8.438     1.049    8.04  0.000
Temperature*Pressure   -10.625   -5.312     1.049   -5.06  0.000

S = 4.19672   PRESS = 450.88
R-Sq = 98.39%   R-Sq(pred) = 95.88%   R-Sq(adj) = 97.59%

Analysis of Variance for average (coded units)

Source              DF   Seq SS   Adj SS   Adj MS       F      P
Main Effects         3   9183.7  9183.69  3061.23  173.81  0.000
2-Way Interactions   2   1590.6  1590.62   795.31   45.16  0.000
Residual Error      10    176.1   176.12    17.61
  Lack of Fit        2     60.6    60.62    30.31    2.10  0.185
  Pure Error         8    115.5   115.50    14.44
Total               15  10950.4
The 2^k design and design optimality
The model parameter estimates in a 2^k design (and the effect estimates) are least squares estimates. For example, for a 2^2 design the model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$
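The 2^k design and design optimality
The four observations from a 2^2 design:

$$\begin{aligned}
(1) &= \beta_0 + \beta_1(-1) + \beta_2(-1) + \beta_{12}(-1)(-1) + \varepsilon_1 \\
a &= \beta_0 + \beta_1(1) + \beta_2(-1) + \beta_{12}(1)(-1) + \varepsilon_2 \\
b &= \beta_0 + \beta_1(-1) + \beta_2(1) + \beta_{12}(-1)(1) + \varepsilon_3 \\
ab &= \beta_0 + \beta_1(1) + \beta_2(1) + \beta_{12}(1)(1) + \varepsilon_4
\end{aligned}$$

In matrix form:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad
\mathbf{y} = \begin{bmatrix} (1) \\ a \\ b \\ ab \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_{12} \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}$$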
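The 2^k design and design optimality

The least squares estimates are

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

The matrix X'X is diagonal, a consequence of an orthogonal design:

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix} = 4\mathbf{I}, \qquad
\mathbf{X}'\mathbf{y} = \begin{bmatrix} (1) + a + b + ab \\ a + ab - b - (1) \\ b + ab - a - (1) \\ (1) - a - b + ab \end{bmatrix}$$

The elements of X'y are the “usual” contrasts, so

$$\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_{12} \end{bmatrix}
= \frac{1}{4}\,\mathbf{I}\,\mathbf{X}'\mathbf{y}
= \begin{bmatrix} \dfrac{(1) + a + b + ab}{4} \\[2mm] \dfrac{a + ab - b - (1)}{4} \\[2mm] \dfrac{b + ab - a - (1)}{4} \\[2mm] \dfrac{(1) - a - b + ab}{4} \end{bmatrix}$$

The regression coefficient estimates are exactly half of the “usual” effect estimates.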
The 2^k design and design optimality
The matrix X’X has interesting and useful properties:

$$V(\hat\beta) = \sigma^2 \,(\text{diagonal element of } (\mathbf{X}'\mathbf{X})^{-1}) = \frac{\sigma^2}{4}$$

This is the minimum possible value for a four-run design.

$$|\mathbf{X}'\mathbf{X}| = 256$$

This is the maximum possible value for a four-run design.

Notice that these results depend on both the design that you have chosen and the model.
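Both properties of X'X for the 2^2 design with the interaction model are quick to confirm numerically. A minimal NumPy sketch:

```python
import numpy as np

# Model matrix for the 2^2 design: columns are I, x1, x2, x1*x2;
# rows are the runs (1), a, b, ab.
X = np.array([
    [1, -1, -1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, 1, 1, 1],
], dtype=float)

xtx = X.T @ X

# Determinant of X'X (the quantity a D-optimal design maximizes).
det = np.linalg.det(xtx)

# Diagonal of (X'X)^-1: each coefficient has variance sigma^2 times this.
var_factor = np.diag(np.linalg.inv(xtx))

print(xtx)
print(det, var_factor)
```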
The 2^k design and design optimality
- The 2^2 design is called a D-optimal design (it maximizes |X'X|)
- In fact, every 2^k design is a D-optimal design for fitting the first-order model with interaction
- Consider the variance of the predicted response in the 2^2 design:
The 2^k design and design optimality

$$V[\hat{y}(x_1, x_2)] = \sigma^2\, \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}, \qquad \mathbf{x}' = [1, x_1, x_2, x_1 x_2]$$

$$V[\hat{y}(x_1, x_2)] = \frac{\sigma^2}{4}\,(1 + x_1^2 + x_2^2 + x_1^2 x_2^2)$$

The maximum prediction variance occurs when $x_1 = \pm 1$, $x_2 = \pm 1$:

$$V[\hat{y}(x_1, x_2)] = \sigma^2$$

The prediction variance when $x_1 = x_2 = 0$ is

$$V[\hat{y}(x_1, x_2)] = \frac{\sigma^2}{4}$$

What about the average prediction variance over the design space?
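The 2^k design and design optimality

$$I = \frac{1}{A}\int_{-1}^{1}\int_{-1}^{1} V[\hat{y}(x_1, x_2)]\, dx_1\, dx_2, \qquad A = \text{area of design space} = 2^2 = 4$$

$$I = \frac{1}{4}\int_{-1}^{1}\int_{-1}^{1} \frac{\sigma^2}{4}\,(1 + x_1^2 + x_2^2 + x_1^2 x_2^2)\, dx_1\, dx_2 = \frac{4\sigma^2}{9}$$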
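The prediction-variance results for the 2^2 design can be checked numerically. The sketch below works with the scaled variance V/sigma^2 (so sigma^2 = 1 is an assumption made for convenience) and uses three-point Gauss-Legendre quadrature, which integrates the quadratic integrand exactly, to obtain the average over the square design region.

```python
import numpy as np

# Model matrix for the 2^2 design with interaction: I, x1, x2, x1*x2.
X = np.array([
    [1, -1, -1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, 1, 1, 1],
], dtype=float)
xtx_inv = np.linalg.inv(X.T @ X)


def scaled_pred_var(x1, x2):
    """Return V[y_hat(x1, x2)] / sigma^2 = x'(X'X)^-1 x."""
    x = np.array([1.0, x1, x2, x1 * x2])
    return float(x @ xtx_inv @ x)


corner = scaled_pred_var(1, 1)   # maximum, at the design points
center = scaled_pred_var(0, 0)   # minimum, at the design center

# Average over [-1, 1] x [-1, 1]: integrate, then divide by the area (4).
nodes, weights = np.polynomial.legendre.leggauss(3)
integral = sum(
    wi * wj * scaled_pred_var(xi, xj)
    for xi, wi in zip(nodes, weights)
    for xj, wj in zip(nodes, weights)
)
average = integral / 4.0

print(corner, center, average)
```

The three values correspond to sigma^2, sigma^2/4, and 4 sigma^2/9 respectively.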
- The 2^2 design is called a G-optimal design: it minimizes the maximum prediction variance
- In fact, every 2^k design is a G-optimal design for fitting the first-order model with interaction
The 2^k design and design optimality
- The 2^2 design is called an I-optimal design: it has the smallest possible value of the average prediction variance
- In fact, every 2^k design is an I-optimal design for fitting the first-order model with interaction
The 2^k design and design optimality
- Minitab provides a “Select Optimal Design” function for cases where you have a full factorial design and want to reduce it to a partial design or “fractional design”
- It provides only the D-optimal criterion
- One needs to have a full factorial design first and then choose the number of data points allowed
The 2^k design and design optimality
- These results give us some assurance that these designs are “good” designs in some general ways
- Factorial designs typically share some (most) of these properties
- There are excellent computer routines for finding optimal designs
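Addition of Center Points to a 2^k Design
- Based on the idea of replicating some of the runs in a factorial design
- Runs at the center provide an estimate of error and allow the experimenter to distinguish between two possible models:

First-order model (interaction):

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k}\sum_{j>i}^{k} \beta_{ij} x_i x_j + \varepsilon$$

Second-order model:

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k}\sum_{j>i}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \varepsilon$$

The terms $\beta_{ii} x_i^2$ are the quadratic effects.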
Addition of Center Points to a 2^k Design
- When adding center points, we assume that the k factors are quantitative
- Example on a 2^2 design
Addition of Center Points to a 2^k Design
- Five points: (-,-), (-,+), (+,-), (+,+), and (0,0)
- n_F = 4 and n_C = 4
- Let $\bar{y}_F$ be the average of the four runs at the four factorial points and let $\bar{y}_C$ be the average of the n_C runs at the center point
Addition of Center Points to a 2^k Design
- If the difference $\bar{y}_F - \bar{y}_C$ is small, the center points lie on or near the plane passing through the factorial points and there is no quadratic effect
- The hypotheses are:

$$H_0: \sum_{i=1}^{k} \beta_{ii} = 0, \qquad H_1: \sum_{i=1}^{k} \beta_{ii} \neq 0$$
Addition of Center Points to a 2^k Design
Test statistic:

$$SS_{Pure\ Quad} = \frac{n_F\, n_C\, (\bar{y}_F - \bar{y}_C)^2}{n_F + n_C}$$

with one degree of freedom
Addition of Center Points to a 2^k Design -- example
- Example 6.2 is a 2^4 factorial
- Adding center points at x1 = x2 = x3 = x4 = 0 gives four additional responses (filtration rates): 73, 75, 66, 69
- So $\bar{y}_C$ = 70.75 and $\bar{y}_F$ = 70.06
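The curvature test for this example can be worked by hand from the pure quadratic sum of squares. In the sketch below the four center-point responses come from the slide; the factorial average 70.0625 is assumed to be the unrounded value behind the quoted 70.06 (Minitab reports the constant as 70.063). The error mean square is estimated from the center points alone, with n_C - 1 = 3 degrees of freedom.

```python
center = [73, 75, 66, 69]      # responses at the center point
n_c = len(center)
n_f = 16                       # runs in the 2^4 factorial
ybar_f = 70.0625               # assumed unrounded factorial average
ybar_c = sum(center) / n_c

# Pure quadratic (curvature) sum of squares, one degree of freedom.
ss_pure_quad = n_f * n_c * (ybar_f - ybar_c) ** 2 / (n_f + n_c)

# Pure error from the replicated center points.
ss_pure_error = sum((y - ybar_c) ** 2 for y in center)
ms_error = ss_pure_error / (n_c - 1)

# F ratio for curvature.
f_curvature = ss_pure_quad / ms_error

print(ybar_c, ss_pure_quad, ms_error, f_curvature)
```

These values agree with the Curvature and Pure Error lines of the analysis of variance for this example (1.51 and 16.25, F = 0.09), so there is no evidence of curvature.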
Addition of Center Points to a 2^k Design -- example

Term                                    Effect     Coef   SE Coef      T      P
Constant                                         70.063     1.008  69.52  0.000
Temperature                             21.625   10.812     1.008  10.73  0.002
Pressure                                 3.125    1.562     1.008   1.55  0.219
Conc.                                    9.875    4.937     1.008   4.90  0.016
Stir Rate                               14.625    7.312     1.008   7.26  0.005
Temperature*Pressure                     0.125    0.063     1.008   0.06  0.954
Temperature*Conc.                      -18.125   -9.063     1.008  -8.99  0.003
Temperature*Stir Rate                   16.625    8.313     1.008   8.25  0.004
Pressure*Conc.                           2.375    1.188     1.008   1.18  0.324
Pressure*Stir Rate                      -0.375   -0.187     1.008  -0.19  0.864
Conc.*Stir Rate                         -1.125   -0.563     1.008  -0.56  0.616
Temperature*Pressure*Conc.               1.875    0.937     1.008   0.93  0.421
Temperature*Pressure*Stir Rate           4.125    2.063     1.008   2.05  0.133
Temperature*Conc.*Stir Rate             -1.625   -0.813     1.008  -0.81  0.479
Pressure*Conc.*Stir Rate                -2.625   -1.312     1.008  -1.30  0.284
Temperature*Pressure*Conc.*Stir Rate     1.375    0.687     1.008   0.68  0.544
Ct Pt                                             0.687     2.253   0.31  0.780
Addition of Center Points to a 2^k Design -- example

Analysis of Variance for Filtration (coded units)

Source              DF   Seq SS   Adj SS   Adj MS      F      P
Main Effects         4  3155.25  3155.25  788.813  48.54  0.005
2-Way Interactions   6  2447.88  2447.88  407.979  25.11  0.012
3-Way Interactions   4   120.25   120.25   30.062   1.85  0.320
4-Way Interactions   1     7.56     7.56    7.562   0.47  0.544
Curvature            1     1.51     1.51    1.512   0.09  0.780
Residual Error       3    48.75    48.75   16.250
  Pure Error         3    48.75    48.75   16.250
Total               19  5781.20
Addition of Center Points to a 2^k Design
If curvature is significant, augment the design with axial runs to create a central composite design (CCD). The CCD is a very effective design for fitting a second-order response surface model.
Addition of Center Points to a 2^k Design
- Use current operating conditions as the center point
- Check for “abnormal” conditions during the time the experiment was conducted
- Check for time trends
- Use center points as the first few runs when there is little or no information available about the magnitude of error
Center Points and Qualitative Factors