
VIII. Introduction to Response Surface Methodology

A. Sequential Experimentation

1. Phases of Experimentation

I. Screening

• very small design (Resolution III)

• little, if any, replication

• analyze by normal probability plots

• extremely cost conscious (save resources for later)

• little, if any, concern about lack-of-fit

II. Initial Steepest Ascent (Descent)

• replicate at least the center

• begin to be concerned over lack-of-fit

• serious consideration to Resolution IV or higher designs

• less cost conscious

III. Follow-Up Steepest Ascent

IV. Optimization

• replication extremely important

• often starts as a mid-course correction

• lack-of-fit may suggest design augmentation

• popular designs

(a) central composite design (CCD)

(b) augment to a CCD

(c) Box-Behnken

• extremely expensive

Important Considerations in Choosing Designs

• Purpose of Experiment

• Proposed Model

• Estimation versus Testing

• Concern Over Lack-of-Fit

• Ability to Augment, if Necessary

• Protection from Outliers

The Relationship Between Design and Model

The specific design used determines which models are estimable!

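1. Screening Designs

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i$$

2. “Interactive” Model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \cdots + \beta_{k-1,k}\, x_{i,k-1} x_{ik} + \varepsilon_i$$

$$\phantom{y_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \sum_{j<j'} \beta_{jj'}\, x_{ij} x_{ij'} + \varepsilon_i$$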

3. Optimization

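$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \cdots + \beta_{kk} x_{ik}^2 + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \cdots + \beta_{k-1,k}\, x_{i,k-1} x_{ik} + \varepsilon_i$$

$$\phantom{y_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \sum_{j=1}^{k} \beta_{jj} x_{ij}^2 + \sum_{j<j'} \beta_{jj'}\, x_{ij} x_{ij'} + \varepsilon_i$$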

B. Steepest Ascent

Steepest ascent is an example of an optimization.

In calculus, how do we optimize something?

Consider a situation where we may model the response by a strict first-order model

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

Taking the first derivative with respect to $x_j$,

$$\frac{\partial \hat{y}}{\partial x_j} = b_j \neq 0$$

Technically, we find the path of steepest ascent by a constrained optimization technique based on Lagrangian multipliers.

If we have only two factors, the path of steepest ascent is the line from the origin to the maximum response over the circle defined by

$$x_1^2 + x_2^2 = c^2 \quad \text{for any } c > 0,$$

where $c$ is the radius of the circle.

For k ≥ 3 factors, the path of steepest ascent is the line from the origin to the maximum response over the sphere defined by

$$\sum_{j=1}^{k} x_j^2 = c^2 \quad \text{for any } c > 0.$$
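A brief sketch of the Lagrangian argument, assuming the strict first-order fit $\hat{y} = b_0 + \sum_j b_j x_j$ and writing the multiplier as $\mu$ (a symbol chosen here for illustration), shows why the path is a straight line through the origin:

```latex
\begin{aligned}
L(x, \mu) &= b_0 + \sum_{j=1}^{k} b_j x_j - \mu \Big( \sum_{j=1}^{k} x_j^2 - c^2 \Big), \\
\frac{\partial L}{\partial x_j} &= b_j - 2 \mu x_j = 0
  \;\Longrightarrow\; x_j = \frac{b_j}{2\mu}, \qquad j = 1, \dots, k .
\end{aligned}
```

Each coordinate is proportional to its own coefficient, so for every radius $c$ the maximizing point lies in the direction $(b_1, \dots, b_k)$, and the ratio $x_j / x_1 = b_j / b_1$ does not depend on $c$.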

Since the path of steepest ascent represents the optimum response over spheres, we need to construct this path in the metric where spheres make the most sense.

Our procedure is:

• construct the path in the design variables, and

• convert this line back to the natural units.

[Figure: Path of Steepest Ascent]

Let $x_1$ be the “key” factor, and let $x_{10}$ be a specific value for this factor along the desired path.

The settings for the other factors are

$$x_{j0} = \frac{b_j}{b_1}\, x_{10}, \qquad j = 2, 3, \ldots, k.$$

This path passes through the center of the region of interest.

To convert this line back to the natural units,

• let $c_j$ be the center value, in the natural units, for the jth factor, and

• let $d_j$ be the “scaling” factor.

Let $x_{j0}^{*}$ be the specific setting for the jth factor along the path of steepest ascent; thus,

$$x_{j0}^{*} = c_j + d_j\, x_{j0}.$$

We usually pick the factor whose estimated coefficient is largest in absolute value as our key factor.

We construct the line by increasing this key factor by a convenient amount each time.

We then run a series of experiments along this path.

Example: Kilgo (1988) performed an experiment to determine the effect of CO2 pressure, CO2 temperature, peanut moisture, CO2 flow rate and peanut particle size on the total yield of oil per batch of peanuts. A $2^{5-1}$ design was carried out, and only temperature, $x_2$, and particle size, $x_5$, were important.

Since $x_5$ has the largest coefficient in absolute value, we use it as our key factor.

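The estimated coefficients, center values, and scaling factors are:

        x2        x5
bj      9.875    -22.25
cj     60         2.665
dj     35         1.385

For a specific setting of particle size, $x_{50}$, along the path, the appropriate setting for temperature, $x_{20}$, is given by

$$x_{20} = \frac{b_2}{b_5}\, x_{50} = \frac{9.875}{-22.25}\, x_{50} \approx -0.444\, x_{50}.$$

We can convert each value of $x_{20}$ back to the natural units by

$$x_{20}^{*} = c_2 + d_2\, x_{20} = 60 + 35\, x_{20}.$$

We can convert each value of $x_{50}$ back to the natural units by

$$x_{50}^{*} = c_5 + d_5\, x_{50} = 2.665 + 1.385\, x_{50}.$$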

        Design variables     Natural units
Run     x20       x50        x20*      x50*      y
1       0.444     -1.0       75.5      1.28      81
2       0.488     -1.1       77.1      1.14      84
3       0.533     -1.2       78.7      1.00      90
4       0.577     -1.3       80.2      0.86      97
5       0.622     -1.4       81.8      0.73      95
6       0.666     -1.5       83.3      0.59      92
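The path in this example can be recomputed with a few lines of Python. This is a minimal sketch that uses the values of bj, cj, and dj quoted in the example; the function and variable names are illustrative only. Because it uses the unrounded ratio 9.875/22.25 rather than 0.444, a few entries may differ from the table in the last printed digit.

```python
# Coefficients, centers, and scaling factors quoted in the example.
b2, b5 = 9.875, -22.25      # estimated coefficients: temperature, particle size
c2, d2 = 60.0, 35.0         # center and scaling factor for temperature
c5, d5 = 2.665, 1.385       # center and scaling factor for particle size


def path_point(x50):
    """Return (x20, x20*, x50*) for a coded particle-size setting x50."""
    x20 = (b2 / b5) * x50        # stay on the line through the origin along (b2, b5)
    x20_nat = c2 + d2 * x20      # temperature in natural units
    x50_nat = c5 + d5 * x50      # particle size in natural units
    return x20, x20_nat, x50_nat


# Step the key factor x5 from -1.0 to -1.5 in increments of 0.1.
steps = [-(10 + i) / 10 for i in range(6)]
path = [(x50,) + path_point(x50) for x50 in steps]

for x50, x20, x20_nat, x50_nat in path:
    print(f"x50={x50:5.1f}  x20={x20:6.3f}  x20*={x20_nat:5.1f}  x50*={x50_nat:5.2f}")
```

The responses y in the table still have to be observed by running the experiment at each of these settings; the code only produces the settings.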

C. Second-Order Experiments

1. Overview

For first order designs, we must have:

• at least two levels for each variable;

• at least as many points as parameters to estimate, i.e., $k+1$; and

• main effects that are not completely aliased with each other.

For a second order model, we now must have:

• at least three levels for each variable in order to estimate both the first order and pure quadratic effects;

• at least as many points as parameters to estimate, i.e.,

$$1 + 2k + \frac{k(k-1)}{2} = \frac{(k+1)(k+2)}{2};$$

and

• main effects and two-factor interactions that are not completely aliased with each other.

The first design which meets these criteria is the $3^k$ factorial design.

Note:

• this design uses three levels for each variable.

• $3^k \ge (k+1)(k+2)/2$ [equality if and only if k=1]

• the $3^k$ allows us to estimate all first order, pure quadratic, two-factor and higher interactions.

A major disadvantage of the $3^k$ factorial:

Often the $3^k$ design uses many more points than the second order model requires.

Thus, the $3^k$ factorial is really too expensive to be practical.
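To see how quickly the gap grows, the following sketch tabulates the number of second order model parameters against the number of runs in the full three-level factorial. The function names are illustrative only.

```python
def second_order_params(k):
    """Number of parameters in a full second order model in k factors."""
    return (k + 1) * (k + 2) // 2


def three_level_runs(k):
    """Number of runs in a full three-level factorial in k factors."""
    return 3 ** k


rows = [(k, second_order_params(k), three_level_runs(k)) for k in range(1, 7)]
for k, p, n in rows:
    print(f"k={k}: parameters={p:3d}  three-level runs={n:4d}")
```

For k = 4, for example, the model has 15 parameters while the full three-level factorial needs 81 runs.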

2. The Central Composite Design

The single most popular second order response surface design is the central composite design (CCD) developed by Box and Wilson (JRSS, B 1951).

The CCD was intended to be a more economical alternative to the $3^k$.

The design consists of three parts:

• a Resolution V fraction of a $2^k$;

• a series of “axial” runs; and

• a series of center runs.

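Sometimes, we convey this information by:

$$D = \begin{bmatrix}
\pm 1 & \pm 1 & \cdots & \pm 1 \\
-\alpha & 0 & \cdots & 0 \\
\alpha & 0 & \cdots & 0 \\
0 & -\alpha & \cdots & 0 \\
0 & \alpha & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & -\alpha \\
0 & 0 & \cdots & \alpha \\
0 & 0 & \cdots & 0
\end{bmatrix}$$

Note: The CCD is rather flexible in that $\alpha$ is not fixed.

Thus, we may choose $\alpha$ in order to meet some particular needs.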

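It is instructive to see the two variable (k=2) CCD with $\alpha = 1$:

$$D = \begin{bmatrix}
-1 & -1 \\
1 & -1 \\
-1 & 1 \\
1 & 1 \\
-1 & 0 \\
1 & 0 \\
0 & -1 \\
0 & 1 \\
0 & 0
\end{bmatrix}$$

Note: With the center run, the k=2 CCD with $\alpha = 1$ is the $3^2$ factorial design.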

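For k = 3, the CCD with $\alpha = 1$ is

$$D = \begin{bmatrix}
-1 & -1 & -1 \\
1 & -1 & -1 \\
-1 & 1 & -1 \\
1 & 1 & -1 \\
-1 & -1 & 1 \\
1 & -1 & 1 \\
-1 & 1 & 1 \\
1 & 1 & 1 \\
-1 & 0 & 0 \\
1 & 0 & 0 \\
0 & -1 & 0 \\
0 & 1 & 0 \\
0 & 0 & -1 \\
0 & 0 & 1 \\
0 & 0 & 0
\end{bmatrix}$$

Note: With a single center run, the CCD requires 15 design runs [$2^3 + 2 \cdot 3 + 1$] as opposed to the 27 required by a $3^3$ factorial.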

There are three common choices for $\alpha$:

• $\alpha = 1$ (cuboidal)

• $\alpha = \sqrt{k}$ (spherical CCD)

• $\alpha = n_f^{0.25}$ (rotatable), where $n_f$ is the number of factorial points.

A rotatable design is one where the prediction variance for any two points the same distance from the design center is the same.

As a result, if a design is rotatable, the prediction variance at some specific location only depends on that location's distance from the design center.
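The three parts of the CCD and the three common axial distances can be generated as follows. This is a minimal sketch that assumes the full two-level factorial is used for the factorial portion (a Resolution V fraction could be substituted for larger k); the function names `ccd` and `alpha_choices` are illustrative only.

```python
from itertools import product


def ccd(k, alpha, n_center=1):
    """Build a central composite design in k coded factors.

    Assumes the full 2^k factorial for the factorial portion.
    """
    factorial = [list(pt) for pt in product((-1.0, 1.0), repeat=k)]
    axial = []
    for j in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[j] = sign * alpha      # move only factor j off the center
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


def alpha_choices(k):
    """Return the three common axial distances for a full-factorial CCD."""
    n_f = 2 ** k                          # number of factorial points
    return {"cuboidal": 1.0, "spherical": k ** 0.5, "rotatable": n_f ** 0.25}


k = 3
alphas = alpha_choices(k)
design = ccd(k, alphas["rotatable"], n_center=1)
print(len(design), "runs; rotatable alpha =", round(alphas["rotatable"], 3))
# prints: 15 runs; rotatable alpha = 1.682
```

For k = 3 the rotatable choice is $8^{0.25} \approx 1.682$, and a single center run gives the 15-run design counted above.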

Finally, an important question concerns how many center runs we should use.

From a variance-based optimality perspective: 1-3 are usually enough.

For detecting Lack of Fit, probably 6-8.

D. Optimization

Primary goal of the second order experiment: optimization.

Consider:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2$$

From calculus, the point of optimal response is

• either the stationary point

• or some point on the boundary of the region.

Let $x_0$ denote the factor settings at the stationary point.

Let $y_0$ be the response at this point.

To find $x_0$, we need to solve the system obtained by

$$\left. \frac{\partial \hat{y}}{\partial x} \right|_{x = x_0} = 0$$

It is important to note that the stationary point may be:

• a point of maximum response;

• a point of minimum response, or

• a saddle point.
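For the two-factor second order model, the stationary point and its type can be worked out directly. The sketch below solves the two gradient equations by Cramer's rule and classifies the point from the Hessian; the coefficients in the usage lines are hypothetical values chosen for illustration, not estimates from any experiment in these notes.

```python
def stationary_point(b1, b2, b12, b11, b22):
    """Stationary point of yhat = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1**2 + b22*x2**2.

    Setting both partial derivatives to zero gives
        2*b11*x1 +   b12*x2 = -b1
          b12*x1 + 2*b22*x2 = -b2
    """
    det = 4.0 * b11 * b22 - b12 ** 2          # determinant of the Hessian
    if det == 0.0:
        raise ValueError("Hessian is singular; no unique stationary point")
    x1 = (-b1 * 2.0 * b22 + b2 * b12) / det   # Cramer's rule
    x2 = (-b2 * 2.0 * b11 + b1 * b12) / det
    if det < 0.0:
        kind = "saddle point"                 # curvatures of opposite sign
    elif b11 < 0.0:
        kind = "maximum"
    else:
        kind = "minimum"
    return (x1, x2), kind


# Hypothetical coefficients for illustration only.
x0, kind = stationary_point(b1=2.0, b2=1.0, b12=0.5, b11=-3.0, b22=-2.0)
print(x0, kind)
```

Whether the point is usable still depends on where it lies: a maximum far outside the design region deserves no more trust than a saddle point.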

Even if the stationary point is an optimum, it may lie outside the region of experimentation.

Hence, we have little faith in it.

Bottom Line: Often, the stationary point is not a reliable point of optimal response.

Thus, the point of optimal response often lies on the boundary of the region of interest.

How should we find this point?

Consider Lagrangian multipliers.

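We thus optimize

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2$$

subject to the constraint that

$$\sum_{j=1}^{k} x_j^2 = R^2,$$

where $R$ is the radius of the region of interest.

Let

$$L = \hat{y} - \mu \left( \sum_{j=1}^{k} x_j^2 - R^2 \right),$$

where $\mu$ is the Lagrangian multiplier.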
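For the two-factor model, setting the partial derivatives of $L$ to zero gives the system sketched below. The matrix form on the last line uses notation that is an assumption here: $b = (b_1, b_2)'$, and $B$ is the symmetric matrix with the pure quadratic coefficients on the diagonal and half the interaction coefficient off the diagonal.

```latex
\begin{aligned}
\frac{\partial L}{\partial x_1} &= b_1 + b_{12} x_2 + 2 b_{11} x_1 - 2 \mu x_1 = 0, \\
\frac{\partial L}{\partial x_2} &= b_2 + b_{12} x_1 + 2 b_{22} x_2 - 2 \mu x_2 = 0, \\
x_1^2 + x_2^2 &= R^2, \\[4pt]
(B - \mu I)\, x &= -\tfrac{1}{2}\, b,
\qquad B = \begin{bmatrix} b_{11} & b_{12}/2 \\ b_{12}/2 & b_{22} \end{bmatrix}.
\end{aligned}
```

In practice one picks a value of $\mu$, solves the linear system for $x$, and reads off the radius $R$ that this solution implies; repeating over several values of $\mu$ traces the constrained optimum as a function of $R$.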

D. Multiple Responses

In many engineering experiments, we have more than one response of interest.

The key: to find appropriate compromise operating conditions.

Two basic approaches for jointly optimizing two or more responses:

• the desirability function, and

• nonlinear programming approaches.

Several statistical software packages include some form of the desirability function.

Some spreadsheets, including EXCEL, use good reduced gradient algorithms to perform appropriate constrained optimization.

The Desirability Function

The desirability function provides an overall measure for the “goodness” of a specific setting:

• A large value indicates a desirable set of values for the various responses.

• A low value indicates an undesirable set of values.

Derringer and Suich (Journal of Quality Technology 1980) proposed an approach which:

1. determines the individual desirabilities for each response and

2. then combines these individual desirabilities into an overall desirability.

The analyst then seeks to find the settings in the factors which maximize the overall desirability.

The individual desirabilities depend upon whether we wish

• to maximize the response of interest,

• to minimize the response of interest, or

• to achieve a specific target value for the response of interest.

Derringer and Suich use a scale from 0, which represents completely undesirable, to 1, which represents fully desirable, for their individual desirability functions.

Consider the target value case first.

• $\hat{y}$ is the predicted value for the response.

• $y_T$ is the specific target value for the response of interest.

• $y_L$ is the smallest possible value which has any desirability.

• $y_U$ is the largest possible value which has any desirability.

One approach defines the desirability for this response by

$$d = \begin{cases}
0 & \text{for } \hat{y} < y_L \\[4pt]
\dfrac{\hat{y} - y_L}{y_T - y_L} & \text{for } y_L \le \hat{y} \le y_T \\[8pt]
\dfrac{y_U - \hat{y}}{y_U - y_T} & \text{for } y_T \le \hat{y} \le y_U \\[8pt]
0 & \text{for } \hat{y} > y_U
\end{cases}$$

With this definition,

• we give any predicted value for the response less than $y_L$ or greater than $y_U$ a desirability of 0.

• if the predicted value is exactly at the target value, we give it a desirability of 1.

• the further the predicted value is from the target, the lower desirability we give it.

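Derringer and Suich actually proposed the following slight modification

$$d = \begin{cases}
0 & \text{for } \hat{y} < y_L \\[4pt]
\left( \dfrac{\hat{y} - y_L}{y_T - y_L} \right)^{s} & \text{for } y_L \le \hat{y} \le y_T \\[8pt]
\left( \dfrac{y_U - \hat{y}}{y_U - y_T} \right)^{t} & \text{for } y_T \le \hat{y} \le y_U \\[8pt]
0 & \text{for } \hat{y} > y_U
\end{cases}$$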

The exponents s and t provide greater flexibility in assigning the desirability within the range of interest.

Suppose we wish to maximize the response.

• yL is the smallest desirable value for this response.

• yU is a fully desirable value.

Basically, yU represents the point of diminishing returns.

In some cases, yU represents a true bound for the response.

In other cases, yU is some arbitrary value larger than the largest observed response.

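For this situation, Derringer and Suich proposed

$$d = \begin{cases}
0 & \text{for } \hat{y} < y_L \\[4pt]
\left( \dfrac{\hat{y} - y_L}{y_U - y_L} \right)^{s} & \text{for } y_L \le \hat{y} \le y_U \\[8pt]
1 & \text{for } \hat{y} > y_U
\end{cases}$$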

Suppose we wish to minimize the response.

• yU is the largest desirable value for this response.

• yL is a fully desirable value.

Basically, yL represents the point of diminishing returns.

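The corresponding desirability is

$$d = \begin{cases}
1 & \text{for } \hat{y} < y_L \\[4pt]
\left( \dfrac{y_U - \hat{y}}{y_U - y_L} \right)^{s} & \text{for } y_L \le \hat{y} \le y_U \\[8pt]
0 & \text{for } \hat{y} > y_U
\end{cases}$$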

Once we have the individual desirabilities, we need to combine them in a meaningful way.

How should we do this?

Note:

• If any of the individual responses is completely undesirable, then the overall desirability also should be completely undesirable.

• Similarly, the overall desirability should be 1 if and only if all of the individual responses are completely desirable.

Suppose we have m responses of interest.

Let d1, d2, … , dm be the individual desirabilities.

Derringer and Suich defined the overall desirability, $D$, by

$$D = \left( \prod_{j=1}^{m} d_j \right)^{1/m},$$

which is the geometric mean of the desirabilities.
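The individual and overall desirabilities translate directly into code. The sketch below is one possible rendering of the piecewise definitions given in these notes; the function names are illustrative, and the numbers in the usage lines are hypothetical.

```python
import math


def d_target(y_hat, y_low, y_target, y_high, s=1.0, t=1.0):
    """Desirability when a specific target value is wanted."""
    if y_hat < y_low or y_hat > y_high:
        return 0.0
    if y_hat <= y_target:
        return ((y_hat - y_low) / (y_target - y_low)) ** s
    return ((y_high - y_hat) / (y_high - y_target)) ** t


def d_maximize(y_hat, y_low, y_high, s=1.0):
    """Desirability when larger responses are better."""
    if y_hat < y_low:
        return 0.0
    if y_hat > y_high:
        return 1.0
    return ((y_hat - y_low) / (y_high - y_low)) ** s


def d_minimize(y_hat, y_low, y_high, s=1.0):
    """Desirability when smaller responses are better."""
    if y_hat < y_low:
        return 1.0
    if y_hat > y_high:
        return 0.0
    return ((y_high - y_hat) / (y_high - y_low)) ** s


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities."""
    return math.prod(ds) ** (1.0 / len(ds))


# Hypothetical predicted responses for illustration only.
ds = [d_maximize(9.0, 8.0, 10.0), d_target(5.0, 4.0, 5.0, 6.0)]
print(ds, overall_desirability(ds))
```

Because the overall value is a product, a single individual desirability of 0 forces the overall desirability to 0, which is the behavior the notes ask for.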

Myers and Montgomery (1995) outline an experiment, originally presented in Box, Hunter, and Hunter (1978).

Purpose: to find the settings for

• reaction time (x1),

• reaction temperature (x2), and

• the amount of catalyst (x3)

which maximize the conversion (y1) of a polymer and achieve a target value of 57.5 for the thermal activity (y2).

The lower bound for the conversion is 80.

The maximum possible value is 100.

Thermal activity must be between 55 and 60.

The experimental results:

x1        x2        x3        y1      y2
-1        -1        -1        74      53.2
 1        -1        -1        51      62.9
-1         1        -1        88      53.4
 1         1        -1        70      62.6
-1        -1         1        71      57.3
 1        -1         1        90      67.9
-1         1         1        66      59.8
 1         1         1        97      67.8
-1.682     0         0        76      59.1
 1.682     0         0        79      65.9
 0        -1.682     0        85      60.0
 0         1.682     0        97      60.7
 0         0        -1.682    55      57.4
 0         0         1.682    81      63.2
 0         0         0        81      59.2
 0         0         0        75      60.4
 0         0         0        76      59.1
 0         0         0        83      60.6
 0         0         0        80      60.8
 0         0         0        91      58.9

A reasonable model for conversion is

$$\hat{y}_1 = 81.09 + 1.03 x_1 + 4.04 x_2 + 6.20 x_3 - 1.83 x_1^2 + 2.94 x_2^2 - 5.19 x_3^2 + 2.13 x_1 x_2 + 11.37 x_1 x_3 - 3.87 x_2 x_3$$

A reasonable model for thermal activity is

$$\hat{y}_2 = 60.23 + 4.26 x_1 + 2.23 x_3$$

Let

• s=1 for conversion and

• s=t=1 for thermal activity.

The Derringer-Suich approach recommends a setting of

$$x_1 = -0.389, \quad x_2 = 1.682, \quad \text{and} \quad x_3 = -0.484.$$

This setting gives a predicted conversion of 95.21 and a predicted thermal activity of 57.50.

The overall desirability for this setting is 0.8720, which is reasonably close to 1.
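The reported predictions can be checked by evaluating the two prediction equations for this example at the recommended setting. The sketch below assumes the coefficients as printed to two decimals and s = t = 1; because of that rounding, the overall desirability it computes (about 0.871) differs slightly from the reported 0.8720.

```python
import math


def conversion(x1, x2, x3):
    """Second order prediction equation for conversion (rounded coefficients)."""
    return (81.09 + 1.03 * x1 + 4.04 * x2 + 6.20 * x3
            - 1.83 * x1 ** 2 + 2.94 * x2 ** 2 - 5.19 * x3 ** 2
            + 2.13 * x1 * x2 + 11.37 * x1 * x3 - 3.87 * x2 * x3)


def thermal_activity(x1, x3):
    """First order prediction equation for thermal activity."""
    return 60.23 + 4.26 * x1 + 2.23 * x3


x1, x2, x3 = -0.389, 1.682, -0.484
y1 = conversion(x1, x2, x3)
y2 = thermal_activity(x1, x3)

# Maximize conversion: lower bound 80, fully desirable at 100, s = 1.
d1 = min(max((y1 - 80.0) / (100.0 - 80.0), 0.0), 1.0)

# Target 57.5 for thermal activity with bounds 55 and 60, s = t = 1.
if y2 < 55.0 or y2 > 60.0:
    d2 = 0.0
elif y2 <= 57.5:
    d2 = (y2 - 55.0) / (57.5 - 55.0)
else:
    d2 = (60.0 - y2) / (60.0 - 57.5)

D = math.sqrt(d1 * d2)      # geometric mean of two desirabilities
print(round(y1, 2), round(y2, 2), round(D, 3))
```

The same two functions can be reused to compare any other candidate setting against this one.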

Nonlinear Programming Approaches

Jointly optimizing two or more responses when the prediction equations contain second order or higher terms is a standard example of a nonlinear programming problem.

Many spreadsheets have built-in routines for solving these problems, for example the SOLVER routine in Microsoft EXCEL.

The major spreadsheets use good algorithms, usually based on reduced gradients.

We simply need

1. to input the appropriate prediction equations,

2. to input the constraints, and

3. to specify one response as the “key.”

The spreadsheet routine finds the optimal setting.

These routines are not guaranteed to find a solution within the experimental region unless we specify some additional constraints.

For cuboidal experimental regions, i.e., when we use a face centered cube CCD, each $x_j$ must fall within the interval -1 to 1.

In that case, we need the following additional constraints:

$$-1 \le x_1 \le 1, \quad -1 \le x_2 \le 1, \quad \ldots, \quad -1 \le x_k \le 1$$

For spherical experimental regions, we need the additional constraint

$$\sum_{j=1}^{k} x_j^2 \le k$$

With these additional constraints, the spreadsheet routine may not find a feasible solution.

When this occurs, we must relax one or more of our constraints in order to find a solution.

We can use the SOLVER routine in Microsoft EXCEL to find optimal conditions.

We use the same second order prediction equations as before.

Recall, we seek to maximize the conversion.

We thus specify conversion, $\hat{y}_1$, as our key response and tell the routine that we want to maximize it.

Since we have a target value of 57.5 for the thermal activity, we specify the following constraint:

$$\hat{y}_2 = 57.5$$

Since this experiment uses a spherical CCD, we need to impose the additional constraint

$$\sum_{j=1}^{3} x_j^2 \le 3$$

The spreadsheet recommends the setting

$$x_1 = -0.429, \quad x_2 = 1.682, \quad \text{and} \quad x_3 = -0.404.$$

This setting gives a conversion of 94.37% and a thermal activity of 57.5.

E. Robust Parameter Design

1. Overall Taguchi Philosophy

Consider the manufacture of a ball point pen.

• important characteristic is the fit between the barrel and the cap.

• barrel and the cap are produced by separate injection molding processes.

• How can we produce these barrels and caps such that the fit is “optimal”?

What are the real issues in this problem?

The Japanese would view any part which does not achieve the target value as having some tangible loss of value.

Often, they use a squared error loss function:

$$\text{Loss} = k\,(y - T)^2,$$

where $T$ is the target value and $k$ is a positive constant.

Thus, a part may be within specifications and still considered “poor”, just not quite poor enough to be rejected.

Impacts of such a philosophy

1. should seek conditions which minimize the expected “loss”

2. must consider both the mean and the variance
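A short derivation shows why minimizing the expected loss involves both the mean and the variance. The symbols $\mu_y$ and $\sigma_y^2$ for the mean and variance of $y$ are introduced here for illustration and do not appear in the notes.

```latex
\begin{aligned}
E[\text{Loss}] &= k\, E\big[(y - T)^2\big] \\
  &= k\, E\big[(y - \mu_y + \mu_y - T)^2\big] \\
  &= k\, \big\{ E\big[(y - \mu_y)^2\big] + 2(\mu_y - T)\, E[y - \mu_y] + (\mu_y - T)^2 \big\} \\
  &= k\, \big\{ \sigma_y^2 + (\mu_y - T)^2 \big\}.
\end{aligned}
```

The cross term vanishes because $E[y - \mu_y] = 0$, so the expected loss is small only when the process is both on target and has low variance.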

2. Overview of Taguchi's Parameter Design

Fundamental to this approach are the concepts of

1. control factors

• factors which the experimenter can readily control.

2. noise factors

• factors which the experimenter either cannot or will not directly control in the process

• factors which “move” randomly in the actual process, although they can be fixed for the experiment.

Suppose we wish to develop a cake mix “robust” to customer use.

What are possible control factors?

What are possible noise factors?

Goal of parameter design: find the settings for the control factors which are most “robust” to the noise factors.

Taguchi proposes “crossing”:

1. a design for the control factors (inner or control array)

2. a design for the noise factors (outer or noise array)

Each point of the inner array is replicated according to a design in the noise factors called the outer array.

Typically, these designs are “saturated” or “near-saturated”.

For example, suppose we have three control and three noise factors.

Let x1, x2, and x3 represent the control factors.

Let z1, z2, and z3 represent the noise factors.

An appropriate inner array is a $2^{3-1}$ fraction, or

x1     x2     x3
-1     -1     -1
-1      1      1
 1     -1      1
 1      1     -1

Each of these settings is replicated by the outer array.

z1     z2     z3
-1     -1     -1
-1      1      1
 1     -1      1
 1      1     -1

The resulting design consists of 4 × 4 = 16 runs and follows.

x1     x2     x3        z1     z2     z3
-1     -1     -1        -1     -1     -1
                        -1      1      1
                         1     -1      1
                         1      1     -1
-1      1      1        -1     -1     -1
                        -1      1      1
                         1     -1      1
                         1      1     -1
 1     -1      1        -1     -1     -1
                        -1      1      1
                         1     -1      1
                         1      1     -1
 1      1     -1        -1     -1     -1
                        -1      1      1
                         1     -1      1
                         1      1     -1

While the inner and outer arrays are completely saturated, all of the interactions between the control and noise factors are estimable!
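The estimability claim can be checked numerically. The sketch below builds the 16-run crossed array from the inner and outer arrays in the example and verifies that the intercept, the six main effects, and the nine control-by-noise interaction columns are mutually orthogonal; the variable names are illustrative only.

```python
from itertools import combinations, product

inner = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]   # control factors x1, x2, x3
outer = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]   # noise factors z1, z2, z3

# Crossing: every control setting is run at every noise setting.
crossed = [x + z for x, z in product(inner, outer)]

# Model columns: intercept, main effects, and control-by-noise interactions.
columns = {"intercept": [1] * len(crossed)}
for j, name in enumerate(["x1", "x2", "x3", "z1", "z2", "z3"]):
    columns[name] = [run[j] for run in crossed]
for a in range(3):
    for b in range(3):
        columns[f"x{a + 1}*z{b + 1}"] = [run[a] * run[3 + b] for run in crossed]


def dot(u, v):
    """Inner product of two design columns."""
    return sum(p * q for p, q in zip(u, v))


all_orthogonal = all(dot(columns[u], columns[v]) == 0
                     for u, v in combinations(columns, 2))
print(len(crossed), "runs,", len(columns), "model columns, orthogonal:", all_orthogonal)
# prints: 16 runs, 16 model columns, orthogonal: True
```

With 16 mutually orthogonal columns in 16 runs, every one of these effects is estimable, but no degrees of freedom remain for control-by-control interactions or for error.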

An important question:

Why run the experiment in the noise factors?

• We seek to find the settings in the control factors which are most “robust” to the noise factors.

• Thus, the noise levels ±1 correspond to what?

What is the natural consequence?

How does this contrast with typical experimentation?

All the designs recommended by Taguchi (the so-called “Taguchi designs”) are orthogonal arrays of strength 2.

• allow the estimation of “main effects”

• do not allow the estimation of any interactions.

Examples of orthogonal arrays of strength 2 include:

1. Resolution III fractional factorial designs

2. Plackett-Burman designs.

Three level orthogonal arrays do exist.

• allow the estimation of the linear and pure quadratic terms

• do not allow estimation of the two-factor or higher interactions.

3. Contributions/Drawbacks

The greatest contributions of this total approach are:

1. it seriously considers the variance over a region of interest; and

2. it provides a rationale for modeling the behavior of the noise in terms of the control factors.

Note:

1. RSM “buries” the impact of the noise factors in $\varepsilon_i$.

2. Taguchi assumes that the variance is not constant over the region of interest!

• A nice insight: it may be possible to model the variance.

• For Taguchi, the variance changes as the result of noise × control interactions.

This is a rather limited approach for modeling the variance, but a start.

Naturally, there are several drawbacks to the Taguchi approach.

1. the sequential nature of investigation is not exploited; (not completely fair)

2. it uses an unnecessarily limited number of designs which do not adequately deal with interactions;

3. better, simpler, and more efficient analyses exist;

4. importance of data transformations seems not to be appreciated or exploited; (definitely unfair)

5. Taguchi uses baffling terminology.

6. the designs used by Taguchi are much larger than really required since they completely cross the noise and the control factors;

7. the Taguchi approach does not go far enough to model the variance.

4. Statistical Alternatives to Robust Parameter Design

I. The “Combined Array” Method

The basic ideas underlying the “combined array” are

• propose a single model in both the control and noise factors

• run a design specifically for the model proposed.

In the process,

• we can estimate some of the control by control interactions

• we can allocate our experimental resources more efficiently,

• in some cases, the resulting designs are significantly smaller.

(Usually, if we use a fractional factorial, the combined array is about the same size as the corresponding crossed array.)

5. Treating the Mean and the Variance as a Dual Response Problem

The basic goal of parameter design is to achieve a target condition for the mean while simultaneously minimizing the variance.

• Suppose that an n point design has been replicated such that each design point has been run a total of m ≥ 2 times.

Any reasonable method may be applied to generate the replication, including the use of an outer array.

• let $s_i^2$ be the estimated variance at the ith design point

• Consider modeling the mean by

$$\hat{y}_i = f(x_i)$$

• Consider modeling the variance by

$$t(s_i^2) = g(x_i)$$

where $t(s_i^2)$ is a suitable transformation of the variance.
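The two responses that feed these models are simply the per-point sample mean and a transformed sample variance. The sketch below computes them for made-up replicates; the data are hypothetical and serve only to show the bookkeeping, with the log and square-root transformations given as two candidate choices for $t$.

```python
import math
import statistics

# Hypothetical replicated responses: design point -> m = 3 observations.
replicates = {
    (-1, -1): [34.0, 10.0, 28.0],
    (0, 0): [50.0, 54.0, 52.0],
    (1, 1): [80.0, 95.0, 86.0],
}

summary = []
for point, ys in replicates.items():
    mean = statistics.mean(ys)            # response for the mean model
    var = statistics.variance(ys)         # sample variance s_i^2, divisor m - 1
    summary.append({
        "point": point,
        "mean": mean,
        "variance": var,
        "log_variance": math.log(var),    # one candidate transformation
        "std_dev": math.sqrt(var),        # square root transformation
    })

for row in summary:
    print(row)
```

Each column of the summary can then be regressed on the design variables in the usual way, one fit for the mean and one for the chosen transformation of the variance.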

Much work has been and currently is being done on modeling variance.

• Most authors suggest using the natural logarithm of the variance, $\log(s_i^2)$, but the theory for this transformation requires moderate to large amounts of replication (m ≥ 10) to justify this approach.

• A reasonable alternative uses the standard deviation, which is the square root transformation.

• Other approaches can be employed, including generalized linear models (GLIM).

Two legitimate questions surface:

1. What do we gain by explicitly modeling the variance?

2. What are the consequences of modeling the variance, particularly with regard to estimation?

Example: The Printing Study Experiment

The purpose of the experiment was to study the effect of:

x1: speed

x2: pressure

x3: distance

upon a printing machine's ability to apply coloring inks upon package labels.

The experiment used a $3^3$ complete factorial with three runs at each design point (m=3).

Assume that the goal of the experiment is to find the conditions which minimize the variability while achieving a target value of 500.0 for the mean response.

The fitted response surface for the response itself was:

$$\hat{y} = 314.7 + 177 x_1 + 109.4 x_2 + 131.5 x_3 + 66 x_1 x_2 + 75.5 x_1 x_3 + 43.6 x_2 x_3 + 82.8 x_1 x_2 x_3$$

The fitted response surface for the standard deviation was:

$$\hat{\sigma} = 48.0 + 29.2 x_3$$

Consider using the Derringer-Suich desirability approach to minimize $\hat{\sigma}$ over the cube defined by the $3^3$ design subject to the constraints:

1. $\hat{y} = 500$ (acceptable range: 490-510)

2. a maximal acceptable value for $\hat{\sigma}$ is 60.

The resulting best settings are

x1 = 1.00 x2 = 1.00 x3 = -0.50

which yield an estimated standard deviation of 33.4 and a desirability of 0.6662.

Consider using the SOLVER tool in EXCEL to minimize $\hat{\sigma}$ over the cube defined by the $3^3$ design subject to the constraint:

$$\hat{y} = 500$$

The resulting best settings again are

x1 = 1.00 x2 = 1.00 x3 = -0.50
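As a rough cross-check that needs no spreadsheet, a plain grid search over the cube reaches the same settings. This is a swapped-in technique for illustration, not the reduced gradient method used by SOLVER nor the desirability approach; it assumes the two fitted equations of this example, a grid step of 0.05, and the acceptable range 490-510 for the mean in place of the exact equality.

```python
def mean_model(x1, x2, x3):
    """Fitted response surface for the mean."""
    return (314.7 + 177.0 * x1 + 109.4 * x2 + 131.5 * x3
            + 66.0 * x1 * x2 + 75.5 * x1 * x3 + 43.6 * x2 * x3
            + 82.8 * x1 * x2 * x3)


def sd_model(x3):
    """Fitted response surface for the standard deviation."""
    return 48.0 + 29.2 * x3


grid = [i / 20 for i in range(-20, 21)]     # -1.00, -0.95, ..., 1.00

best = None
for x1 in grid:
    for x2 in grid:
        for x3 in grid:
            y = mean_model(x1, x2, x3)
            if not 490.0 <= y <= 510.0:     # mean must stay in the acceptable range
                continue
            s = sd_model(x3)
            # Prefer a smaller standard deviation; break ties by closeness to 500.
            key = (s, abs(y - 500.0))
            if best is None or key < best[0]:
                best = (key, (x1, x2, x3), y, s)

_, settings, y_best, sd_best = best
print(settings, round(y_best, 1), round(sd_best, 1))
```

The search lands on x1 = 1, x2 = 1, x3 = -0.5, where the predicted mean is about 500.4 and the predicted standard deviation is 33.4. A finer grid or a proper constrained optimizer would be needed to enforce the mean constraint exactly.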

The advantages of the statistical procedures are:

1. They directly consider the question of interest rather than burying this question in a signal-to-noise ratio.

2. They are standard applications of a basic RSM procedure. As a result,

(a) they allow a sequential investigation;

(b) they allow the use of a broad array of experimental designs; and

(c) they use more rigorous methods of analysis.

IV. Concluding Remarks

This course has introduced the fundamentals of data analysis to engineers.

Students need to understand that this course only offers a beginning.

Other statistical topics of importance to engineers include:

• more model building and model diagnostics (more regression analysis),

• more process control,

• more experimental design,

• reliability, and

• time series.

Many departments offer full courses on each of these topics.

This course gives the student a reasonable foundation for pursuing these more advanced areas in statistics.
