DOE - Design Of Experiments A 3 Day Reminder

Design Of Experiments: a 3-day reminder

(1.1)

Frans van den Berg
The Royal Veterinary and Agricultural University (KVL), Denmark
Dept. of Dairy and Food Science, Food Technology
Chemometrics group

www.models.kvl.dk

DOE - KVL (031025)

Per M. Bruun Brockhoff
The Royal Veterinary and Agricultural University (KVL), Denmark
Dept. of Mathematics and Physics, Statistics group

www.matfys.kvl.dk

Design Of Experiments: a 3-day reminder

(1.2)

Day 1
1. 13:00-13:45h - Introduction; what was experimental design again?
2. 14:00-14:45h - Design in one formula: y = X.b
3. 15:00-15:45h - Statistical inference and ANOVA

Day 2
4. 09:00-12:00h - Hands-on DOE: computer exercise I *)
5. 13:00-13:45h - Data inspection and plotting (+ some PCA)
6. 14:00-14:45h - Miscellaneous subjects
7. 15:00-15:45h - Example: case study I

Day 3
8. 09:00-12:00h - Hands-on DOE: computer exercise II *)
9. 13:00-13:45h - Blocking and split-plot
10. 14:00-14:45h - Example: case study I
11. 15:00-15:45h - Introduction to mixed linear models

*) Participants can choose to attend only the theory sessions (3 afternoons) or the theory sessions plus the computer exercises (2 mornings).

Design Of Experiments: Accompanying literature

(1.3)

• D.E. Coleman and D.C. Montgomery, “A Systematic Approach to Planning for a Designed Industrial Experiment”, Technometrics 35/1 (1993) 1-12

Some ideas on the setup and execution of experimental (industrial) designs

• J. Guervenou et al., “Experimental design methodology and data analysis techniques applied to optimise an organic synthesis”, Chemometrics and Intelligent Lab. Sys. 63 (2002) 81-89

A typical experimental design and optimization example

• P. Bruun Brockhoff, “Sensory profile average data: combining mixed model ANOVA with measurement error methodology”, Food Quality and Preference 12 (2001) 413-426

An advanced study in experimental design and statistical inference in food research


Design Of Experiments: Definitions

(1.4)

Experiment: “A test or series of tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify the reasons for changes that may be observed in the output response.” a)

Design: “The art or process of deciding how something will look, work, etc.” b)

Motivation: “The statistician’s aim in designing surveys and experiments is to meet a desired degree of reliability at the lowest possible cost under the existing budgetary, administrative, and physical limitations within which the work must be conducted. In other words, the aim is efficiency - the most information (smallest error) for the money.” c)

a) Definition, like many of the ideas in this course, taken from Douglas C. Montgomery, “Design and Analysis of Experiments”, Wiley (2001, 5th ed.)

b) Definition taken from Oxford Advanced Learner’s Dictionary (2000, 6th ed.)

c) William E. Deming, “Some theory of sampling”, Dover (1950 reprint); quoting R. A. Fisher


Leading example: Making apple juice

(1.5)

[Figure: the apple-juice process. Controllable factors (experimental domain): pH, Sugar. Uncontrollable factors (‘nuisance factors’): Material, Periphery, Production, Technician, …? Output: sensor score (response), given by the Sensors.]

Variables: Ratio, ordinal or nominal

(1.6)

• Interval or ratio scales; e.g. pH or sugar concentration
  - measurement or quantitative variables
  - continuous or discrete (e.g. counting)
  - most often encountered and easiest case

• Ordinal scale; e.g. “very poor”, “poor”, “average”, “good”, “excellent”
  - called ranked variables
  - distinct graduation, but scale distance not defined

• Nominal scale; e.g. “green”, “red”, “yellow” or “accept”, “reject”
  - qualitative or categorical variables, or attributes
  - require some special “tricks” in statistical inference

Design Of Experiments: Objective

(1.7)

a) Does x influence y, and if so, how?

b) Which inputs x are the most influential on the output y?

c) How to set x’s so that y is (almost) always near target or optimal

d) How to set x’s to minimize variability in y

e) How to set x’s so that influence of z’s on y is minimized

Strategy of experimentation is the most important job of the experimenter

[Figure: Input → Process → Output y; controllable factors x and uncontrollable factors z act on the process.]

Design Of Experiments: Different approaches

(1.8)

• Best guess (often works well due to good insight into the problem!)

• One factor at a time (“pseudo scientific”)

• Factorial design (2² to reveal interactions)

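[Figure: one factor at a time (score versus pH, low/high, and score versus Sugar, less/more); the 2² factorial square (pH low/high × Sugar less/more); score versus Sugar with separate lines for low pH and high pH.]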

Example: 2-factor factorial design

(1.9)

• Five sensors score the product (apple juice); for each design point the average is the product score

• The design is replicated twice: 2² × 2 = 4 × 2 = 8 experiments

• Result (score from 1 = don’t like to 10 = like; both replicates per design point)

              less Sugar   more Sugar
    high pH     (6,5)        (7,7)
    low pH      (6,6)        (9,8)


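(1.10)

• Main effects: Sugar and pH

• Interaction effect: Sugar·pH

Sugar (more − less): $\frac{7+7+9+8}{4} - \frac{6+5+6+6}{4} = 7.75 - 5.75 = 2.0$

pH (high − low): $\frac{6+5+7+7}{4} - \frac{6+6+9+8}{4} = 6.25 - 7.25 = -1.0$

Sugar·pH (cells (more, high) and (less, low) minus cells (less, high) and (more, low)): $\frac{7+7+6+6}{4} - \frac{6+5+9+8}{4} = 6.5 - 7.0 = -0.5$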
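The effect arithmetic on this slide can be sketched in a few lines of Python: each effect is the average response where a sign column is +1 minus the average where it is −1. The placement of the replicate pairs in the cells is reconstructed from the slide's own effect values (Sugar 2.0, pH −1.0, interaction −0.5) and should be read as an assumption; the helper `effect` is hypothetical.

```python
# Replicated 2^2 design from the apple-juice example (cell placement assumed).
# Coded levels: -1 = less sugar / low pH, +1 = more sugar / high pH.
# Each design point holds its two replicate scores (1 = don't like, 10 = like).
runs = {
    (-1, +1): [6, 5],  # less sugar, high pH
    (+1, +1): [7, 7],  # more sugar, high pH
    (-1, -1): [6, 6],  # less sugar, low pH
    (+1, -1): [9, 8],  # more sugar, low pH
}

def effect(sign):
    """Average response where sign(point) = +1 minus average where it is -1."""
    plus = [y for pt, ys in runs.items() if sign(pt) > 0 for y in ys]
    minus = [y for pt, ys in runs.items() if sign(pt) < 0 for y in ys]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

sugar = effect(lambda p: p[0])            # main effect of Sugar
ph = effect(lambda p: p[1])               # main effect of pH (high - low)
sugar_ph = effect(lambda p: p[0] * p[1])  # interaction: product of the two columns
print(f"Sugar {sugar:+.2f}, pH {ph:+.2f}, Sugar.pH {sugar_ph:+.2f}")
```

Every one of the 8 scores enters every effect estimate, which is the "maximum use of the data" argument made for factorial designs.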


(1.11)

Important in interpretation are the magnitude and direction of the effects:
• sweet juice has a clear preference
• a low pH leads to a higher score
• the Sugar-pH interaction is weak

[Figure: the 2² design square (pH low/high × Sugar less/more) annotated with the effects 2.0, −1.0 and −0.5.]

Design Of Experiments: Why DOE; different approaches revisited

(1.12)

• Best guess: good starting values, but “areas unvisited” remain unknown!

• One factor at a time: inefficient use of the data (“2 small factor designs”)

• Factorial design: maximum use of the data, since all observations are used for all the main and interaction effects! And the trend in the surface gives an indication of “areas unvisited”.

[Figure: the one-factor-at-a-time plots (score versus pH, score versus Sugar) next to the 2² factorial square (pH low/high × Sugar less/more).]


(1.13)

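• Main effect Apple/Material: 2³ = 8 experiments (still!)

[Figure: the 2² design (Sugar less/more × pH low/high) extended with Material as a third factor, giving a cube with one run per corner: (7), (7), (8), (9), (6), (6), (6), (5).]

Material: $\frac{6+6+9+7}{4} - \frac{6+5+8+7}{4} = 7.0 - 6.5 = 0.5$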
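With three factors the same "average at +1 minus average at −1" rule applies to every sign column of the coded design, including the interaction columns. A minimal sketch for the Sugar × pH × Material cube, assuming (hypothetically) that Material = +1 marks the first replicate score of each design point and −1 the second; this reproduces the Material effect of 0.5 on this slide, but which apple is which is not stated in the slides.

```python
from itertools import combinations
from math import prod

names = ("Sugar", "pH", "Material")
# (Sugar, pH, Material) in coded units, score. Material coding is an assumption.
runs = [
    ((-1, +1, +1), 6), ((+1, +1, +1), 7), ((-1, -1, +1), 6), ((+1, -1, +1), 9),
    ((-1, +1, -1), 5), ((+1, +1, -1), 7), ((-1, -1, -1), 6), ((+1, -1, -1), 8),
]

def effect(columns):
    """Contrast for the product of the given factor columns."""
    plus = [y for x, y in runs if prod(x[c] for c in columns) > 0]
    minus = [y for x, y in runs if prod(x[c] for c in columns) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# All 3 main effects, 3 two-factor and 1 three-factor interaction from 8 runs.
effects = {}
for order in (1, 2, 3):
    for cols in combinations(range(3), order):
        effects[".".join(names[c] for c in cols)] = effect(cols)

for label, value in effects.items():
    print(f"{label:20s} {value:+.2f}")
```

The Sugar and pH effects come out the same as in the 2² analysis, because the Material column is balanced against both.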

Design Of Experiments: Why DOE; relative efficiency

(1.14)

• Factorial design: 4 observations; all effects are estimated as an average over two

• One factor at a time: needs 6 observations to get the same information

[Figure: the 2² factorial square (pH low/high × Sugar less/more) next to a one-factor-at-a-time design whose points are each run 2x.]

Relative efficiency (6/4) = 1.5

[Figure: relative efficiency (from 1.5 up to 3.5) versus the number of factors (2 to 6), i.e. 4 to 64 observations for the full factorial.]
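The curve can be reproduced with a short calculation. This is a sketch under an assumption that matches the plotted values (following Montgomery's argument): a one-factor-at-a-time (OFAT) study has k + 1 distinct runs, and to estimate every effect as an average over 2^(k−1) observations, as the 2^k factorial does, each OFAT run must be repeated 2^(k−1) times. The ratio then simplifies to (k + 1)/2.

```python
def ofat_runs(k: int) -> int:
    """OFAT: base point plus one change per factor, each repeated 2**(k-1) times."""
    return (k + 1) * 2 ** (k - 1)

def factorial_runs(k: int) -> int:
    """Full two-level factorial."""
    return 2 ** k

for k in range(2, 7):
    eff = ofat_runs(k) / factorial_runs(k)
    print(f"{k} factors: factorial {factorial_runs(k):3d} runs, "
          f"OFAT {ofat_runs(k):4d} runs, relative efficiency {eff:.1f}")
```

For k = 2 this gives the 6/4 = 1.5 of the slide, and for k = 6 it reaches 3.5.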

Example: 4-factor fractional factorial design

(1.15)

• Main effect Production: ½ × 2⁴ = 8 experiments (still!)
Full information on the main effects, partial information on the interactions

[Figure: two 2² squares (Sugar less/more × pH low/high), one per Production method (Hand press, Kitchen blender), with the runs split over Material (red, green); scores (7), (8), (6), (5) in one square and (7), (9), (6), (6) in the other.]

Confounding: 4-factor fractional factorial design

(1.16)

Of course you lose something by reducing the number of experiments…

• Main effects and interaction effects will be confounded

• Confounding means: we cannot separate some effects/interactions

For the example:
• We have 4 factors (Sugar, pH, Material and Production)
• There are 4 blocks (2 materials × 2 production methods)
• In this case: block effects and three-factor interactions are confounded
• E.g. the Material (apple) effect and the Sugar-pH-Material effect are confounded

• When reducing a full design, the usual assumption is that high-order interactions are unimportant

• When reducing the design, you have to select carefully which effects are confounded
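Which effects end up confounded depends on how the half fraction is constructed. The slides do not state the generator, so the sketch below assumes the common textbook choice Production = Sugar · pH · Material (defining relation I = Sugar·pH·Material·Production) and checks aliasing by comparing sign columns: identical columns mean the effects cannot be separated. The helper `column` is hypothetical.

```python
from itertools import product

# 2^(4-1) half fraction: full 2^3 in Sugar, pH, Material; Production generated.
runs = []
for sugar, ph, material in product((-1, 1), repeat=3):
    production = sugar * ph * material  # assumed generator
    runs.append({"Sugar": sugar, "pH": ph, "Material": material, "Production": production})

def column(*factors):
    """Sign column of a main effect or interaction over the 8 runs."""
    col = []
    for run in runs:
        sign = 1
        for f in factors:
            sign *= run[f]
        col.append(sign)
    return col

# Main effect aliased with a three-factor interaction ...
print(column("Material") == column("Sugar", "pH", "Production"))
# ... and two-factor interactions aliased in pairs.
print(column("Sugar", "pH") == column("Material", "Production"))
```

Under this generator both checks print True, which is why the usual assumption that high-order interactions are unimportant is needed to read the main effects.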

Example: Quality of the response

(1.17)

[Figure: the 2² design (Sugar less/more × pH low/high) with the replicate scores of each design point on the 1 to 10 score scale.]


(1.18)

• Five sensors ‘score’ a product for each design point; the average is the product score (repeated measurements)

• Even if it is not explicitly used in the statistical analysis of a design, it is of the utmost importance to have an impression of the uncertainty in the response!

• Laboratory info, analysis replicates, a priori knowledge, literature values…

[Figure: the individual sensor scores on the 1 (don’t like) to 10 (like) scale for two design points, each with average = (6).]


(1.19)

Signal-to-noise ratio for repeated measurements

[Figure: scores of the 2² design (Sugar less/more × pH low/high) on the 1 to 10 scale.]

Note the subtle meaning of ‘error’ in statistics: ‘error’ as in ‘wandering’ (cf. knight errant) rather than ‘incorrect’. Observations come from a population and consist of a common part (e.g. the average) and a unique part (the error).


(1.20)

Center-point replicates give a good indication of the reproducibility of the design points; in addition, they can give a (cheap) indication of curvature in the response.

Total: 8 + 3 = 11 or 8 + 5 = 13 experiments (3x or 5x center point)
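A minimal sketch of the curvature check, using the usual textbook (Montgomery) single-degree-of-freedom form SS = n_F n_C (ȳ_F − ȳ_C)² / (n_F + n_C). The eight factorial scores are those of the 2² example; the three center-point scores are hypothetical, since the slides give none.

```python
from statistics import mean, stdev

# Factorial scores from the apple-juice example (2^2 design, run twice).
factorial = [6, 5, 7, 7, 6, 6, 9, 8]
# Hypothetical center-point scores (3 replicates); NOT data from the slides.
center = [7.0, 7.5, 6.5]

n_f, n_c = len(factorial), len(center)
# If the response surface were a plane, the factorial average and the
# center average would agree up to noise; their difference estimates curvature.
curvature = mean(factorial) - mean(center)
ss_curvature = n_f * n_c * curvature ** 2 / (n_f + n_c)
# Center replicates share identical settings, so their spread is pure error.
pure_error_sd = stdev(center)
print(f"curvature = {curvature:.3f}, SS_curvature = {ss_curvature:.3f}, "
      f"pure-error SD = {pure_error_sd:.3f}")
```

Comparing SS_curvature with the pure-error variance (for instance in an F-test) tells whether the curvature is larger than the reproducibility of the center point suggests.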

Errors: Random versus systematic

(1.21)

Error: difference between the observed and the true value

$e_i = x_i - \mu_x = (x_i - \bar{x}) + (\bar{x} - \mu_x)$

• random error $(x_i - \bar{x})$: imprecision; precision (repeatability, reproducibility)
• systematic error $(\bar{x} - \mu_x)$: bias; accuracy

[Figure: measurements on Day 1 (repeatability) and Day 2 (reproducibility), with the sample mean x̄, the true value µx and the bias between them.]

True value µx (ISO): “The value which characterizes a quantity perfectly defined in the conditions which exist at the moment when that quantity is observed (or the subject of a determination). It is an ideal value which could be arrived at only if all causes of measurement error were eliminated and the population was infinite”.

The random part (repeatability and reproducibility) is a function of the sample size n, since the variance of the mean x̄ decreases as n grows; the bias is not!
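A small simulation illustrates why averaging helps against random error but not against bias. The true value, bias and noise level below are made-up numbers, and `measure` is a hypothetical helper: each observation is x_i = µx + bias + random error.

```python
import random
from statistics import mean

# Hypothetical measurement process (illustrative values only).
mu, bias, sigma = 5.00, 0.05, 0.10
rng = random.Random(1)

def measure(n):
    """Return n simulated observations x_i = mu + bias + random error."""
    return [mu + bias + rng.gauss(0.0, sigma) for _ in range(n)]

for n in (3, 30, 3000):
    xbar = mean(measure(n))
    # xbar - mu = bias + averaged random error; only the random part shrinks with n
    print(f"n = {n:5d}: xbar - mu = {xbar - mu:+.4f} (bias = {bias:+.2f})")
```

As n grows, x̄ − µx settles at the bias rather than at zero: more replicates buy precision, not accuracy.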

The three basics: Replication, randomization and blocking

(1.22)

[Figure: signal-to-noise ratio for design replicates: the replicate scores of design points (less; high) vs. (more; low) on the 1 to 10 score scale, Sugar less/more × pH low/high.]


(1.23)

Signal-to-noise ratio for design replicates

[Figure: the replicate scores of the 2² design (Sugar less/more × pH low/high) on the 1 to 10 score scale.]

Each design point has an experimental error (= statistical error), which is a random variable.


(1.24)

The underlying statistical methods require that the observations (or errors) are independently distributed random variables. Randomization of, e.g., the starting material and the run order of the design points (usually) makes this assumption valid.

[Figure: experimental error versus time for the run order Less/Low, More/Low, Less/High, More/High, center points, run twice in this fixed order; annotation: “Reduce experimental error by training”.]


(1.25)

All sorts of effects can influence a series of observations:
• learning by experience: reducing the uncertainty
• wear and tear in equipment: increasing the uncertainty
• a change of lab assistants: a jump in the uncertainty

Randomization “reshuffles” the observations, eliminating a potential confounding between design- and run-order. It “averages out” the effect of uncontrollable extraneous factors.*)

[Figure: experimental error versus time over the experiment, with the annotation “Shake 10 minutes”.]

*) Randomization is also the justification/motivation behind the so-called F-test, used extensively later in this course.
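In practice, randomizing the run order just means shuffling the list of planned runs before the experiment starts. A minimal sketch for the replicated 2² design with one center point per replicate (matching the run list in the figure); the seed is an arbitrary choice that only makes the run sheet reproducible.

```python
import random

# Design points in standard (systematic) order, each replicate listed once.
design = [("less", "low"), ("more", "low"), ("less", "high"),
          ("more", "high"), ("center", "center")] * 2

rng = random.Random(2003)  # fixed seed for a reproducible run sheet
run_order = design.copy()
rng.shuffle(run_order)     # random permutation: breaks the link with time trends

for run, (sugar, ph) in enumerate(run_order, start=1):
    print(f"run {run:2d}: sugar = {sugar:6s} pH = {ph}")
```

Any drift (learning, wear and tear, a new lab assistant) is then spread at random over the design points instead of lining up with one of the factors.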



(1.26)

So-called blocking can eliminate undesired (nuisance) factors by asking a different question.

[Figure: e.g. one replicate of the 2² design (Sugar less/more × pH low/high) minus the other gives the differences (1), (0), (0), (1) on a 0 to 2 scale: “Improve signal”, but at a price…]
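A minimal sketch of blocking by within-point comparison, assuming (hypothetically) that the first replicate score at each design point comes from block 1 and the second from block 2, e.g. two batches of apples. The differences per design point remove the sugar/pH setting; their mean estimates the block effect, and removing it shrinks the replicate scatter. The price is that one of the four error degrees of freedom of the replicated 2² design is spent on the block effect.

```python
from statistics import mean

# Assumption: first score = block 1, second score = block 2.
scores = {  # (sugar, pH): (block 1 score, block 2 score)
    ("less", "high"): (6, 5),
    ("more", "high"): (7, 7),
    ("less", "low"): (6, 6),
    ("more", "low"): (9, 8),
}

# Within-point differences cancel the design setting, leaving block effect + noise.
diffs = {pt: b1 - b2 for pt, (b1, b2) in scores.items()}
block_effect = mean(list(diffs.values()))

# Remove the estimated block effect: block 1 down by half, block 2 up by half.
adjusted = {pt: (b1 - block_effect / 2, b2 + block_effect / 2)
            for pt, (b1, b2) in scores.items()}

def within_ss(pairs):
    """Replicate sum of squares: each pair (a, b) contributes (a - b)**2 / 2."""
    return sum((a - b) ** 2 / 2 for a, b in pairs.values())

print("differences:", diffs)
print(f"block effect {block_effect:.2f}")
print(f"within-point SS: raw {within_ss(scores):.3f}, "
      f"block-adjusted {within_ss(adjusted):.3f}")
```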


(1.27)

[Figure: the 2³ cube (Sugar less/more × pH low/high × Material) with one run per corner: (7), (7), (8), (9), (6), (6), (6), (5).]

Blocking can also be a ‘necessary evil’, destroying the desired randomization. E.g. assume we don’t have enough green apples to run the full experiment twice.


(1.28)

There is a link between the three basics! E.g. we want to perform 30 experiments (replicates), but we can only do 10 runs from one batch of raw material (a typical nuisance factor).

[Figure: uncertainty of the 30 runs split over block 1, block 2 and block 3 (block effect), with the run order randomized within the blocks.]
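The 30-run example combines all three basics: replication (30 runs), blocking (one batch of raw material per 10 runs) and randomization (within each block). A minimal sketch, assuming a hypothetical plan of the four 2² corners plus a center point, each replicated twice per block so that every block contains the same set of runs.

```python
import random

points = ["less/low", "more/low", "less/high", "more/high", "center"]
runs_per_block = 10   # one batch of raw material
n_blocks = 3          # 30 runs in total
rng = random.Random(42)

plan = []
for block in range(1, n_blocks + 1):
    # Balanced: every design point appears equally often in each block ...
    block_runs = points * (runs_per_block // len(points))
    # ... and the run order is randomized separately within each block.
    rng.shuffle(block_runs)
    plan.extend((block, point) for point in block_runs)

for run, (block, point) in enumerate(plan, start=1):
    print(f"run {run:2d}  block {block}  {point}")
```

Because each block holds a complete, balanced set of design points, a batch-to-batch shift affects all points equally and can be separated from the factor effects.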

Some basic notions: Sample and population

(1.29)

[Figure: the n1 = 10 pH measurements from flask 1 (pH 1) on a pH axis from 4.7 to 5.3.]

Samples: 10 pH-measurements taken from flask 1

Population: all the possible pH-values to be found in flask 1

We assume a continuous distribution in the population (not always the case; e.g. pH in European rivers).

pH 1: 5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90

Some basic notions: Expectation and population parameters (mean and variance)

(1.30)

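Expected value (population) → sample statistic for n observations

Mean (locality): $E(x) = \mu_x \quad\rightarrow\quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Variance (spread): $E\left[(x - \mu_x)^2\right] = \sigma_x^2 \quad\rightarrow\quad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

[Figure: e.g. the normal distribution N(µx, σx) of the observations; about 68% lie within µ ± σ, 95% within µ ± 2σ and (practically) 100% within µ ± 3σ.]

Notice: µ and σ are population constants.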

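Some basic notions

(1.31)

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Variance: $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard deviation: $s_x = \sqrt{s_x^2}$

Standard error of the mean: $SE = \frac{s_x}{\sqrt{n}}$

Relative SD: $s_r = \frac{s_x}{\bar{x}}$; RSD in % (coefficient of variation): $100 \cdot s_r\,\%$

95% confidence interval/level:

$\Pr\left(\mu - t_{n-1}\,SE < \bar{x} < \mu + t_{n-1}\,SE\right) = 95\% \quad\Leftrightarrow\quad \Pr\left(\bar{x} - t_{n-1}\,SE < \mu < \bar{x} + t_{n-1}\,SE\right) = 95\%$

For n large (or σx known): $t_{n-1} \rightarrow z = 1.96$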

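Some basic notions: Critical t-values

(1.32)

[Figure: critical values of Student’s t distribution, panels (a) to (d).]

(a) α is the user’s choice
(b) the critical value increases as α decreases
(c) the critical value decreases as n increases
(d) for n large (or σ known) it approaches the normal-distribution value (1.96 for 95%)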

Some basic notions: Sample statistics (= descriptors in numbers)

(1.33)

Mean      5.01
Sum       50.1
Variance  0.0070
S.D.      0.084

pH 1: 5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90

Assumption: Normal distribution N(x̄, sx)

[Figure: the pH 1 measurements on a pH axis from 4.7 to 5.3, with x̄ and sx marked.]

$SE_{pH1} = 0.084/\sqrt{10} = 0.026$, $t_9 = 2.26$

$5.01 - 2.26 \times 0.026 < \mu_{pH1} < 5.01 + 2.26 \times 0.026 \quad\Rightarrow\quad 4.95 < \mu_{pH1} < 5.07$
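The sample statistics and the confidence interval can be recomputed with Python's standard `statistics` module. The standard library has no Student t distribution, so the two-sided 95% value t₉ = 2.262 is hardcoded from a t-table (scipy's `scipy.stats.t.ppf(0.975, 9)` would give it); `NormalDist` supplies the large-n limit z. Recomputing from the printed two-decimal values gives a variance and SD of about 0.0069 and 0.083, slightly below the slide's 0.0070 and 0.084, presumably because the slide used unrounded data; the interval is the same.

```python
import math
from statistics import NormalDist, mean, stdev, variance

ph1 = [5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90]

n = len(ph1)
xbar = mean(ph1)
s = stdev(ph1)           # sample SD, divisor n - 1
se = s / math.sqrt(n)    # standard error of the mean
rsd_percent = 100 * s / xbar

t9 = 2.262                       # t-table value, 9 df, two-sided 95%
z = NormalDist().inv_cdf(0.975)  # large-n limit, about 1.96
lower, upper = xbar - t9 * se, xbar + t9 * se

print(f"mean {xbar:.3f}, sum {sum(ph1):.2f}, variance {variance(ph1):.4f}, SD {s:.3f}")
print(f"SE {se:.3f}, RSD {rsd_percent:.2f} %, "
      f"95% CI {lower:.2f} < mu < {upper:.2f} (z = {z:.2f})")
```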

Some basic notions: Pooled standard deviation

(1.34)

[Figure: the n1 = 10 measurements of pH 1 and the n2 = 10 measurements of pH 2 on a pH axis from 4.7 to 5.3.]

Two ‘treatments’, e.g. making pH buffers with two different stock solutions

pH 1: 5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90
pH 2: 5.17, 5.10, 5.16, 5.17, 5.19, 5.14, 4.91, 5.21, 5.07, 5.10

Some basic notions: Comparing two samples

(1.35)

            pH 1     pH 2
Mean        5.01     5.12
Sum         50.1     51.2
Variance    0.0070   0.0074
S.D.        0.084    0.086

pH 1: 5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90
pH 2: 5.17, 5.10, 5.16, 5.17, 5.19, 5.14, 4.91, 5.21, 5.07, 5.10

Assuming the variances in flasks 1 and 2 are the same:

$s^2_{pooled} = \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{(n_1-1) + (n_2-1)} = \frac{0.0630 + 0.0666}{9 + 9} = 0.0072$

[Figure: the pH 1 and pH 2 measurements on a pH axis from 4.7 to 5.3.]

9 degrees of freedom (df) in estimating each of the standard deviations

$4.95 < \mu_{pH1} < 5.07$

$5.06 < \mu_{pH2} < 5.18$
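The pooled variance weights each sample variance by its degrees of freedom. A minimal sketch with the two pH samples; recomputed from the printed two-decimal values the sums of squares are about 0.0624 and 0.0674 instead of the slide's 0.0630 and 0.0666 (presumably from unrounded data), but the pooled variance still rounds to 0.0072 on 18 df. The helper `pooled_variance` is hypothetical.

```python
import math
from statistics import mean, variance

ph1 = [5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90]
ph2 = [5.17, 5.10, 5.16, 5.17, 5.19, 5.14, 4.91, 5.21, 5.07, 5.10]

def pooled_variance(*samples):
    """Pooled variance, assuming all samples share the same population variance."""
    ss = sum((len(x) - 1) * variance(x) for x in samples)  # sums of squares
    df = sum(len(x) - 1 for x in samples)                   # degrees of freedom
    return ss / df, df

s2_pooled, df = pooled_variance(ph1, ph2)
print(f"means {mean(ph1):.3f} and {mean(ph2):.3f}, "
      f"pooled variance {s2_pooled:.4f} on {df} df, pooled SD {math.sqrt(s2_pooled):.3f}")
```

Pooling doubles the degrees of freedom behind the variance estimate (18 instead of 9), which gives a smaller critical t-value when the two samples are compared.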

Design Of Experiments: In distinct steps

(1.36)

[Figure: the steps below grouped under “Design experiment” and “Statistical analysis”.]

1. Recognition and definition of the problem

2. Choice of factors, levels and ranges

3. Selection of the response variable

4. Choice of experimental design

5. Performing the experiment

6. Statistical analysis of the data

7. Conclusions and recommendations


Design Of Experiments: An iterative process; e.g. optimization

(1.37)

[Figure: unknown response surface with score contour lines over Sugar and pH; a factorial screening experiment for initial optimization, followed by a central composite design for detailed optimization.]
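The follow-up central composite design adds axial and center points to the factorial square so that curvature can be modelled. A minimal sketch of the coded design points for Sugar and pH; the slides do not specify the axial distance or the number of center points, so the rotatable choice α = (number of factorial runs)^(1/4) and 3 center points are assumptions, and `central_composite` is a hypothetical helper.

```python
from itertools import product

def central_composite(k=2, n_center=3, alpha=None):
    """Coded design points of a central composite design (CCD) for k factors."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable choice; sqrt(2) for k = 2
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha  # one factor at +/- alpha, others at center
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center

design = central_composite()
for sugar, ph in design:
    print(f"Sugar {sugar:+.3f}  pH {ph:+.3f}")
```

The coded values still have to be translated to real sugar concentrations and pH values around the best region found by the screening experiment.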
