Date post: | 01-Nov-2014 |
Upload: | siddharth-nath |
Design Of Experiments: a 3-day reminder
(1.1)
Frans van den Berg
The Royal Veterinary and Agricultural University (KVL), Denmark
Dept. of Dairy and Food Science, Food Technology
Chemometrics group
[email protected]
www.models.kvl.dk
DOE - KVL(031025)
Per M. Bruun Brockhoff
The Royal Veterinary and Agricultural University (KVL), Denmark
Dept. of Mathematics and Physics, Statistics group
[email protected]
www.matfys.kvl.dk
Design Of Experiments: a 3-day reminder
(1.2)
Day 1
1. 13:00-13:45h - Introduction; what was experimental design again?
2. 14:00-14:45h - Design in one formula: y = X.b
3. 15:00-15:45h - Statistical inference and ANOVA
Day 2
4. 09:00-12:00h - Hands-on DOE: computer exercise I *)
5. 13:00-13:45h - Data inspection and plotting (+ some PCA)
6. 14:00-14:45h - Miscellaneous subjects
7. 15:00-15:45h - Example: case study I
Day 3
8. 09:00-12:00h - Hands-on DOE: computer exercise II *)
9. 13:00-13:45h - Blocking and split-plot
10. 14:00-14:45h - Example: case study I
11. 15:00-15:45h - Introduction to mixed linear models
*) Participants can choose to take part in the theory (3 afternoons) sessions or theory + computer exercise (2 mornings) sessions.
Design Of Experiments: Accompanying literature
(1.3)
• D.E. Coleman and D.C. Montgomery "A Systematic Approach to Planning for a Designed Industrial Experiment" Technometrics 35/1(1993)1-12
Some ideas on the setup and execution of experimental (industrial) designs
• J. Guervenou et al. “Experimental design methodology and data analysis techniques applied to optimise an organic synthesis” Chemometrics and Intelligent Lab. Sys. 63(2002)81-89
A typical experimental design and optimization example
• P. Bruun Brockhoff “Sensory profile average data: combining mixed model ANOVA with measurement error methodology” Food Quality and Preference 12(2001)413-426
An advanced study in experimental design and statistical inference in food research
Design Of Experiments: Definitions
(1.4)
Experiment: “A test or series of tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify the reasons for changes that may be observed in the output response.” a)
Design: “The art or process of deciding how something will look, work, etc.” b)
Motivation: “The statistician’s aim in designing surveys and experiments is to meet a desired degree of reliability at the lowest possible cost under the existing budgetary, administrative, and physical limitations within which the work must be conducted. In other words, the aim is efficiency - the most information (smallest error) for the money.” c)
a) Definition, like many of the ideas in this course, taken from Douglas C. Montgomery “Design and Analysis of Experiments” Wiley (2001, 5th)
b) Definition taken from Oxford Advanced Learner’s Dictionary (2000, 6th)
c) William E. Deming “Some theory of sampling” Dover (1950 reprint); quoting R. A. Fisher
Leading example: Making apple juice
(1.5)
Controllable factors (experimental domain): pH, Sugar
Uncontrollable factors (‘nuisance factors’): Material, Periphery, Production, …?, Technician
Output (sensor score/response): Sensors
Variables: Rational, ordinal or nominal
(1.6)
• Interval or ratio scales; e.g. pH or sugar concentration
  • measurement or quantitative variables
  • continuous or discrete (e.g. counting)
  • most often encountered and easiest case
• Ordinal scale; e.g. “very poor”, “poor”, “average”, “good”, “excellent”
  • called ranked variables
  • distinct graduation, but scale-distance not defined
• Nominal scale; e.g. “green”, “red”, “yellow” or “accept”, “reject”
  • qualitative or categorical variables or attributes
  • require some special “tricks” in statistical inference
Design Of Experiments: Objective
(1.7)
a) Does x influence y, and if so, how?
b) Which inputs x are the most influential on the output y?
c) How to set x’s so that y is (almost) always near target or optimal
d) How to set x’s to minimize variability in y
e) How to set x’s so that influence of z’s on y is minimized
Strategy of experimentation is the most important job of the experimenter
[Figure: process diagram. Controllable factors x (input) and uncontrollable factors z act on the process, which produces output y.]
Design Of Experiments: Different approaches
(1.8)
• Best guess (often works well due to good insight on the problem!)
• One factor at a time (“pseudo scientific”)
• Factorial design (2² to reveal interactions)
[Figures: score versus pH (low, high) and score versus Sugar (less, more) for one factor at a time; design square of pH (low, high) versus Sugar (less, more); score versus Sugar (less, more) at low pH and at high pH for the factorial design]
Example: 2 factor factorial design
(1.9)
• Five sensors score the product (apple juice); for each design point the average is the product score
• Design is replicated twice: 2² x 2 = 4 x 2 = 8 experiments
• Result (scale from 1 = don’t like to 10 = like; two replicates per design point):

            Sugar less   Sugar more
pH high     (6,5)        (7,7)
pH low      (6,6)        (9,8)
Example: 2 factor factorial design
(1.10)
• Main effects Sugar and pH
Sugar: (7+7+9+8)/4 - (6+5+6+6)/4 = 2.0
pH: (6+5+7+7)/4 - (6+6+9+8)/4 = -1.0
• Interaction effect Sugar.pH
Sugar.pH: (7+7+6+6)/4 - (6+5+9+8)/4 = -0.5
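The effect calculations of slide (1.10) can be reproduced with a few lines of Python. This is only a sketch: the coding of the levels as -1 (less / low) and +1 (more / high), and the names `scores` and `effect`, are assumptions made for the illustration and do not come from the slides.

```python
# Scores from the replicated 2x2 design: (sugar, pH) -> two replicate scores.
# Levels are coded -1 (less sugar / low pH) and +1 (more sugar / high pH).
scores = {
    (-1, +1): [6, 5],  # less sugar, high pH
    (+1, +1): [7, 7],  # more sugar, high pH
    (-1, -1): [6, 6],  # less sugar, low pH
    (+1, -1): [9, 8],  # more sugar, low pH
}


def effect(contrast):
    """Average score where the contrast is +1 minus average where it is -1."""
    plus, minus = [], []
    for (sugar, ph), replicates in scores.items():
        (plus if contrast(sugar, ph) > 0 else minus).extend(replicates)
    return sum(plus) / len(plus) - sum(minus) / len(minus)


sugar_effect = effect(lambda sugar, ph: sugar)
ph_effect = effect(lambda sugar, ph: ph)
interaction_effect = effect(lambda sugar, ph: sugar * ph)

print("Sugar:", sugar_effect)
print("pH:", ph_effect)
print("Sugar.pH:", interaction_effect)
```

Every effect uses all eight observations, which is the point made about factorial designs on the following slides.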
Example: 2 factor factorial design
(1.11)
Important in interpretation are magnitude and direction of the effects:
• sweet juice has a clear preference
• a low pH leads to a higher score
• the interaction Sugar-pH is weak
[Figure: design square of pH (low, high) versus Sugar (less, more), annotated with the effects 2.0, -1.0 and -0.5]
Design Of Experiments: Why DOE; different approaches revisited
(1.12)
• Best guess: good starting values, but “areas unvisited” remain unknown!
• One factor at a time: inefficient use of the data (“2 small factor designs”)
• Factorial design: maximum use of the data, since all observations are used for all the main and interaction effects! And, the trend in the surface gives an indication for “areas unvisited”.
[Figures: score versus pH (low, high) and score versus Sugar (less, more); design square of pH (low, high) versus Sugar (less, more)]
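Example: 3 factor factorial design
(1.13)
• Main effect Apple/Material; 2³ = 8 experiments (still!)
[Figure: cube design with axes Sugar (less, more) and pH (low, high) plus the third factor; one score per corner: (7), (7), (8), (9), (6), (6), (6), (5)]
Material: (6+6+9+7)/4 - (6+5+8+7)/4 = 0.5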
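Listing the runs of a full factorial design is a mechanical step that is easy to script. The sketch below uses Python's `itertools.product`; the function name `full_factorial` is hypothetical, and the level labels are taken from the figures in these slides.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Three factors at two levels each: 2 x 2 x 2 = 8 design points.
design = full_factorial({
    "Sugar": ["less", "more"],
    "pH": ["low", "high"],
    "Material": ["red", "green"],
})

for number, run in enumerate(design, start=1):
    print(number, run)
```

The list comes out in standard (design) order; the run order actually used in the laboratory should still be randomized.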
Design Of Experiments: Why DOE; relative efficiency
(1.14)
• Factorial design: 4 observations, all effects estimated as average over two
• One factor at a time: needs 6 observations (each design point 2x) to get the same information
Relative efficiency (6/4) = 1.5
[Figure: relative efficiency (1.5 to 3.5) versus number of factors (2 to 6), with 4, 8, 16, 32 and 64 observations in the factorial design]
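The relative efficiency values in the plot can be checked numerically. The sketch below assumes the usual textbook argument, which the slide does not spell out: with k two-level factors, a one-factor-at-a-time plan needs k + 1 design points, each replicated 2^(k-1) times, to estimate every main effect with the same precision as the 2^k factorial. The function name is hypothetical.

```python
def relative_efficiency(k):
    """Observations needed one-factor-at-a-time divided by factorial runs."""
    factorial_runs = 2 ** k
    # In the factorial, each main effect compares two averages of
    # 2**(k - 1) observations; OFAT needs that many replicates per point.
    replicates = 2 ** (k - 1)
    ofat_runs = (k + 1) * replicates
    return ofat_runs / factorial_runs


for k in range(2, 7):
    print(k, "factors,", 2 ** k, "observations, efficiency", relative_efficiency(k))
```

Under this assumption the ratio simplifies to (k + 1)/2, which matches the 6/4 = 1.5 quoted for two factors.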
Example: 4 factor fractional factorial design
(1.15)
• Main effect production; ½ x 2⁴ = 8 experiments (still!)
Full information on the main effects, partial information on the interactions
[Figure: the 8 design points with axes Sugar (less, more), pH (low, high) and Material (red, green), marked by Production (Hand press, Kitchen blender); scores shown: (7), (8), (6), (5) and (7), (9), (6), (6)]
Confounding: 4 factor fractional factorial design
(1.16)
Of course you lose something by reducing the number of experiments…
• Main effects and interaction effects will be confounded
• Confounding means: we cannot separate some effects/interactions
For the example:
• We have 4 factors (Sugar, pH, Material and Production)
• There are 4 blocks (2x Material plus 2x Production)
• In this case: block effects and threefold interactions are confounded
• E.g. Material (apple) effect and Sugar-pH-Material effect are confounded
• When reducing a full design, usually the assumption is made that high-order interactions are unimportant
• When reducing the design you have to carefully select which ‘things’ are confounded
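What confounding means in practice can be made concrete with a small sketch. It builds a half fraction of a four-factor design using the generator D = A.B.C, which is an assumption chosen for illustration (the slides do not state which generator or factor assignment was used), and then shows that some contrast columns are identical and therefore cannot be separated.

```python
from itertools import product

# Full 2^3 design in three base factors, coded -1 / +1.
base = list(product([-1, 1], repeat=3))

# Assumed generator (not stated on the slide): the fourth factor D = A*B*C.
design = [(a, b, c, a * b * c) for a, b, c in base]


def column(contrast):
    """Contrast column: one +1/-1 entry per run of the half fraction."""
    return [contrast(*run) for run in design]


main_d = column(lambda a, b, c, d: d)
abc = column(lambda a, b, c, d: a * b * c)
main_a = column(lambda a, b, c, d: a)
bcd = column(lambda a, b, c, d: b * c * d)

# Identical columns give identical estimates: the effects are confounded.
print("D confounded with A.B.C:", main_d == abc)
print("A confounded with B.C.D:", main_a == bcd)
```

With this particular generator every main effect is confounded with a threefold interaction, which is acceptable only if high-order interactions are assumed to be unimportant.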
Example: Quality of the response
(1.17)
[Figure: score (1 to 10) over the design with axes Sugar (less, more) and pH (low, high); scores shown: (7), (8), (9), (6), (6), (5)]
Example: Quality of the response
(1.18)
• Five sensors ‘score’ a product; for each design point the average is the product score (repeated measurements)
• Even if not explicitly used in the statistical analysis of a design, it is of utmost importance to have an impression of the uncertainty in the response!!!
• Laboratory info, analysis replicates, a priori knowledge, literature values…
[Figure: two sets of sensor scores on the scale from 1 (don’t like) to 10 (like), both with average = (6)]
Example: Quality of the response
(1.19)
[Figure: score (1 to 10) over the design with axes Sugar (less, more) and pH (low, high), illustrating the signal-to-noise ratio for repeated measurements]
Subtle difference in definition of error in statistics: ‘error’ as in ‘wandering’ (e.g. knight errant) rather than ‘incorrect’. Observations come from a population, based on common part (e.g. average) and unique part (e.g. error).
Example: Quality of the response
(1.20)
Center point replicates are a good indication of the reproducibility of design points, plus they can give a (cheap) indication of curvature in the response.
Total (center point 3x; 5x): 8 + 3 = 11; 8 + 5 = 13
Errors: Random versus systematic
(1.21)
Error: difference between true and observed value
e(i) = x(i) - µx
e(i) = (x(i) - xbar) + (xbar - µx)

(x(i) - xbar)                       (xbar - µx)
- random                            - systematic
- imprecision                       - bias
- precision                         - accuracy
repeatability, reproducibility

[Figure: measurements on Day 1 and Day 2, showing repeatability, reproducibility, xbar, µx and the bias between xbar and µx]
True value µx (ISO): “The value which characterizes a quantity perfectly defined in the conditions which exist at the moment when that quantity is observed (or the subject of a determination). It is an ideal value which could be arrived at only if all causes of measurement error were eliminated and the population was infinite”.
Variance (repeatability and reproducibility) is a function of sample size n; bias is not!
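The split of each error into a random and a systematic part can be illustrated numerically. The sketch below borrows the ten pH readings of flask 1 used later in this course; the true value `mu_x = 5.00` is a hypothetical number assumed only for this illustration, since in practice the true value is unknown.

```python
from statistics import mean

# Ten pH readings of flask 1 (data used later in the course).
x = [5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90]
mu_x = 5.00  # hypothetical true value, assumed only for this illustration

xbar = mean(x)
errors = [xi - mu_x for xi in x]        # e(i) = x(i) - mu_x
random_part = [xi - xbar for xi in x]   # x(i) - xbar, differs per observation
bias = xbar - mu_x                      # xbar - mu_x, identical for every observation

# Every error is exactly the sum of its random and systematic part.
for e, r in zip(errors, random_part):
    assert abs(e - (r + bias)) < 1e-12

print("bias:", round(bias, 3))
print("sum of random parts:", round(sum(random_part), 6))
```

The random parts sum to zero by construction, so averaging more observations reduces their influence; the bias term stays the same however many observations are taken.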
The three basics: Replication, randomization and blocking
(1.22)
[Figure: score (1 to 10) over the design with axes Sugar (less, more) and pH (low, high), illustrating the signal-to-noise ratio for design replicates: (less; high) v. (more; low)]
The three basics: Replication, randomization and blocking
(1.23)
[Figure: score (1 to 10) over the design with axes Sugar (less, more) and pH (low, high), illustrating the signal-to-noise ratio for design replicates; comparisons marked “!” and “?”]
Design point has experimental error = statistical error = a random variable
The three basics: Replication, randomization and blocking
(1.24)
Underlying statistical methods require that the observations (or errors) are independently distributed random variables. Randomization of e.g. starting material and run-order of the design points (usually) makes this assumption valid.
[Figure: “Reduce experimental error by training”; experiment versus time for two run orders over the design points Less/Low, More/Low, Less/High, More/High and Center points]
The three basics: Replication, randomization and blocking
(1.25)
All sorts of effects can influence a series of observations:
• learning by experience: reducing the uncertainty
• wear-and-tear in equipment: increasing the uncertainty
• a change in lab-assistants: a jump in uncertainty
Randomization “reshuffles” the observations, eliminating a potential confounding between design- and run-order. It “averages out” the effect of uncontrollable extraneous factors.*)
[Figure: “Shake 10 minutes”; experiment versus time]
*) Randomization is also the justification/motivation behind the so-called F-test, used extensively later in this course.
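Randomizing the run order is simple to do in software. The following is a minimal sketch using Python's standard `random` module; the two replicates per design point and the seed value are arbitrary choices made for the illustration.

```python
import random

# Design points as (Sugar, pH), listed in standard (design) order.
design_points = [
    ("less", "low"),
    ("more", "low"),
    ("less", "high"),
    ("more", "high"),
    ("center", "center"),
]
runs = design_points * 2  # two replicates of every design point

# A fixed (arbitrary) seed makes the randomized plan reproducible.
rng = random.Random(2014)
run_order = runs[:]
rng.shuffle(run_order)

for number, (sugar, ph) in enumerate(run_order, start=1):
    print(number, sugar, ph)
```

Shuffling changes only the order in which the experiments are performed; the set of design points and replicates stays the same.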
The three basics: Replication, randomization and blocking
(1.26)
Blocking can eliminate undesired (nuisance) factors by asking a different question.
[Figure: values on a scale from 0 to 2 over the design with axes Sugar (less, more) and pH (low, high); values shown: (1), (0), (0), (1)]
“Improve signal”, but at a price…
The three basics: Replication, randomization and blocking
(1.27)
[Figure: cube design with axes Sugar (less, more) and pH (low, high); one score per corner: (7), (7), (8), (9), (6), (6), (6), (5)]
Blocking can also be a ‘necessary evil’, destroying the desired randomization. E.g. assume we don’t have enough green apples to run the full experiment twice.
The three basics: Replication, randomization and blocking
(1.28)
There is a link between the three basics! E.g. we want to perform 30 experiments (replicates), but we can only do 10 runs from one batch of raw material (a typical nuisance factor).
[Figure: uncertainty and block effect across block 1, block 2 and block 3, each block randomized]
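One way to combine the three basics for this example is to treat each batch of raw material as a block and to randomize the run order within each block. The sketch below assumes, purely for illustration, that the 30 experiments are six replicates of five design points (four corners plus a center point), so that every batch of 10 runs holds two complete replicates; the slide does not specify this layout.

```python
import random

# Assumed layout: 5 design points, 2 replicates per batch, 3 batches = 30 runs.
design_points = ["less/low", "more/low", "less/high", "more/high", "center"]
runs_per_block = 10
n_blocks = 3

rng = random.Random(1)  # fixed (arbitrary) seed so the plan is reproducible

plan = []
for block in range(1, n_blocks + 1):
    # Each block (batch of raw material) holds complete replicates...
    block_runs = design_points * (runs_per_block // len(design_points))
    # ...and the run order is randomized within the block only.
    rng.shuffle(block_runs)
    plan.extend((block, point) for point in block_runs)

for block, point in plan:
    print("block", block, point)
```

Because every design point occurs equally often in every batch, a shift between batches affects all design points alike and can be separated from the factor effects.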
Some basic notions: Sample and population
(1.29)
Samples: 10 pH-measurements taken from flask 1
Population: all the possible pH-values to be found in flask 1
We assume a continuous distribution in the population (not always the case; e.g. pH in European rivers)
pH 1 (n1 = 10): 5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90
[Figure: the 10 measurements of pH 1 on a pH axis from 4.7 to 5.3]
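Some basic notions: Expectation and population parameters mean & variance
(1.30)
Expected value → sample statistic for n observations
Mean (locality): $E(x) = \mu_x \;\rightarrow\; \bar{x} = \frac{\sum_{i=1}^{n} x(i)}{n}$
Variance (spread): $E\left((x - \mu_x)^2\right) = \sigma_x^2 \;\rightarrow\; s_x^2 = \frac{\sum_{i=1}^{n} \left(x(i) - \bar{x}\right)^2}{n - 1}$
Notice: µ and σ are population constants
[Figure: e.g. Normal distribution N(µx,σx) of the observations, with the 68%, 95% and 100% areas marked around µ in units of σ]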
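Some basic notions
(1.31)
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x(i)}{n}$
Variance: $s_x^2 = \frac{\sum_{i=1}^{n} \left(x(i) - \bar{x}\right)^2}{n - 1}$
Standard deviation: $s_x = \sqrt{s_x^2}$
Standard error of the mean: $SE_x = \frac{s_x}{\sqrt{n}}$
Relative SD: $s_r = \frac{s_x}{\bar{x}}$
RSD in % (coefficient of variation): $s_{r\%} = 100\% \cdot s_r$
95% confidence interval/level: $\Pr\left(\mu_x - t_{n-1} SE_x < \bar{x} < \mu_x + t_{n-1} SE_x\right) = \Pr\left(\bar{x} - t_{n-1} SE_x < \mu_x < \bar{x} + t_{n-1} SE_x\right) = 95\%$
n large or σx given: $t_{n-1} = z_0 = 1.96$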
Some basic notions: Critical t-values
(1.32)
[Figure: critical t-values, annotated (a) to (d)]
(a) α is user’s choice
(b) Increasing for α
(c) Decreasing for n
(d) n large (or σ known)
Some basic notions: Sample statistics (= descriptors in numbers)
(1.33)

            pH 1
            5.00
            5.00
            4.90
            5.04
            4.94
            5.06
            5.17
            5.05
            5.06
            4.90
Sum         50.1
Mean        5.01
Variance    0.0070
S.D.        0.084

[Figure: the measurements of pH 1 on a pH axis from 4.7 to 5.3, with xbar and sx marked]
Assumption: Normal distribution N(xbar,sx)
SEpH1 = 0.084/√10 = 0.026; t9 = 2.26
5.01 - 2.26 x 0.026 < µpH1 < 5.01 + 2.26 x 0.026
4.95 < µpH1 < 5.07
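The sample statistics and the 95% confidence interval for flask 1 can be recomputed with Python's standard `statistics` module. This is a sketch only: the critical value t9 = 2.26 is copied from the slide rather than looked up in code, and the variable names are chosen for the illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance

ph1 = [5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90]

n = len(ph1)
xbar = mean(ph1)
s2 = variance(ph1)   # sample variance, divisor n - 1
s = stdev(ph1)       # sample standard deviation
se = s / sqrt(n)     # standard error of the mean

t9 = 2.26            # critical t-value for 9 degrees of freedom, from the slide
lower = xbar - t9 * se
upper = xbar + t9 * se

print(f"mean = {xbar:.2f}, variance = {s2:.4f}, S.D. = {s:.3f}, SE = {se:.3f}")
print(f"{lower:.2f} < mu_pH1 < {upper:.2f}")
```

Small differences in the last digit compared with the slide can occur, because the slide rounds intermediate results before using them.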
Some basic notions: Pooled standard deviation
(1.34)
Two ‘treatments’, e.g. making pH-buffers with two different stock solutions

pH 1 (n1 = 10)   pH 2 (n2 = 10)
5.00             5.17
5.00             5.10
4.90             5.16
5.04             5.17
4.94             5.19
5.06             5.14
5.17             4.91
5.05             5.21
5.06             5.07
4.90             5.10

[Figure: the measurements of pH 1 and pH 2 on a pH axis from 4.7 to 5.3]
Some basic notions: Comparing two samples
(1.35)

            pH 1      pH 2
            5.00      5.17
            5.00      5.10
            4.90      5.16
            5.04      5.17
            4.94      5.19
            5.06      5.14
            5.17      4.91
            5.05      5.21
            5.06      5.07
            4.90      5.10
Sum         50.1      51.2
Mean        5.01      5.12
Variance    0.0070    0.0074
S.D.        0.084     0.086

Assuming the variance in flask 1 and 2 is the same:
$s_{pooled}^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{0.0630 + 0.0666}{9 + 9} = 0.0072$
9 degrees-of-freedom (df) in estimating each of the standard deviations
[Figure: the measurements of pH 1 and pH 2 on a pH axis from 4.7 to 5.3]
4.95 < µpH1 < 5.07
5.06 < µpH2 < 5.18
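The pooled variance of the two flasks can be checked with a short sketch. The function name `pooled_variance` is hypothetical; the calculation follows the formula on the slide, weighting each sample variance by its degrees of freedom.

```python
from statistics import variance

ph1 = [5.00, 5.00, 4.90, 5.04, 4.94, 5.06, 5.17, 5.05, 5.06, 4.90]
ph2 = [5.17, 5.10, 5.16, 5.17, 5.19, 5.14, 4.91, 5.21, 5.07, 5.10]


def pooled_variance(a, b):
    """Weight each sample variance by its degrees of freedom (n - 1)."""
    df_a, df_b = len(a) - 1, len(b) - 1
    return (df_a * variance(a) + df_b * variance(b)) / (df_a + df_b)


s2_pooled = pooled_variance(ph1, ph2)
print(f"pooled variance = {s2_pooled:.4f}")
```

Pooling is only meaningful under the assumption stated on the slide, namely that both flasks share the same variance.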
Design Of Experiments: In distinct steps
(1.36)
Design experiment
Statistical analysis
1. Recognition and definition of the problem
2. Choice of factors, levels and ranges
3. Selection of the response variable
4. Choice of experimental design
5. Performing the experiment
6. Statistical analysis of the data
7. Conclusions and recommendations
Design Of Experiments: An iterative process; e.g. optimization
(1.37)
[Figure: unknown response surface with score contour lines over Sugar and pH; a factorial screening experiment for initial optimization, with scores (9), (8), (10) and “(11)” marked, followed by a central composite design for detail optimization]
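The design points of a central composite design can be generated as follows. This is a sketch in coded units, not the design used in the course: the axial distance α = √2 (a common choice for two factors) and the three center point replicates are assumptions made for the illustration, and the function name is hypothetical.

```python
from itertools import product
from math import sqrt


def central_composite(k, alpha, n_center=1):
    """Corner, axial and center points of a central composite design (coded units)."""
    # Corner points: the 2^k factorial part.
    corners = [tuple(float(v) for v in p) for p in product([-1, 1], repeat=k)]
    # Axial points: one factor at -alpha or +alpha, all others at the center.
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    # Replicated center points.
    centers = [(0.0,) * k] * n_center
    return corners + axial + centers


# Two factors (e.g. Sugar and pH); alpha and n_center are assumed values.
design = central_composite(k=2, alpha=sqrt(2), n_center=3)
for sugar, ph in design:
    print(f"{sugar:+.3f} {ph:+.3f}")
```

The coded values still have to be translated into real Sugar and pH settings around the best point found in the screening experiment.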