Analysis of variance (ANOVA) for the uncertainty evaluation
National Institute of Advanced Industrial Science and Technology (AIST)
National Metrology Institute of Japan (NMIJ)
Katsuhiro SHIRONO
Please find more information on uncertainty at
http://staff.aist.go.jp/k.shirono/download_e.html
In no event will we be liable to you for any damages arising out of the use of this document.
Katsuhiro Shirono @ AIST, NMIJ, JAPAN
LET’S TRY.
Distribute ribbon to 3 people.
Cut the ribbon to 10 cm without a measuring tool.
Cut another ribbon to the same length as the first one.
Do it again. (In total, 3 ribbons per person.)
Results
Alice Bob Charlie
9.6 cm 10.4 cm 8.2 cm
9.2 cm 10.7 cm 8.3 cm
9.4 cm 10.5 cm 8.7 cm
Treating the results as 9 repeated measurements,
the standard deviation is 0.94 cm.
Is this the standard deviation of the length that people consider to be 10 cm?
Let’s look at the averages of the 3 repetitions.
Alice Bob Charlie
9.400 cm 10.533 cm 8.400 cm
Treating the above as 3 repeated measurements,
the standard deviation is 1.07 cm.
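The two naive standard deviations discussed so far can be recomputed with a few lines of Python. This is only a sketch: the variable names are hypothetical, and the sample standard deviation (divisor N − 1) is assumed.

```python
from statistics import mean, stdev

# Ribbon lengths in cm, taken from the results table
lengths = {
    "Alice": [9.6, 9.2, 9.4],
    "Bob": [10.4, 10.7, 10.5],
    "Charlie": [8.2, 8.3, 8.7],
}

# (1) Pool all 9 values and treat them as one series of repetitions
pooled = [x for values in lengths.values() for x in values]
sd_pooled = stdev(pooled)  # sample standard deviation, divisor 9 - 1

# (2) Treat the 3 operator averages as 3 repetitions
averages = [mean(values) for values in lengths.values()]
sd_of_averages = stdev(averages)  # divisor 3 - 1

print(f"sd of the 9 pooled values  : {sd_pooled:.2f} cm")
print(f"sd of the 3 operator means : {sd_of_averages:.2f} cm")
```

Neither number separates the operator effect from the repetition effect, which is the motivation for the ANOVA introduced next.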
The analysis does not seem so simple.
[Figure: results of operators A, B and C, showing the difference between operators and the difference between repetitions]
Analysis of variance (ANOVA) is…
an analysis method that separates the factors affecting experimental results.
Usually, ANOVA is employed to judge the significance of the factors in a qualitative sense. In the uncertainty evaluation, however, it is employed to evaluate the uncertainties quantitatively.
The results of the ANOVA:
The standard deviation of the difference between operators is 1.06 cm.
The standard deviation of the repetition error is 0.21 cm.
How these results relate to the uncertainty evaluation will be discussed later.
DESIGN OF EXPERIMENT
Factor/Level
Factor: Source of variation. When we evaluate the effects of institutes and operators, the factors are “institute” and “operator”.
Level: Value or label of a factor used to stratify the data. When the operators are Alice and Bob, the levels are Alice and Bob.
Fixed/ Random (effect) factor
A fixed factor is a factor all of whose possible levels are included in the experiment.
A random factor is a factor whose possible levels cannot all be included in the experiment.
Usually, only random factors are investigated in the uncertainty evaluation.
Crossed/Nested
To investigate the difference due to institute and operator in a measurement, two operators (A and B) will conduct the measurement in two institutes (Institutes X and Y).
[Figure: Institute X with operators A and B; Institute Y with operators A and B]
How many operators exist in this design of experiment?
[Figure: operators A and B under Institute X; operators A and B under Institute Y]
Four? or …
[Figure: the same two operators A and B linked to both Institutes X and Y]
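[Figure: operators A and B under Institute X, and another pair of operators A and B under Institute Y]
Operators are nested.
[Figure: the same operators A and B working in both Institutes X and Y]
Operators are crossed.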
Quiz ①
Is a design of experiment possible in which operators are crossed but institutes are nested?
[Figure: operator A with Institutes X and Y; operator B with Institutes X and Y]
n‐way layout (one‐way layout, two‐way layout …)
All factors are crossed.
[Figure: operators A and B crossed with Institutes X and Y]
Which is an advantage of the n‐way layout?
□ Fewer institutes or operators are necessary.
□ The interaction between factors can be investigated.
Interaction
Alice reports larger values in Institute X and smaller values in Institute Y.
Bob reports larger values in Institute Y and smaller values in Institute X.
Interaction means this kind of compatibility between the levels of different factors.
[Figure: operators A and B crossed with Institutes X and Y]
n‐stage nested design
All factors except one are nested.
[Figure: operators A and B nested under Institute X; operators A and B nested under Institute Y]
Which is an advantage of the n‐stage nested design?
□ When operators are nested within institutes, a larger number of operators can be investigated with the same number of experiments.
□ We don’t have to think about the interaction.
Four operators can be investigated in a two‐stage nested design, while only two operators are investigated in a two‐way layout.
[Figure: operators A and B nested under Institute X; operators A and B nested under Institute Y]
■ In an n‐way layout, the number of levels tends to be small; otherwise, a greater number of experiments is necessary for the same number of levels.
■ In an n‐stage nested design, the interactions cannot be investigated. If we would like to know the effect of an interaction, this design cannot be applied.
| Design | Advantage | Disadvantage |
| n‐way layout | Investigation of interaction | Greater experimental scale |
| n‐stage nested design | Smaller experimental scale | Confounding of interaction |
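The trade-off in experimental scale can be illustrated by listing the cells of both designs for two institutes. This is a minimal sketch; the operator labels A1, B1, A2 and B2 are hypothetical and only mark distinct people in the nested case.

```python
from itertools import product

institutes = ["X", "Y"]

# Two-way layout: the same two operators work in both institutes (crossed).
operators = ["A", "B"]
crossed_cells = list(product(institutes, operators))

# Two-stage nested design: each institute has its own two operators.
operators_in = {"X": ["A1", "B1"], "Y": ["A2", "B2"]}
nested_cells = [(inst, op) for inst in institutes for op in operators_in[inst]]

def count_operators(cells):
    """Number of distinct operators in a list of (institute, operator) cells."""
    return len({op for _, op in cells})

print("crossed:", len(crossed_cells), "cells,", count_operators(crossed_cells), "operators")
print("nested :", len(nested_cells), "cells,", count_operators(nested_cells), "operators")
```

Both designs use four institute/operator cells, but the nested one samples four operators instead of two, at the price of not being able to separate the interaction.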
Quiz ②
Is the following design of experiment a three‐way layout, a three‐stage nested design, or something else?
[Figure: on each of Day 1 and Day 2, operators A and B under Institute X and operators A and B under Institute Y]
Design of experiment is…
to choose the appropriate scenario to quantify what we want to know.
When we design an experiment, randomization is important.
Quiz ③
In a two‐way layout, each operator measures twice in each institute. How can we randomize this experiment?
[Figure: operators A and B crossed with Institutes X and Y]
Randomization is sometimes unrealistic.
The one‐way layout and the n‐stage nested design (n ≥ 2) are therefore often employed instead.
The n‐way layout (n ≥ 2) is not popular in the uncertainty evaluation.
Other specific terms
We are interested in interactions, but perfect randomization is impossible.
In this case, we can use a …
split‐plot design. A redundancy is given in the design.
It is not popular in the uncertainty evaluation.
We are interested in specific interactions, and the cost is limited.
In this case, we can use an …
orthogonal design. The other interactions are neglected.
It is not popular in the uncertainty evaluation.
CALCULATION OF ONE‐WAY LAYOUT
Results
Alice Bob Charlie
9.6 cm 10.4 cm 8.2 cm
9.2 cm 10.7 cm 8.3 cm
9.4 cm 10.5 cm 8.7 cm
Looking at only the results of A (Alice), the variance is
$$\frac{(9.6-9.4)^2+(9.2-9.4)^2+(9.4-9.4)^2}{3-1} = 0.040\ \mathrm{cm}^2$$
For B (Bob), 0.023 cm².
For C (Charlie), 0.070 cm².
The average is 0.044 cm².
The variance of the average of 3 values is one third of the variance of each value:
$$\frac{0.044}{3} = 0.015\ \mathrm{cm}^2$$
Let’s look at the averages of the 3 repetitions.
Alice Bob Charlie
9.400 cm 10.533 cm 8.400 cm
Treating the above as 3 repeated measurements, the variance is 1.139 cm².
In this variance, the component other than the repetition variance is the variance due to the operator:
1.139 − 0.015 = 1.124 cm²
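The hand calculation above can be followed step by step in Python. This is a sketch that simply mirrors the slides; the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Ribbon lengths in cm, taken from the results table
lengths = {
    "Alice": [9.6, 9.2, 9.4],
    "Bob": [10.4, 10.7, 10.5],
    "Charlie": [8.2, 8.3, 8.7],
}
n = 3  # repetitions per operator

# Step 1: variance of the repetitions for each operator, then their average
within = [variance(values) for values in lengths.values()]
var_repetition = mean(within)            # about 0.044 cm^2

# Step 2: variance of an average of n repetitions
var_of_average = var_repetition / n      # about 0.015 cm^2

# Step 3: variance of the three operator averages
averages = [mean(values) for values in lengths.values()]
var_between_averages = variance(averages)  # about 1.139 cm^2

# Step 4: remove the repetition contribution to obtain the operator variance
var_operator = var_between_averages - var_of_average  # about 1.124 cm^2

print(f"sd of operator   : {sqrt(var_operator):.2f} cm")
print(f"sd of repetition : {sqrt(var_repetition):.2f} cm")
```

The two printed standard deviations correspond to the 1.06 cm and 0.21 cm quoted earlier as the results of the ANOVA.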
What a bother! So, use …
an ANOVA table,
which is a ready‐made table for frequently employed designs of experiment.
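ANOVA table for one‐way layout
| Factor | S (sum of squares) | f (degrees of freedom) | V (mean square) | Expectation of the mean square |
| A | $S_A$ | $f_A = a-1$ | $V_A = S_A/f_A$ | $\sigma_e^2 + n\sigma_A^2$ |
| Repetition | $S_e$ | $f_e = a(n-1)$ | $V_e = S_e/f_e$ | $\sigma_e^2$ |
| Sum | $S$ | $f = an-1$ | | |
$$S_A = \sum_{i=1}^{a}\sum_{j=1}^{n}(\bar{x}_i-\bar{x})^2$$
$$S_e = \sum_{i=1}^{a}\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)^2$$
$$S = \sum_{i=1}^{a}\sum_{j=1}^{n}(x_{ij}-\bar{x})^2$$
The factors are Factor A and repetition, whose numbers of levels are a and n, respectively. $\sigma_A^2$ and $\sigma_e^2$ are the variances for Factor A and repetition, respectively. $\bar{x}_i$ is the average of level i and $\bar{x}$ is the grand average.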
From this table, the relationships between the variances and the mean squares are obtained:
$$V_A \approx \sigma_e^2 + n\sigma_A^2$$
$$V_e \approx \sigma_e^2$$
$$\sigma_A^2 \approx (V_A - V_e)/n$$
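The table and the relationships above can be put into a small function. This is a minimal sketch assuming the same number of repetitions n for every level; the function name and the dictionary keys are hypothetical.

```python
from math import sqrt

def one_way_anova(groups):
    """One-way layout ANOVA for a list of a groups with n values each."""
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes the same n for every level")
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / a
    # Sums of squares as defined under the ANOVA table
    s_a = n * sum((m - grand_mean) ** 2 for m in group_means)
    s_e = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f_a, f_e = a - 1, a * (n - 1)
    v_a, v_e = s_a / f_a, s_e / f_e
    return {
        "S_A": s_a, "S_e": s_e, "f_A": f_a, "f_e": f_e,
        "V_A": v_a, "V_e": v_e,
        "var_A": (v_a - v_e) / n,  # estimate of sigma_A^2, may be negative
        "var_e": v_e,              # estimate of sigma_e^2
    }

# Ribbon data in cm: Alice, Bob, Charlie
ribbons = [[9.6, 9.2, 9.4], [10.4, 10.7, 10.5], [8.2, 8.3, 8.7]]
result = one_way_anova(ribbons)
print(f"sd of operator   : {sqrt(result['var_A']):.2f} cm")
print(f"sd of repetition : {sqrt(result['var_e']):.2f} cm")
```

For the ribbon data this reproduces the operator variance of about 1.124 cm² obtained by the step-by-step calculation.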
Note
The equations are just approximations. Hence, sometimes $V_A < V_e$, which gives a negative estimate of $\sigma_A^2$.
In that case, the variance is usually set to 0 and the data are reanalyzed as simple repeated measurements.
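The rule for a negative estimate can be sketched as follows. The data set is hypothetical and was chosen so that the level averages coincide, and the function name is an assumption for illustration.

```python
from statistics import mean, variance

def estimate_variances(groups):
    """Return (sigma_A^2, sigma_e^2) for a one-way layout with equal n,
    applying the rule for a negative estimate of sigma_A^2."""
    n = len(groups[0])
    v_e = mean([variance(g) for g in groups])      # V_e = S_e / f_e
    v_a = n * variance([mean(g) for g in groups])  # V_A = S_A / f_A
    var_a = (v_a - v_e) / n
    if var_a < 0:
        # Set sigma_A^2 to 0 and treat all values as plain repetitions
        pooled = [x for g in groups for x in g]
        return 0.0, variance(pooled)
    return var_a, v_e

# Hypothetical data: identical level averages, so V_A < V_e
data = [[10.0, 10.4], [10.3, 10.1], [10.1, 10.3]]
var_a, var_e = estimate_variances(data)
print(var_a, round(var_e, 3))
```

With these numbers the factor variance is set to 0 and the repetition variance is recomputed from all six values.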
EXAMPLES TO SHOW IMPORTANT POINTS
Calibration of micropipettes
■ Difference due to operator
■ Difference due to measurement day
This seems a nice design.
[Figure: operators A, B and C each measure on Day 1 and on Day 2]
The actual design was as follows.
[Figure: operator A on Day 1, operator B on Day 2, operator C on Day 3]
Which is correct?
□ Since the effects of the operators and the days cannot be separated, this design was wrong.
□ Although the effects of the operators and the days cannot be separated, this design was not so bad.
Not so bad, because we want …
$$\sqrt{(\text{std. dev. of the operator})^2 + (\text{std. dev. of the day})^2}$$
Since a single operator performs the calibration on a single day in the actual procedure, the design was adequate to determine the combined uncertainty.
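A short derivation may help to see why. It is only a sketch, under the assumption (not stated on the slides) that operator and day act as independent additive random effects. The symbols $o_i$, $d_i$ and $e_{ij}$ are hypothetical names for the operator effect, the day effect and the repetition error of the j-th measurement by operator i on his or her own day.

```latex
x_{ij} = \mu + o_i + d_i + e_{ij}, \qquad
\operatorname{Var}(o_i + d_i) = \sigma_{\mathrm{operator}}^2 + \sigma_{\mathrm{day}}^2
\quad\Rightarrow\quad
\frac{V_A - V_e}{n} \approx \sigma_{\mathrm{operator}}^2 + \sigma_{\mathrm{day}}^2
```

Under this assumption, the one-way layout with the confounded factor estimates the sum of the two variances, which is the quantity needed when one operator calibrates on one day.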
Quiz ④
When a single operator performs a calibration in a single institute in the actual procedure, can we obtain adequate information with a one‐way layout experiment instead of the two‐way layout below?
[Figure: two‐way layout with operators A and B crossed with Institutes X and Y]
Distribution of temperature in a thermostat
■ Difference due to point
[Figure: thermostat with measurement points 1, 2 and 3]
One‐way layout
2 repetitions for each point
[Figure: thermometer readings, 2 at each of Point 1, Point 2 and Point 3]
Suppose the following results were obtained:
The std. dev. of the point = 1.0 °C
The std. dev. of the repetition = 0.5 °C
No gap was found between the set temperature and the average of the measured temperatures.
When this thermostat is next used with a setting of 40 °C, can the following be the uncertainty of the temperature of 40 °C?
$$\sqrt{\frac{(\text{std. dev. of the point})^2}{3} + \frac{(\text{std. dev. of the repetition})^2}{3\times 2}}$$
The answer is …
$$\sqrt{\frac{(\text{std. dev. of the point})^2}{3} + \frac{(\text{std. dev. of the repetition})^2}{3\times 2} + (\text{std. dev. of the location})^2}$$
The first two terms give the uncertainty of the average value.
☜ The last term is the uncertainty due to the lack of knowledge of the point used next time.
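With the numbers of this example, the two expressions can be evaluated as below. This is a sketch that assumes the std. dev. of the location is the std. dev. of the point found by the ANOVA (1.0 °C); the variable names are hypothetical.

```python
from math import sqrt

sd_point = 1.0       # std. dev. of the point, in deg C
sd_repetition = 0.5  # std. dev. of the repetition, in deg C
points = 3           # number of points measured
repetitions = 2      # repetitions per point

# Uncertainty of the average over 3 points x 2 repetitions
u_average = sqrt(sd_point**2 / points
                 + sd_repetition**2 / (points * repetitions))

# Assumption: the location term equals the std. dev. of the point,
# because the point used next time is unknown.
sd_location = sd_point
u_next_use = sqrt(u_average**2 + sd_location**2)

print(f"uncertainty of the average : {u_average:.2f} deg C")
print(f"uncertainty at next use    : {u_next_use:.2f} deg C")
```

The location term dominates: averaging over more points or repetitions reduces only the first contribution.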
Quiz ⑤
Suppose that there are 100 standard resistors.
10 samples were selected to determine the value. The average was 10.0 Ω. Based on the ANOVA, the standard deviation due to sample is 0.1 Ω. The standard deviation due to repetition is negligibly small.
When selling the remaining 90 resistors with the value of 10.0 Ω, how large is the appropriate uncertainty? Please neglect uncertainties other than the difference among the samples.
ADDITIONAL COMMENT
Basically, the ANOVA is useful to evaluate the uncertainties due to operator, institute, location, day, and so on. These are the factors whose levels have no physically meaningful values.
[Figure: operators A and B under Institute X; operators A and B under Institute Y]
APPENDIX: ANOVA TABLE FOR TWO‐WAY LAYOUT AND TWO‐STAGE NESTED DESIGN
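ANOVA table for two‐way layout
| Factor | S | f | V | Expectation of V |
| A | $S_A$ | $f_A = a-1$ | $V_A = S_A/f_A$ | $\sigma_e^2 + n\sigma_{A\times B}^2 + bn\sigma_A^2$ |
| B | $S_B$ | $f_B = b-1$ | $V_B = S_B/f_B$ | $\sigma_e^2 + n\sigma_{A\times B}^2 + an\sigma_B^2$ |
| A×B interaction | $S_{A\times B}$ | $f_{A\times B} = (a-1)(b-1)$ | $V_{A\times B} = S_{A\times B}/f_{A\times B}$ | $\sigma_e^2 + n\sigma_{A\times B}^2$ |
| Repetition | $S_e$ | $f_e = ab(n-1)$ | $V_e = S_e/f_e$ | $\sigma_e^2$ |
| Sum | $S$ | $f = abn-1$ | | |
$$S_A = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_i-\bar{x})^2$$
$$S_B = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_j-\bar{x})^2$$
$$S_{A\times B} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_{ij}-\bar{x}_i-\bar{x}_j+\bar{x})^2$$
$$S_e = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_{ij})^2$$
$$S = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x})^2$$
The factors are Factor A (number of levels: a), Factor B (number of levels: b), and repetition (number of levels: n). $\sigma_A^2$, $\sigma_B^2$, $\sigma_{A\times B}^2$, and $\sigma_e^2$ are the variances for Factors A and B, the interaction between them, and the repetition. $\bar{x}_i$, $\bar{x}_j$ and $\bar{x}_{ij}$ are the averages for level i of A, level j of B and the cell (i, j), and $\bar{x}$ is the grand average.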
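A minimal sketch of how the two-way layout table might be evaluated is given below. The variance estimates are obtained by solving the expectations of the mean squares for each variance, which is an assumption of this sketch rather than a formula shown in the table. The function name and the data are hypothetical.

```python
def two_way_anova(x):
    """x[i][j] is the list of n repetitions for level i of A and level j of B."""
    a, b, n = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / n for j in range(b)] for i in range(a)]
    mean_a = [sum(cell[i]) / b for i in range(a)]
    mean_b = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(mean_a) / a
    # Sums of squares as defined under the table
    s_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    s_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    s_ab = n * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    s_e = sum((v - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for v in x[i][j])
    # Mean squares
    v_a = s_a / (a - 1)
    v_b = s_b / (b - 1)
    v_ab = s_ab / ((a - 1) * (b - 1))
    v_e = s_e / (a * b * (n - 1))
    # Solve the expectations of V for the variances (may be negative)
    return {
        "var_e": v_e,
        "var_AxB": (v_ab - v_e) / n,
        "var_A": (v_a - v_ab) / (b * n),
        "var_B": (v_b - v_ab) / (a * n),
    }

# Hypothetical data: 2 levels of A, 2 levels of B, 2 repetitions
cells = [[[1.0, 3.0], [3.0, 5.0]],
         [[5.0, 7.0], [9.0, 11.0]]]
estimates = two_way_anova(cells)
print(estimates)
```

As in the one-way layout, a negative estimate would call for setting that variance to 0 and reanalyzing the data.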
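ANOVA table for two‐way layout when neglecting interaction
| Factor | S | f | V | Expectation of V |
| A | $S_A$ | $f_A = a-1$ | $V_A = S_A/f_A$ | $\sigma_e^2 + bn\sigma_A^2$ |
| B | $S_B$ | $f_B = b-1$ | $V_B = S_B/f_B$ | $\sigma_e^2 + an\sigma_B^2$ |
| Repetition | $S_e$ | $f_e = abn-a-b+1$ | $V_e = S_e/f_e$ | $\sigma_e^2$ |
| Sum | $S$ | $f = abn-1$ | | |
$$S_A = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_i-\bar{x})^2$$
$$S_B = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_j-\bar{x})^2$$
$$S_e = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_i-\bar{x}_j+\bar{x})^2$$
$$S = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x})^2$$
The factors are Factor A (number of levels: a), Factor B (number of levels: b), and repetition (number of levels: n). $\sigma_A^2$, $\sigma_B^2$, and $\sigma_e^2$ are the variances for Factors A and B, and the repetition.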
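ANOVA table for two‐stage nested design
| Factor | S | f | V | Expectation of V |
| A | $S_A$ | $f_A = a-1$ | $V_A = S_A/f_A$ | $\sigma_e^2 + n\sigma_B^2 + bn\sigma_A^2$ |
| B | $S_B$ | $f_B = a(b-1)$ | $V_B = S_B/f_B$ | $\sigma_e^2 + n\sigma_B^2$ |
| Repetition | $S_e$ | $f_e = ab(n-1)$ | $V_e = S_e/f_e$ | $\sigma_e^2$ |
| Sum | $S$ | $f = abn-1$ | | |
$$S_A = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_i-\bar{x})^2$$
$$S_B = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_{ij}-\bar{x}_i)^2$$
$$S_e = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_{ij})^2$$
$$S = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x})^2$$
The factors are Factor A (number of levels: a), Factor B (number of levels: b), and repetition (number of levels: n). $\sigma_A^2$, $\sigma_B^2$, and $\sigma_e^2$ are the variances for Factors A and B, and the repetition.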
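The two-stage nested design can be evaluated along the same lines. This is a sketch under the assumption of a balanced design (the same b and n everywhere); the variance estimates come from solving the expectations of V, and the function name and data are hypothetical.

```python
def nested_anova(x):
    """x[i][j] is the list of n repetitions for level j of B nested in level i of A."""
    a, b, n = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / n for j in range(b)] for i in range(a)]
    mean_a = [sum(cell[i]) / b for i in range(a)]
    grand = sum(mean_a) / a
    # Sums of squares as defined under the table
    s_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    s_b = n * sum((cell[i][j] - mean_a[i]) ** 2
                  for i in range(a) for j in range(b))
    s_e = sum((v - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for v in x[i][j])
    # Mean squares
    v_a = s_a / (a - 1)
    v_b = s_b / (a * (b - 1))
    v_e = s_e / (a * b * (n - 1))
    # Solve the expectations of V for the variances (may be negative)
    return {
        "var_e": v_e,
        "var_B": (v_b - v_e) / n,
        "var_A": (v_a - v_b) / (b * n),
    }

# Hypothetical data: 2 institutes (A), 2 operators nested in each (B), 2 repetitions
measurements = [[[1.0, 3.0], [3.0, 5.0]],
                [[5.0, 7.0], [9.0, 11.0]]]
nested = nested_anova(measurements)
print(nested)
```

Because the operators differ between institutes, no interaction term appears; its effect is confounded with the nested factor B.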
ANSWERS TO QUIZZES
Quiz ①
Is a design of experiment possible in which operators are crossed but institutes are nested?
[Figure: Institutes X and Y under operator A; different Institutes X* and Y* under operator B]
Quiz ②
Is the following design of experiment a three‐way layout, a three‐stage nested design, or something else?
The institutes are crossed. The operators are nested within the institutes. But the experiment days are crossed with the institutes. This is neither a three‐way layout nor a three‐stage nested design.
This can be regarded as a randomized block design when the effect of the day is redundant information.
Quiz ③
In a two‐way layout, each operator measures twice in each institute. How can we randomize this experiment?
| | X | Y |
| A | ① | ⑤ |
| A | ② | ⑥ |
| B | ③ | ⑦ |
| B | ④ | ⑧ |
Implement the above ① ~ ⑧ in a random order.
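One possible way to draw such a random order is sketched below. The run list follows the table (runs ① to ⑧); the function name and the seed value are hypothetical, and the seed only serves to make the order reproducible.

```python
import random

# The eight runs of the table: (run number, operator, institute)
runs = [
    (1, "A", "X"), (2, "A", "X"), (3, "B", "X"), (4, "B", "X"),
    (5, "A", "Y"), (6, "A", "Y"), (7, "B", "Y"), (8, "B", "Y"),
]

def randomized_order(runs, seed=None):
    """Return a shuffled copy of the run list."""
    rng = random.Random(seed)
    order = list(runs)
    rng.shuffle(order)
    return order

order = randomized_order(runs, seed=2024)
for step, (run, operator, institute) in enumerate(order, start=1):
    print(f"step {step}: run {run} (operator {operator}, institute {institute})")
```

Every run appears exactly once; only the sequence in which the measurements are carried out changes.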
Quiz ④
When one operator performs a calibration in one institute in the actual procedure, can we obtain adequate information with a one‐way layout experiment instead of the two‐way layout below?
[Figure: two‐way layout with operators A and B crossed with Institutes X and Y]
Theoretically, this design will work. But, of course, the larger the experimental scale, the more precise the estimate.
Quiz ⑤
Suppose that there are 100 standard resistors.
10 samples were selected to determine the value. The average was 10.0 Ω. Based on the ANOVA, the standard deviation due to sample is 0.1 Ω. The standard deviation due to repetition is negligibly small.
When selling the remaining 90 resistors with the value of 10.0 Ω, how large is the appropriate uncertainty? Please neglect uncertainties other than the difference among the samples.
$$\sqrt{\frac{11}{10}\times 0.1^2} \approx 0.105\ \Omega$$
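The answer can be checked numerically. This sketch assumes that the uncertainty of one unmeasured resistor combines the sample-to-sample standard deviation with the uncertainty of the average of the 10 selected samples; the variable names are hypothetical.

```python
from math import sqrt

sd_sample = 0.1   # std. dev. due to sample, in ohm
n_selected = 10   # number of samples used for the average

# Uncertainty of the average of the 10 selected samples
u_average = sd_sample / sqrt(n_selected)

# Uncertainty for one of the remaining resistors:
# its own deviation plus the uncertainty of the assigned value
u_resistor = sqrt(sd_sample**2 + u_average**2)

print(f"u = {u_resistor:.3f} ohm")
```

This is the same structure as in the thermostat example: the uncertainty of the average plus the spread of the individual item that will actually be used.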