Post on 20-Dec-2015
Statistics in Science
Factors
Complex systems are affected by a wide range of factors:
• Ploughing system: soil type, ploughing depth, number of cultivations, type of plough, etc.
• Animal production system: management regime, biological & environmental inputs
• Ecological habitat: available food, cover, light, temperature
• Biochemical reaction: concentration of reagents, temperature, light
Factor Levels
Enterprise type is a factor affecting farm outputs
The different enterprise types considered are the levels of the factor:
e.g. beef, beef suckler, dairy, mixed
Levels may be categorical (as above), or quantitative, as in the study of the effect of washing solution on retarding bacterial growth, where the levels were 2%, 4% or 6% of an active ingredient.
With quantitative levels it makes sense to look for a trend (increasing or decreasing) in the response as the level increases.
Single factor experiments
• Compare the mean response for the different levels of a single factor
• Other factors affecting the response must be kept as constant as possible; any effect of these will appear as random residual variation (due to the random allocation of units to the different levels of the factor)
• The result will be clear and valid, but of limited value
Ex: comparing growth of lambs fed on 2 levels of protein supplement, we must use the same source of protein for the two levels, so we have no information on what the response to protein level would be for other sources
(Multi)-factorial experiments
Examine the effect of 2 (or more) factors at the same time
Treatments: the various combinations of the levels of the different factors
Ex:
protein source: factor A (levels A1, A2)
protein supplement: factor B (levels B1, B2, B3)
2 × 3 = 6 treatments:
A1B1, A1B2, A1B3
A2B1, A2B2, A2B3
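As a minimal sketch (in Python rather than the SAS used elsewhere in these slides), the treatments of a factorial can be listed by taking every combination of one level per factor. The function name `treatment_combinations` is made up for this illustration.

```python
from itertools import product

# Factor levels from the protein example (A = source, B = supplement level).
levels_a = ["A1", "A2"]
levels_b = ["B1", "B2", "B3"]

def treatment_combinations(*factors):
    """Return every combination of one level from each factor, as joined labels."""
    return ["".join(combo) for combo in product(*factors)]

treatments = treatment_combinations(levels_a, levels_b)
print(len(treatments), treatments)
```

Adding a third list of levels to the call gives the treatments of a 3-factor experiment in the same way.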
Simple and Main effects
• Simple effects of source: Difference (in mean growth) between source A1 & A2 can be considered at each of the 3 levels of protein.
• Simple effects of protein: Differences between B1 & B2, B2 & B3 and between B3 & B1 can be measured for each source.
• Main effects are averages of simple effects, and are not always meaningful
Example
Means      B1   B2   B3   Average
A1         10   18   11     13
A2         11   19   18     16
A effect    1    1    7      3
Note: the main effect of A, (1 + 1 + 7)/3 = 3, is also the difference between the MARGINAL means (16 - 13)
Here the effect of A depends on the level of B
This is an INTERACTION between the factors A and B
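The arithmetic in this example can be checked with a short Python sketch. It takes the table of means as given and computes the simple effects of A, their average, and the difference between the marginal means; the helper names are illustrative only.

```python
# Table of means from the example: rows are levels of A, columns levels of B.
means = {
    "A1": [10, 18, 11],
    "A2": [11, 19, 18],
}

def simple_effects(means, low="A1", high="A2"):
    """Difference between two levels of A at each level of B."""
    return [h - l for l, h in zip(means[low], means[high])]

def marginal_mean(values):
    """Average of a row of cell means (equal replication assumed)."""
    return sum(values) / len(values)

effects = simple_effects(means)        # one simple effect per level of B
main_effect = marginal_mean(effects)   # average of the simple effects
marginal_diff = marginal_mean(means["A2"]) - marginal_mean(means["A1"])

print(effects, main_effect, marginal_diff)
```

The simple effects are unequal (1, 1, 7), which is what signals the interaction; their average equals the marginal difference only because each cell has the same replication.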
Important Rule
With an AB interaction: the effect of A changes as the level of B changes
Hence: averaging the effects of A over the levels of B makes no sense
1. The main effect of a factor cannot be uncritically interpreted as the effect of the factor if there is an interaction
2. In this case report the ab treatment means and some meaningful comparisons, not the separate means for the levels of A and of B
Why do factorial?
1. Factorial experiments compare a set of treatments which have a certain structure: the treatments simply consist of combinations of levels of 2 (or more) factors
– so we already know how to do the analysis!
– the factorial treatment structure will dictate sensible comparisons to make
2. The gain:
– knowing whether the effect of one factor varies with the level of another
– saving resources when there is no interaction, since a simple effect can be estimated at each level of the other factor and the results combined
Why the gain (in absence of interaction)
Sample size
        B1   B2   B3   Total
A1       6    6    6    18
A2       6    6    6    18
Total   12   12   12    36
A effect: since this is the same for all levels of B it is measured by the difference in the marginal means, each based on 18 observations.
B effects: each B effect (B1 v B2, B2 v B3, B3 v B1) is measured using means of 12 observations.
Separate experiments (same resources)
Experiment on A:
A1   A2   Total
 9    9    18
Experiment on B:
B1   B2   B3   Total
 6    6    6    18
A effect: now measured by the difference between means of 9 observations (was 18).
B effects: now measured by the difference between means of 6 observations (was 12).
Also: we don't know if the A effects depend on the level of B – MORE LOSS OF INFORMATION!
PGRM pg 11-6
The enormous benefits (of factorial designs) arise through no extra cost but merely by reorganising the work programme.
You can choose to get much more information for the same money or reduce the cost of achieving a given level of information.
SAS OUTPUT
1. ANOVA table
2. Table of MEANS with SED
3. Writing a summary
ANOVA ab factorial, replication r
• Treat this as a 1-way structure, with ab treatments
Source             SS     df            MS
Treatments         TSS    ab - 1        TSS/(ab-1)
Error              RSS    (r-1)ab       RSS/((r-1)ab)
Total                     rab - 1
• Now partition the treatment SS, TSS
Source             SS     df            MS
A                  SSA    a-1           SSA/(a-1)
B                  SSB    b-1           SSB/(b-1)
AB (interaction)   SSAB   (a-1)(b-1)    SSAB/((a-1)(b-1))
Treatment          TSS    ab-1
Example: time to development of Fasciola hepatica eggs under 2 × 2 combinations of temperature and relative humidity
Temperature (°C)     16     16     22     22
Humidity level        1      2      1      2
                     27     34     13     17
                     26     37     17     15
                     29     33     16     18
Treatment means    27.3   34.7   15.3   16.7
Source         df      SS       MS        F
Treatments      3   758.33   252.78    75.83 ***
Partition of TSS
Temp            1   675.00   675.00   202.50 ***
Humidity        1    56.33    56.33    16.90 **
Interaction     1    27.00    27.00     8.10 *
Residual        8    26.67     3.33
Total          11   785.00
*** p<0.001, ** p<0.01, * p<0.05
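The sums of squares in this table can be reproduced by hand. The following is a Python sketch, not the SAS analysis used in these slides: it uses the usual "totals squared over replication, minus correction factor" formulas for a balanced design, and the helper names `ss_between` and `pool` are invented for the illustration. F ratios are computed from the unrounded residual mean square, so they can differ in the last digit from values worked out from rounded table entries.

```python
# Development times from the example, keyed by (temperature, humidity level).
data = {
    (16, 1): [27, 26, 29],
    (16, 2): [34, 37, 33],
    (22, 1): [13, 17, 16],
    (22, 2): [17, 15, 18],
}

def ss_between(groups, correction):
    """Sum of squares between groups: sum(total^2 / n) minus the correction factor."""
    return sum(sum(g) ** 2 / len(g) for g in groups) - correction

def pool(data, index):
    """Pool observations over the other factor; index 0 = temp, 1 = humidity."""
    pooled = {}
    for key, values in data.items():
        pooled.setdefault(key[index], []).extend(values)
    return list(pooled.values())

obs = [y for values in data.values() for y in values]
correction = sum(obs) ** 2 / len(obs)

ss_total = sum(y * y for y in obs) - correction
ss_treat = ss_between(list(data.values()), correction)
ss_temp = ss_between(pool(data, 0), correction)
ss_hum = ss_between(pool(data, 1), correction)
ss_inter = ss_treat - ss_temp - ss_hum      # what is left of the treatment SS
ss_resid = ss_total - ss_treat

df_resid = len(obs) - len(data)             # (r - 1)ab = 8
ms_resid = ss_resid / df_resid

for name, ss, df in [("Temp", ss_temp, 1), ("Humidity", ss_hum, 1),
                     ("Interaction", ss_inter, 1)]:
    print(f"{name:12s} SS = {ss:7.2f}  F = {ss / df / ms_resid:6.2f}")
print(f"{'Residual':12s} SS = {ss_resid:7.2f}  MS = {ms_resid:.2f}")
```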
Tables of Means
Temperature (°C)     16     16     22     22
Humidity level        1      2      1      2
Treatment means    27.3   34.7   15.3   16.7
SED = 1.49
Humidity effect: significant when temp = 16 (7.4), non-significant when temp = 22 (1.4)
Temp. effect: significant (12.0 & 18.0) at both levels of humidity
Main effect means
Temperature     16     22    SED
              31.0   16.0   1.05
Humidity        H1     H2    SED
              21.3   25.7   1.05
Interpretation
Overall treatments differ: F = 75.83
Interaction is significant: F = 8.1, so we really should examine the 4 means as above, and ignore the tests for main effects, which e.g. compare levels of HUMIDITY averaged over levels of TEMP
However, in this case the TEMP effect is much larger than the interaction, so its averaged effect broadly reflects its effect at each level of HUMIDITY
Example (continued): interaction plot
[Figure: Time to Development (y-axis, 0 to 40) against Humidity level (x-axis), with separate lines for T16 and T22]
SAS/GLM for 2-way analysis
One-way analysis
/* treat the 4 temp*humidity combinations as a single set of treatments */
proc glm data = fasciola;
  class temp humidity;
  model time = temp*humidity;
  estimate 'SED treatment means' temp*humidity 1 -1;
quit;
Main effects & interaction
/* partition the treatment SS into temp, humidity and their interaction */
proc glm data = fasciola;
  class temp humidity;
  model time = temp humidity temp*humidity;
  lsmeans temp;
  lsmeans humidity;
  lsmeans temp*humidity;
  estimate 'SED for temp' temp 1 -1;
  estimate 'SED for humidity' humidity 1 -1;
quit;
SAS demo!
temp humidity time
16 1 27
16 2 34
22 1 13
22 2 17
16 1 26
16 2 37
22 1 17
22 2 15
16 1 29
16 2 33
22 1 16
22 2 18
Data must contain response values (time) in a single column, identified by factor levels in 2 other columns
This gives 3 variables (columns) for the SAS program
faciola.sas
What to present (again!)
• Since the interaction is significant, don't report the main effects.
• Present:
– the 2-way table (with SED):
Time          16°C   22°C
Humidity 1    27.3   15.3
Humidity 2    34.7   16.7
SED = 1.49
– a summary:
the temp/humidity interaction was significant (p = 0.02)
humidity effects were significant at temp = 16 (p = 0.0012) but not at temp = 22 (p = 0.40)
temp effects were significant at both humidities (p < 0.0001), and greater when humidity = 2
Factorial experiment laid out in blocks
• The layout above treats the ab treatments as a completely randomised design using rab experimental units (r for each treatment)
Think: how would this be done in practice?
• If we group the experimental units into blocks of size ab and randomly allocate the ab treatments to the units in each block, we can then remove the block SS (BSS) from the residual SS (RSS), hopefully reducing it sufficiently to compensate for the reduction in residual DF
• See example over …
2-way experiment laid out in blocks
• Factor A: 2 levels Factor B: 3 levels
• 60 experimental units available (10 per treatment)
• Completely randomised design (CR): randomly allocate treatments to units
• Randomised blocks (RB): group units into blocks of size 6 (so 10 blocks) & randomise the 6 treatments in each block, which may be much easier to do
ANOVA
Source      DF: CR   DF: RB
Block          -        9
A              1        1
B              2        2
AB             2        2
Residual      54       45
Total         59       59
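The two DF columns follow from simple counting, sketched here in Python. The function `anova_df` is a hypothetical helper; it assumes a balanced a × b factorial in which, under blocking, the r replicates form r complete blocks of size ab.

```python
def anova_df(a, b, r, blocked=False):
    """Degrees of freedom for an a x b factorial with r units per treatment.

    With blocked=True the r replicates are assumed to be r complete blocks
    of size a*b, so r - 1 df move from the residual to blocks.
    """
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    total = r * a * b - 1
    if blocked:
        df["Block"] = r - 1
    df["Residual"] = total - sum(df.values())
    df["Total"] = total
    return df

cr = anova_df(2, 3, 10)                 # completely randomised
rb = anova_df(2, 3, 10, blocked=True)   # randomised blocks
print(cr)
print(rb)
```

Blocking costs 9 residual DF here (54 down to 45), which is worthwhile only if the block SS removed from the residual is large enough.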
Practical: 4.2 Two-Factor Factorial Example 2
Bacterial count in sausages stored at 4 temperatures using 3 types of preservative method
More than 2 factors!
3×4×5 experiment: i.e. factors A, B, C with 3, 4 and 5 levels respectively, giving 60 treatment combinations!
The 3-factor ABC interaction measures how the 2-factor AB interaction changes over the levels of C (see over)
Can get away with replication r = 1 provided the 3-factor interaction can be assumed negligible – not usually liked by journal editors!
With r > 1 we include:
main effects: A, B, C
2-factor interactions: BC, CA, AB
3-factor interaction: ABC
3-factor interaction for a 2×2×2 expt (a)
[Figure: Response (y-axis, 0 to 40) against level of B (B1, B2, B3), with separate lines for A1C1, A1C2, A2C1 and A2C2]
With C1: A effect is least at B2
With C2: A effect is largest at B2
Direction of A effect is different for C1, C2
AB interaction differs at the two C levels
3-factor interaction arising naturally
See PGRM Fig 11.2.2 (b)
Examples – measuring the benefit
1. 2×2×2×2: artificial insemination involving 256 heifers (r = 16 per treatment)
2. 3×4×5: imaginary example to practise calculating sample sizes! 120 units (r = 2)
3. 2×2×2: machine tool lifetime, 24 units (r = 3)
Example 2x2x2x2 factorial
Artificial insemination
256 heifers (64 each week), 4 factors at 2 levels.
Choices
A) 4 experiments (r = 32)
B) 2 x 2 x 2 x 2 factorial (r = 16 per combination)
Compare precision
A) 32 animals per treatment.
SED = √(2s²/32) = s/4
where s² = MSE.
B) 128 animals for each level of a factor
SED = √(2s²/128) = s/8.
Plus
With B all interactions can be estimated
Conclusion
Summary - the factorial design
- Halves the SED (s/8 instead of s/4); equivalently, it needs only a quarter of the animals for a given level of precision
- Allows more general interpretation of the factor effects since they are tested over a wide range of levels of the other factors
- Allows a test of whether the factors interact.
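The precision comparison for the heifer example can be checked numerically. This Python sketch assumes the usual formula SED = √(2s²/n) for two means of n observations each, and sets s = 1 so that results read as multiples of s; the variable names are illustrative.

```python
import math

def sed(s, n):
    """Standard error of a difference between two means of n observations each."""
    return math.sqrt(2 * s**2 / n)

s = 1.0                       # residual SD; set to 1 so results are multiples of s
sed_separate = sed(s, 32)     # choice A: 32 animals per treatment
sed_factorial = sed(s, 128)   # choice B: 128 animals per factor level

# Animals per mean that choice A would need to match the factorial SED.
n_needed = 2 * s**2 / sed_factorial**2
print(sed_separate, sed_factorial, n_needed)
```

Matching s/8 with separate experiments would take 128 animals per treatment in each of the four experiments, four times the 32 available.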
3×4×5 expt with factors A, B, C & replication 2 (120 units)
Replication of main effect means (for any factor not involved in a significant interaction)
A    B    C
40   30   24
Replication of means in an interaction table, e.g. BC (for comparing BC effects if the only significant interaction is BC)
              C
B        1    2    3    4    5   Total
1        6    6    6    6    6    30
2        6    6    6    6    6    30
3        6    6    6    6    6    30
4        6    6    6    6    6    30
Total   24   24   24   24   24   120
All interactions (all 2-factor interactions significant, 3-factor not)
AB   AC   BC   Treat. comb.
10    8    6    2
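These replication figures are just the number of units divided by the number of cells in the relevant table. A small Python sketch (the `replication` helper is made up for this illustration, and `math.prod` needs Python 3.8 or later):

```python
from itertools import combinations
from math import prod

levels = {"A": 3, "B": 4, "C": 5}    # number of levels of each factor
r = 2                                 # replicates per treatment combination
n_units = r * prod(levels.values())   # 120 experimental units

def replication(factors):
    """Observations behind each mean in the table classified by `factors`."""
    return n_units // prod(levels[f] for f in factors)

for size in (1, 2, 3):
    for factors in combinations(levels, size):
        print("".join(factors), replication(factors))
```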
Example
An engineer is interested in the effects of cutting speed (A), tool geometry (B) and cutting angle (C) on the life (in hours) of a machine tool.
Two levels of each factor are chosen, and three replicates of a 2³ factorial design are run.
Design: 2×2×2
No. treatments: 8
No. units: 24
Example: Data
               LIFE (hr)
               Replicate
A   B   C     1    2    3
1   1   1    22   31   25
2   1   1    32   43   29
1   2   1    35   34   50
2   2   1    55   47   46
1   1   2    44   45   38
2   1   2    40   37   36
1   2   2    60   50   54
2   2   2    39   41   47
Example: ANOVA
Source      df       SS      MS       F   F pr.
A            1      0.7     0.7    0.02   0.884
B            1    770.7   770.7   25.55   <.001
C            1    280.2   280.2    9.29   0.008
A.B          1     16.7    16.7    0.55   0.468
A.C          1    468.2   468.2   15.52   0.001
B.C          1     48.2    48.2    1.60   0.224
A.B.C        1     28.2    28.2    0.93   0.348
Residual    16    482.7    30.2
Total       23   2095.3
Note:
1. ABC interaction non-significant
2. AC is only significant 2-factor interaction
Tables of MEANS
Main effects
       1      2    SED
A   40.7   41.0   2.24
B   35.2   46.5   2.24
C   37.4   44.2   2.24
A × B
A     B1     B2    SED
1   34.2   47.2
2   36.2   45.8   3.17
A × C
A     C1     C2    SED
1   32.8   48.5
2   42.0   40.0   3.17
B × C
B     C1     C2    SED
1   30.3   40.0
2   44.5   48.5   3.17
A × B × C
       B1     B1     B2     B2
       C1     C2     C1     C2
A1   26.0   42.3   39.7   54.7
A2   34.7   37.7   49.3   42.3
SED = 4.48
Help!
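All of these tables are averages of the same 24 observations over different sets of factors. As a hedged illustration in Python (the slides use SAS `lsmeans` for this), a single helper can produce any of them from the raw data; `table_of_means` and its `keep` argument are names invented here.

```python
from collections import defaultdict

# Tool life (hours) keyed by (A, B, C) level, three replicates each.
life = {
    (1, 1, 1): [22, 31, 25], (2, 1, 1): [32, 43, 29],
    (1, 2, 1): [35, 34, 50], (2, 2, 1): [55, 47, 46],
    (1, 1, 2): [44, 45, 38], (2, 1, 2): [40, 37, 36],
    (1, 2, 2): [60, 50, 54], (2, 2, 2): [39, 41, 47],
}

def table_of_means(data, keep):
    """Average over every factor whose position is not listed in `keep`.

    keep=(0, 2) gives the A x C table; keep=(1,) gives the B main-effect means.
    """
    pooled = defaultdict(list)
    for key, values in data.items():
        pooled[tuple(key[i] for i in keep)].extend(values)
    return {k: sum(v) / len(v) for k, v in sorted(pooled.items())}

ac_means = table_of_means(life, keep=(0, 2))
b_means = table_of_means(life, keep=(1,))
for cell, mean in ac_means.items():
    print(cell, round(mean, 1))
```

Simple pooling gives the right means only because the design is balanced (3 replicates in every cell).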
Making sense of tables
1. From this analysis, the only terms that are significant are the B and C main effects and the AC interaction.
2. Thus, the only tables that need to be presented are the B main effect table and the AC table of means.
– Geometry (B) has a large effect, increasing the life by over 10 hours.
– Cutting angle (C) increases the life considerably at low but not at high speed (A).
3. Another way of looking at the AC interaction is that increased speed increases tool life for the first cutting angle but reduces it for the second cutting angle.
SAS/GLM code
proc glm data = mydataset;
  class a b c;  /* added: lsmeans needs the factors declared as class variables */
  model response = a b c b*c c*a a*b a*b*c;
  lsmeans a b c b*c c*a a*b a*b*c;
quit;
With one (AC) significant interaction, use these statements inside the proc instead:
  lsmeans b a*c / stderr;
  estimate 'b SED' b 1 -1;
  /* ac SED = sqrt(2) x stderr */
Is this the best we can suggest?
Calculating SEDs
Recall (with equal replication):
SED = √2 × SEM
SED: standard error of a difference
SEM: standard error of a mean
SAS:
lsmeans B / stderr;
lsmeans A*C / stderr;
lsmeans A*B*C / stderr;
will give SEM, & a usually useless p-value testing whether the mean is 0!
f3_toolLife.sas
Calculating SED:
For the AC interaction:
SEM = 2.2422707
NB: usual SAS unhelpful precision!
so SED = 1.414 × 2.2422707
= 3.17 (3 sig. figs.)
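The same conversion as a small Python sketch, assuming equal replication. The `sem` helper additionally rebuilds the SEM from the ANOVA table (residual SS 482.7 on 16 df, with 6 observations behind each AC mean) as a cross-check; both function names are illustrative.

```python
import math

def sem(mse, n):
    """Standard error of a mean of n observations, given the residual mean square."""
    return math.sqrt(mse / n)

def sed_from_sem(sem_value):
    """Standard error of a difference between two equally replicated means."""
    return math.sqrt(2) * sem_value

ac_sed = sed_from_sem(2.2422707)     # SEM reported by SAS for the AC means
ac_sem_check = sem(482.7 / 16, 6)    # rebuilt from the (rounded) ANOVA entries
print(round(ac_sed, 2), round(ac_sem_check, 2))
```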
Interpreting the log scale
Linear relationship
log(y) = a + bx (here: log = log2)
y = 2^(a + bx) = 2^a × 2^(bx)
Compare y-values for a unit increase in x,
i.e. y1 at x and y2 at x + 1
y2 / y1 = [2^a × 2^(bx + b)] / [2^a × 2^(bx)]
= 2^b
Increasing x by 1 multiplies y by 2^b
e.g. if b = -1 this is a 50% decrease in y
Understanding the LOG scale
- where effects of a variate are proportional
Example:
1. uses log2 (logs to base 2)
2. slope b = -1
- giving a 50% decrease per unit increase in x
log2(y) = 3 – x
- a linear relationship between log2(y) & x
x y
0 8
1 4
2 2
3 1
4 0.5
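The table can be regenerated by back-transforming the line, which also shows the constant ratio between successive y-values. A minimal Python sketch; `y_from_x` is a name chosen for this illustration, with a = 3 and b = -1 taken from the relationship on this slide.

```python
def y_from_x(x, a=3.0, b=-1.0):
    """Back-transform log2(y) = a + b*x to the original scale: y = 2**(a + b*x)."""
    return 2 ** (a + b * x)

xs = [0, 1, 2, 3, 4]
ys = [y_from_x(x) for x in xs]                 # 8, 4, 2, 1, 0.5
ratios = [ys[i + 1] / ys[i] for i in range(len(ys) - 1)]

for x, y in zip(xs, ys):
    print(x, y)
print(ratios)   # every unit step in x multiplies y by 2**b
```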
Back transforming LOG
y = log10(x)  ⇒  x = 10^y
y = log2(x)  ⇒  x = 2^y
y = log(x)  ⇒  x = exp(y) = e^y
Dilution of drug in milk
Excretion of sodium penicillin for five milkings for a cow.
Relationship is not linear.
[Figure: Units vs Milkings; x-axis Milkings (0 to 6), y-axis Units (0 to 30000)]
Milking   Units excreted   Log(Units)
1              29547          10.29
2               1111           7.01
3                235           5.46
4                 26           3.26
5                4.3           1.46
LOG-scale
[Figure: Log(units) vs # milkings; x-axis Milking (1 to 5), y-axis Log(units) (0 to 15); fitted line log(U) = 11.9 - 2.14 M]
Slope b = -2.14
exp(-2.14) = 0.12
Conclusion:
Each milking reduces the # units to 12% of the previous milking
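The fitted line can be reproduced from the five data points. This Python sketch assumes the line was obtained by ordinary least squares on natural logs (consistent with the Log(Units) values and the use of exp for back-transforming); `least_squares` is a helper written for the illustration.

```python
import math

milkings = [1, 2, 3, 4, 5]
units = [29547, 1111, 235, 26, 4.3]
log_units = [math.log(u) for u in units]       # natural logs, as in the table

def least_squares(x, y):
    """Intercept and slope of the ordinary least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = least_squares(milkings, log_units)
retained = math.exp(slope)     # fraction of the previous milking's units
print(round(intercept, 1), round(slope, 2), round(retained, 2))
```

Back-transforming the slope with exp gives the multiplicative change per milking, which is the quantity reported in the conclusion.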
Revision:
t-test, p-value, significance level, hypothesis testing, and much more
ALL IN ONE OVERHEAD!