1 This afternoons programme 2.05 – 3.00 A short talk. 3.00 – 3.20 A break for coffee. 3.20 –...

Date post: 28-Mar-2015
Upload: gavin-jenkins
1

This afternoon’s programme

• 2.05 – 3.00 A short talk.

• 3.00 – 3.20 A break for coffee.

• 3.20 – 4.30 Running tests with PASW Statistics 17.

2

SESSION 2

Further topics in the analysis of variance

3

Only the starting-point

• In Monday’s session, I revised the one-way ANOVA.

• We saw that merely obtaining a significant F and therefore rejecting the null hypothesis of equality of the means leaves many questions unanswered.

• Therefore, the ANOVA is just the first step in the complete analysis of a set of data.

4

Does ‘significant’ mean ‘substantial’?

• The F test produced a significant result.

• The null hypothesis of equality of the five treatment means must be rejected.

• With large numbers of observations, however, a statistical test can have too much POWER to reject the null hypothesis. Even tiny differences among the means will result in a significant F, with a minuscule p-value.

• Modern Internet studies can yield millions of observations, so the possibility of having too much data is no longer remote.

• ‘Significant’ does not necessarily mean ‘substantial’.

5

Measuring effect size

6

Breakdown (or partition) of the total sum of squares

7

Eta squared

• The oldest measure of effect size is suggested by the partition of the total sum of squares.

• In this measure, the between groups sum of squares is expressed as a PROPORTION of the total sum of squares.
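The equations from the original slides did not survive in this transcript. Under the usual one-way notation (the symbol names here are my own labels, not necessarily those on the slide), the partition and the resulting proportion can be written as:

```latex
SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}},
\qquad
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}
```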

8

Eta squared (where eta is the CORRELATION RATIO)

9

Range of eta squared

• Theoretically, eta squared can take values between zero (no differences among the means) and unity (the means differ, but all the scores within any one group have the same value).

• In practice, its values will always lie somewhere between these limits.

10

Why is eta called the ‘correlation ratio’?

• Suppose that opposite each of the 50 scores in the one-way drug experiment, we were to place its group mean.

• The correlation between the column of scores and the column of means gives the value of eta.

• Let’s demonstrate this.
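Before turning to PASW, the idea can be checked with a minimal Python sketch. The scores below are hypothetical (they are not the 50 scores from the drug experiment), and the helper names are my own. The sketch pairs each score with its group mean, correlates the two columns, and compares the result with the square root of eta squared obtained from the sums of squares.

```python
from math import sqrt

# Hypothetical scores for three groups (NOT the data from the drug experiment).
groups = {
    "Placebo": [8, 10, 9, 11, 12],
    "Drug A": [12, 14, 13, 15, 16],
    "Drug B": [9, 13, 11, 12, 15],
}


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Column 1: every score. Column 2: the mean of the group the score belongs to
# (this mimics the column that an aggregation step adds to the data set).
scores = [s for g in groups.values() for s in g]
group_means = [mean(g) for g in groups.values() for _ in g]

# Eta squared from the partition of the total sum of squares.
grand_mean = mean(scores)
ss_total = sum((s - grand_mean) ** 2 for s in scores)
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
eta_squared = ss_between / ss_total

r = pearson_r(scores, group_means)
print(f"r(scores, group means) = {r:.4f}")
print(f"sqrt(eta squared)      = {sqrt(eta_squared):.4f}")
```

The two printed values should agree, which is the sense in which eta is a correlation.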

11

The Aggregate procedure

• In SPSS/PASW, the Aggregate procedure places opposite each score a value (such as the mean – but other statistics can be chosen) which summarises the scores in the group.

• The grouping variable (in this case Drug Condition) is specified as the BREAK VARIABLE.

• The participant’s score (the DV) is the VARIABLE TO BE SUMMARISED.

12

Finding the Aggregate command

13

The Aggregate dialog

14

The means have been added

• The default name Score_Mean has been changed to Group_Mean.

• Such changes to default names can easily be made in Variable View.

15

Now correlate the means with the scores …

16

The Bivariate Correlations dialog

17

We obtain the value of eta

18

Eta squared again

19

Eta is the correlation between the scores and their group means.

The square of the correlation between the scores and the group

means is eta squared.

20

In the population …

21

Other measures of effect size

• Several other measures of effect size have been proposed.

• One of these, Cohen’s f, is used as input for G*Power, a useful package for answering questions about the numbers of participants you would need in a planned study to achieve sufficient power in your statistical tests.

22

Cohen’s f

23

Error variance as a proportion

• If eta squared is the proportion of the total variance that is between groups, then 1 – eta squared is the proportion that is within groups, that is, error variance.

24

Cohen’s f and eta squared

• So if we take the square root of eta squared divided by (1 – eta squared), we have Cohen’s f :
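The formula image is missing from this transcript; written out from the sentence above, it would read as follows. The worked value uses a hypothetical eta squared of .20, chosen only for easy arithmetic, not the value from the drug experiment.

```latex
f = \sqrt{\frac{\eta^2}{1 - \eta^2}}
\qquad\text{e.g.}\qquad
\eta^2 = .20 \;\Rightarrow\; f = \sqrt{\frac{.20}{.80}} = \sqrt{.25} = .50
```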

25

In our example,

26

Positive bias

• Eta squared is positively biased as an estimate of effect size.

• Were the experiment to be repeated many times, the long run average or EXPECTED VALUE of eta squared would be higher than the population value.
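The bias can be illustrated with a small simulation sketch in Python. Everything in it is an assumption made for illustration: five groups of ten scores, all drawn from the same normal population (so the population effect is exactly zero), 2000 replications and a fixed random seed.

```python
import random


def eta_squared(groups):
    """Between-groups sum of squares as a proportion of the total."""
    scores = [s for g in groups for s in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


def mean_eta_squared_under_null(k=5, n=10, replications=2000, seed=1):
    """Average eta squared over repeated experiments with NO true effect."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        # All k groups come from the SAME normal population,
        # so the population value of eta squared is zero.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        total += eta_squared(groups)
    return total / replications


average = mean_eta_squared_under_null()
print(f"Average eta squared over replications: {average:.3f}")
```

Although the population value is zero, the long run average of the sample statistic comes out clearly above zero, which is what positive bias means here.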

27

Omega squared

• For some ANOVA designs, there is available a statistic called omega squared, which corrects for positive bias.

• But the application of omega squared is problematic in complex designs with repeated measures factors.

• PASW Statistics 17 does not offer omega squared.
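Since the package does not supply it, omega squared has to be computed by hand from the ANOVA summary table. For the one-way between subjects design with fixed effects, the usual textbook estimate is as follows, where k is the number of groups (the notation is mine, not the slide's):

```latex
\hat{\omega}^2 = \frac{SS_{\text{between}} - (k - 1)\,MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}
```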

28

Interpreting values of eta squared and Cohen’s f

29

Multiple comparisons

• When there are three or more groups, the rejection of the null hypothesis leaves many important questions unanswered.

• Is the mean for the Placebo group significantly different from that of the Drug D group? Is it significantly different from the Drug C group?

30

Planned contrasts

• On Monday, I discussed the making of specific PLANNED comparisons, simple and complex, among the individual treatment means.

31

Unplanned or ‘post hoc’ tests

• Often, however, after we have the results of an experiment, we shall want to do some DATA-SNOOPING – i.e., run unplanned statistical tests of differences among the individual treatment means.

• Such unplanned tests are known (solecistically) as POST HOC tests.

• The following points apply both to planned and unplanned comparisons.

32

Statistical testing again.

33

The critical region

• A small arbitrary probability (usually .05) known as the SIGNIFICANCE LEVEL is fixed in advance.

• The CRITICAL REGION is a range of values such that, assuming that the null hypothesis is true, the probability that t will fall inside the range is less than or equal to the fixed significance level.

34

Critical region of t distribution

35

Type I errors

• If the significance level α is set at 0.05, any p-value less than 0.05 will result in the rejection of the null hypothesis (H0).

• If H0 is true, it will be wrongly rejected on 5% of occasions with repeated sampling. A false rejection of the null hypothesis is known as a ‘Type I’ error, and the significance level is therefore also known as the ‘Type I’ or ‘alpha’ error rate.

36

Type I error rate per comparison

• Suppose we gather our data and test the differences among pairs of means for significance.

• We make several of these tests, setting the significance level at .05 each time and rejecting the null hypothesis whenever the p-value is less than .05 .

• This significance level (.05) is known as the Type I error rate PER COMPARISON.

37

An array of ten treatment means

• Suppose, to make my next point more strongly, I have an array of ten treatment means and want to make comparisons between, say, the Placebo mean and each of nine drug means.

• I set the type I error rate per comparison at .05.

• Suppose the null hypothesis is true – in the population, the means all have the same value.

• In the population, the profile is a pancake.

38

The type I error rate per family

• What is the probability that AT LEAST ONE test will show significance, even if the null hypothesis is true?

• This is known as the PER FAMILY or FAMILYWISE type I error rate.

• The older term EXPERIMENTWISE has a similar meaning.

39

The familywise type I error rate is unacceptably high!

• Were I to make 9 INDEPENDENT tests of differences among the 10 means, the familywise type I error rate would be nearly .4 – FORTY PER CENT!

• (The probability of at least one type I error is 1 minus the probability of none.)
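Written out, and assuming as the slide does that the nine tests are independent and each is made at the .05 level, the arithmetic behind the "nearly .4" figure is:

```latex
P(\text{at least one Type I error}) = 1 - (1 - \alpha)^{9} = 1 - (.95)^{9} \approx 1 - .630 = .370
```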

40

Capitalising upon chance

• With a large array of treatment means, we might decide to make a large number of comparisons.

• Even if the null hypothesis is true, the familywise Type I error rate might be 0.90 or even higher!

• Failure to take the heightened probability of the familywise Type I error into account when making sets of comparisons is known as CAPITALISING UPON CHANCE.

41

Familywise type I error rate with unplanned pairwise comparisons

• Suppose we decide to make every possible pairwise comparison. Assume, for simplicity, that the comparisons are independent.

• The number of possible pairings from ten means is 45.

• If the per comparison error rate is fixed at .05, the familywise type I error rate is in the region of 1 – (.95)^45, or about .90.
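The familywise rate for the full set of pairwise comparisons can be checked with a short Python sketch. The function name is my own, and the calculation keeps the simplifying assumption stated above that the comparisons are independent.

```python
from math import comb


def familywise_error_rate(alpha, n_tests):
    """P(at least one Type I error) across n_tests INDEPENDENT tests."""
    return 1 - (1 - alpha) ** n_tests


n_means = 10
n_pairs = comb(n_means, 2)  # number of distinct pairs of means
rate = familywise_error_rate(0.05, n_pairs)

print(f"Pairwise comparisons among {n_means} means: {n_pairs}")
print(f"Familywise Type I error rate: {rate:.2f}")
```

Changing `n_means` shows how quickly the familywise rate climbs as the array of means grows.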

42

Conservative tests

• A CONSERVATIVE TEST adjusts the p-value per comparison upwards in order to control the familywise Type I error rate.

• This is equivalent to setting the per comparison significance level at a lower value than the traditional significance level.

• There are many different approaches to the making of conservative tests to avoid capitalising upon chance.

43

The Bonferroni correction

• The Bonferroni correction was originally applied to PLANNED comparisons.

• You plan to make k contrasts among a set of means.

• You want to keep the per family Type I error rate at the .05 level approximately.

• You achieve this by multiplying the obtained p-value for each value of t by k.

44

Example

• Returning to our drug experiment, we planned to make four simple contrasts.

• We want to control the familywise Type I error rate and keep it at the .05 level.

45

Results of the four simple contrasts

• Double-click to get into the editor and use Cell properties to display more places of decimals for the second contrast.

46

Applying the Bonferroni correction

• Multiply the given p-value by 4, the number of planned simple contrasts.

• For the second contrast, the corrected p-value is .024.

• Report the Bonferroni-corrected p-values rather than the values given in the table.

• Write that the given p-values have been Bonferroni-corrected. For the second contrast (upper half of the table), write:

• “t(45) = 2.88; p = .024 (Bonferroni-corrected)”.
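The correction itself is simple enough to sketch in a few lines of Python. The uncorrected p-values below are hypothetical, since the contrast table is not reproduced in this transcript. The second one (.006) is an assumed value chosen so that its corrected value matches the .024 reported above. Capping the corrected value at 1 is a common convention that I have added.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Hypothetical uncorrected p-values for four planned contrasts.
# The second (.006) is ASSUMED so that 4 x .006 gives the .024 reported above.
uncorrected = [0.300, 0.006, 0.040, 0.0004]
corrected = bonferroni(uncorrected)

for p, p_adj in zip(uncorrected, corrected):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"p = {p:.4f} -> corrected p = {p_adj:.4f} ({verdict})")
```

Note that the third hypothetical contrast would have passed at the uncorrected .05 level but does not survive the correction.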

47

Bonferroni correction for post hoc comparisons

• You must assume that ALL POSSIBLE pairwise comparisons will be made.

• If you have ten treatment means, there are 45 possible pairs.

• So the p-value for each test must be multiplied by 45.

• Equivalently, the per comparison significance level must be set at .05/45, or approximately .001.

• That’s a tough criterion for significance.

48

Selection of post hoc tests

49

Which one?

• The Bonferroni is the most conservative of these tests. With a large array of means it’s almost impossible to get anything significant.

• For between subjects experiments, the Tukey test is preferred. (The Tukey B test is less conservative.)

• The LSD (least significant difference) test makes no correction; but the test is made only if the ANOVA F value is significant.

• The Dunnett is the most powerful conservative test, but it is suitable only for the situation where you are comparing the mean of the controls with each of the other treatment means, that is, when you are making simple comparisons.

• The Scheffé test is good for unplanned complex comparisons.

50

Factorial experiments

51

Factorial experiments

• In a FACTORIAL experiment, there are two or more treatment factors.

• The ANOVA really comes into its own when it is applied to the analysis of data from factorial experiments.

52

Types of ANOVA design

• The three most common types of factorial ANOVA design are:

1. BETWEEN SUBJECTS FACTORIAL designs, in which ALL factors are between subjects.

2. WITHIN SUBJECTS FACTORIAL designs, in which ALL factors are within subjects.

3. MIXED FACTORIAL designs, in which SOME factors are between subjects and some are within subjects.

53

A two-factor between subjects factorial experiment

• Suppose that a researcher has been commissioned to investigate the effects upon simulated driving performance of two new anti-hay fever drugs, A and B. It is suspected that at least one of the drugs may have different effects upon fresh and tired drivers, and the firm developing the drugs needs to ensure that neither drug has an adverse effect upon driving performance in any circumstances.

• The researcher decides to carry out a two-factor between subjects factorial experiment, in which the factors are:

• Drug Treatment, with levels Placebo, Drug A and Drug B;

• Alertness, with levels Fresh and Tired.

54

The experimental design

55

Main effects and interactions

• A factor is said to have a MAIN EFFECT if, in the population, there are differences among the means at its different levels, ignoring any other factors in the design.

• A main effect is indicated by differences among the MARGINAL means.

• In factorial experiments, interest usually centres not on main effects, but on the interplay among the treatment factors, that is, upon INTERACTIONS.

56

Some terms for a two-way table of means

57

Observations

• Main effects are evident in the MARGINAL TOTALS.

• Not surprisingly the Fresh participants outperformed the Tired participants. Looks as if the Alertness factor has a main effect.

• Performance was higher in the Drug B group, suggesting a main effect of the Drug Treatment factor as well.

• But the CELL means are the main focus of interest, because certain patterns in those indicate the presence of an INTERACTION.

58

Profile plots

• You find that the F test for an interaction is significant. What does this mean?

• The next step is to examine the appropriate profile plots.

• More than one plot is possible: your choice depends upon which factor is of principal interest.

59

Two Alertness profiles

60

Three drug profiles

61

Nonparallel profiles

• In neither plot are the profiles parallel.

• In the first profile, the factor of Alertness seems to reverse its effect at different levels of the Drug factor. In fact, Drug A actually depressed the performance of the Fresh participants.

• In the second profile, the ordering of the means at the three levels of the Drug factor changes from level to level of the Alertness factor.

• Nonparallelism of the profiles indicates the presence of an interaction.

62

Simple main effects

• A factor is said to have a SIMPLE MAIN EFFECT when there are differences among its means at a specific level of another factor.

• In the first profile plot, the Alertness factor would seem to have simple main effects at all three levels of the Drug factor.

63

Interaction

• A two-factor INTERACTION is said to occur when the simple main effects of one factor are not homogeneous across all levels of the other factor.

• The simple main effects of the Alertness factor are not the same across all levels of the Drug factor.

• It would appear that a two-factor interaction may be present in these data.
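The definition can be made concrete with a small Python sketch that works on a table of cell means. The means below are hypothetical (the table from the slides is not reproduced in this transcript) and were chosen only to mimic the pattern described: a large Fresh advantage under Placebo, a slight reversal under Drug A, and equal cell sizes throughout.

```python
# Hypothetical cell means; the slide's own table is not available here.
cell_means = {
    "Fresh": {"Placebo": 20.0, "Drug A": 15.0, "Drug B": 25.0},
    "Tired": {"Placebo": 10.0, "Drug A": 16.0, "Drug B": 22.0},
}
drugs = ["Placebo", "Drug A", "Drug B"]

# Simple effect of Alertness at each level of Drug: Fresh mean minus Tired mean.
simple_effects = {
    d: cell_means["Fresh"][d] - cell_means["Tired"][d] for d in drugs
}

# Marginal means for Alertness, averaging over drugs (equal cell sizes assumed).
marginal = {a: sum(row.values()) / len(row) for a, row in cell_means.items()}

for drug, diff in simple_effects.items():
    print(f"Fresh - Tired at {drug}: {diff:+.1f}")
print(f"Marginal means: {marginal}")

# With parallel profiles, every Fresh - Tired difference would be identical.
profiles_parallel = len(set(simple_effects.values())) == 1
print("Profiles parallel:", profiles_parallel)
```

Unequal differences in a sample only suggest an interaction; whether one is present in the population is a question for the F test.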

64

Partition of the total sum of squares in the two-way ANOVA

• A and B are the two treatment factors, and AB is their interaction.
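The equation on this slide is missing from the transcript. For a between subjects design with equal numbers in each cell, the standard partition reads as follows (the subscript for the error term is my own labelling):

```latex
SS_{\text{total}} = SS_A + SS_B + SS_{AB} + SS_{\text{error}}
```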

65

Three F tests
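The content of this slide was an image. Assuming a between subjects design with fixed effects, where each source is tested against the same within-cells error term, the three ratios would be:

```latex
F_A = \frac{MS_A}{MS_{\text{error}}}, \qquad
F_B = \frac{MS_B}{MS_{\text{error}}}, \qquad
F_{AB} = \frac{MS_{AB}}{MS_{\text{error}}}
```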

66

Two-way ANOVA summary table

67

Measuring effect size in factorial experiments

68

Effect size in factorial experiments

• A controversial area.

• The measure known as COMPLETE ETA SQUARED expresses the contribution of a source (whether a main effect or an interaction) to the total variance in the presence of all other treatment or group sources.

• The measure known as PARTIAL ETA SQUARED excludes all other treatment or group sources.

69

Complete eta squared

• Expresses the variance attributable to a source in terms of the TOTAL variance.

70

Example of calculation of complete eta squared

71

Partial eta squared

• Expresses the variance attributable to a source as a proportion of the variance for that source plus the error variance. The other sources are omitted.

72

Partial eta squared for Alertness

• The value of partial eta squared (.139) is greater than that of complete eta squared (.08).
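Why the partial value is never smaller than the complete value can be seen from a short Python sketch. The sums of squares are hypothetical (they are not the values from the summary table, which is missing from this transcript) and the function names are mine.

```python
def complete_eta_squared(ss_source, ss_total):
    """Source sum of squares as a proportion of the TOTAL sum of squares."""
    return ss_source / ss_total


def partial_eta_squared(ss_source, ss_error):
    """Source sum of squares as a proportion of source plus error only."""
    return ss_source / (ss_source + ss_error)


# Hypothetical sums of squares for a two-way design (NOT the slide's values).
ss = {"Alertness": 100.0, "Drug": 300.0, "Alertness x Drug": 330.0, "Error": 750.0}
ss_total = sum(ss.values())

results = {}
for source in ("Alertness", "Drug", "Alertness x Drug"):
    complete = complete_eta_squared(ss[source], ss_total)
    partial = partial_eta_squared(ss[source], ss["Error"])
    results[source] = (complete, partial)
    print(f"{source}: complete = {complete:.3f}, partial = {partial:.3f}")
```

The partial measure drops the other sources from the denominator, so its denominator can only be smaller and the resulting proportion larger (or equal).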

73

Coffee break

74

When the interaction is significant

• We shall often want to ‘unpack’ a significant interaction by testing for simple main effects and making multiple comparisons among the individual treatment means.

75

Simple effects with PASW

• Simple effects are not an option in the ANOVA dialog windows.

• It is easy to run simple effects on PASW, but we must use SYNTAX to achieve this.

• A small problem is that we must use, not the ANOVA syntax command, but the command for what is known as Multivariate Analysis of Variance or MANOVA.

• But it really is VERY EASY to do this.

76

Multivariate analysis of variance (MANOVA)

• In the ANOVA, there is just ONE dependent variable.

• Multivariate Analysis of Variance (MANOVA) is a generalisation of the ANOVA to the analysis of data from experiments of ANOVA design with two or more DVs.

• We can, therefore, regard the ANOVA as a special case of the MANOVA.

77

Using MANOVA to run ANOVA

• If there is only one DV, running the MANOVA procedure will run a univariate ANOVA and produce the usual ANOVA summary table.

• Why bother? Tests for simple effects are options in the MANOVA command.

• They are not available in other PASW commands.

78

Open the data set

79

Get into the Syntax Editor

80

The PASW syntax editor

81

The basic MANOVA command

• This time, you will have to do some writing.

82

Write in the MANOVA command

83

Check the active dataset

84

The two-way ANOVA summary table

85

The /ERROR and /DESIGN subcommands for simple effects of Drug

• You will need two subcommands: /ERROR and /DESIGN.
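The syntax itself was shown as a screenshot and is not reproduced here. As a rough guide to what a simple effects analysis computes, here is a Python sketch of the arithmetic. It is not PASW syntax. The scores, cell sizes and function names are all hypothetical, and it assumes (as is usual for a between subjects design) that each simple effect of Drug is tested against the within-cells error term pooled over the whole two-way design.

```python
# Hypothetical scores: data[alertness][drug] is the list of scores in that cell.
data = {
    "Fresh": {
        "Placebo": [20, 22, 18, 21],
        "Drug A": [15, 14, 17, 16],
        "Drug B": [25, 27, 24, 26],
    },
    "Tired": {
        "Placebo": [10, 12, 9, 11],
        "Drug A": [16, 15, 18, 17],
        "Drug B": [22, 21, 24, 23],
    },
}


def mean(values):
    return sum(values) / len(values)


cells = [cell for row in data.values() for cell in row.values()]

# Pooled within-cells error term from the full two-way design.
ss_error = sum(sum((x - mean(cell)) ** 2 for x in cell) for cell in cells)
df_error = sum(len(cell) - 1 for cell in cells)
ms_error = ss_error / df_error


def simple_effect_of_drug(alertness_level):
    """F ratio for Drug at one level of Alertness, using the pooled error term."""
    row = data[alertness_level]
    row_scores = [x for cell in row.values() for x in cell]
    row_mean = mean(row_scores)
    ss_effect = sum(
        len(cell) * (mean(cell) - row_mean) ** 2 for cell in row.values()
    )
    df_effect = len(row) - 1
    return (ss_effect / df_effect) / ms_error, df_effect


for level in data:
    f_ratio, df_effect = simple_effect_of_drug(level)
    print(f"Drug within {level}: F({df_effect}, {df_error}) = {f_ratio:.2f}")
```

Using the pooled error term rather than the error from each row alone gives the test more degrees of freedom, which is one reason for running simple effects within the full design instead of as separate one-way ANOVAs.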

86

Output for simple effects analysis

87

The need for multiple comparisons

88

The need for a smaller comparison ‘family’

• An interaction is significant.

• We want to make unplanned or post hoc multiple comparisons among the treatment means.

• But there may be many cells in the design, so that the critical difference for significance may be impossibly large.

• In terms of the Bonferroni test, you could be multiplying the p-value by a large factor or setting the per comparison significance level at a tiny value.

• We need to justify making the comparisons among a smaller array of means.

89

First, we test for simple main effects

• We might argue that if the Drug factor has a significant simple main effect at one level of Alertness (Body State), say Fresh, we can define the comparison family in relation to the means at that level only. This will produce a less conservative test.

• When testing for simple main effects, however, we should use the Bonferroni correction to control the familywise Type I error rate.

• In our example, since there are two simple main effects, the criterion for significance should be that p is less than 0.025, rather than 0.05.

90

Reduce the data set.

• There is more than one way of making the multiple comparisons.

• You can easily run a one-way ANOVA on the data from the scores of the fresh participants only, then ask for a Tukey test.

91

Select cases

92

Select the data from the Fresh participants only

93

Choose Tukey multiple comparisons

94

The results

95

Summary

• A report of an ANOVA F test should be accompanied by a measure of effect size, such as eta squared, omega squared or Cohen’s f. Follow Lisa DeBruine’s guidelines.

• Beware of capitalising upon chance: follow-up tests should be conservative.

• When unpacking significant interactions, use syntax to test for simple main effects.

• The obtaining of a significant simple main effect can be an argument for a smaller comparison ‘family’.

96

Further reading

• For a thorough and readable coverage of elementary (and not so elementary) statistics, I recommend …

• Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA: Thomson/Wadsworth.

97

For PASW/SPSS

• Kinnear, P. R., & Gray, C. D. (2009). PASW 17 for Windows Made Simple. Hove and New York: Psychology Press.

• In addition to practical advice about using PASW Statistics 17, we also offer informal explanations of many of the techniques.
