Psych 5500/6500: Comparisons Among Treatment Means in Single-Factor Experiments. Fall, 2008
Page 1

Psych 5500/6500

Comparisons Among Treatment Means in Single-Factor Experiments

Fall, 2008

Page 2

Example

Let's go back to the example from the previous lecture. We have an experiment where the independent variable is 'Type of Therapy' (Control Group, Behavior Modification, Psychoanalysis, Client-Centered, Gestalt) and the dependent variable is level of depression after 2 months.

H0: μCG= μBM= μPA= μCC= μG

HA: at least one μ is different from the rest.

Page 3

Data

                 Control  Beh. Mod.  Psychoanalysis  Client-Centered  Gestalt
                    9         6            6                6            3
                    8         6            7                5            1
                    7         4            6                7            5
Group mean (Ȳ):     8       5.33         6.33               6            3

Page 4

Summary Table

Source SS df MS F p

Between 39.60 4 9.90 6.46 .008

Within 15.33 10 1.53

Total 54.93 14

F(4,10)=6.46, p=.008. We can conclude that at least one μ is different from the rest. We will refer to this as the 'overall' or 'omnibus' F.
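The summary table can be checked by hand. The following pure-Python sketch (variable and group names are mine) recomputes SSBetween, SSWithin, and F from the raw data:

```python
# Hand computation of the one-way ANOVA summary table for the
# depression data (pure Python, no libraries).

groups = {
    "Control":         [9, 8, 7],
    "Behavior Mod.":   [6, 6, 4],
    "Psychoanalysis":  [6, 7, 6],
    "Client-Centered": [6, 5, 7],
    "Gestalt":         [3, 1, 5],
}

scores = [y for ys in groups.values() for y in ys]
grand_mean = sum(scores) / len(scores)

# SS_between: n times squared deviations of group means from the grand mean
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                 for ys in groups.values())
# SS_within: squared deviations of each score from its own group mean
ss_within = sum((y - sum(ys) / len(ys)) ** 2
                for ys in groups.values() for y in ys)

df_between = len(groups) - 1             # a - 1 = 4
df_within = len(scores) - len(groups)    # N - a = 10
F = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_between, 2), round(ss_within, 2), round(F, 2))
# 39.6 15.33 6.46
```

The values match the summary table above (39.60, 15.33, F = 6.46).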

Page 5

Comparisons

Comparison procedures allow us to ask much more specific questions. There are many different comparison procedures, but they all involve comparing just two things with each other. This can be accomplished several ways; for a specific comparison you can:

1. Drop all but two groups.
2. Add groups together to get two groups.
3. Drop some groups and add others together (to end up with two groups).

An important restriction is that no group can appear on both sides of a comparison.

Page 6

Examples

Dropping all but two groups. In this particular example we will compare each therapy group with the control group, giving us 4 different comparisons (note other pair-wise comparisons could also be made, e.g. BM vs. Gestalt):

1. Control group vs. Behavior Mod:     H0: μCG = μBM
2. Control group vs. Psychoanalysis:   H0: μCG = μPA
3. Control group vs. Client-Centered:  H0: μCG = μCC
4. Control group vs. Gestalt:          H0: μCG = μG

Page 7

Example

Control Group vs. Behavior Modification

Control Group   Behavior Modification
      9                  6
      8                  6
      7                  4
  Mean = 8          Mean = 5.33

These are the means being compared, but as we shall see the analysis uses the within-group variance of all the groups.

Page 8

Adding groups together

5. Control group vs. Therapy (i.e. the Control group vs. all of the therapy groups combined into one large ‘therapy’ group).

H0: μCG = (μBM + μPA + μCC + μG) / 4

Page 9

Example

Control Group   Therapy
      9            6
      8            6
      7            4
                   6
                   7
                   6
                   6
                   5
                   7
                   3
                   1
                   5
  Mean = 8      Mean = 5.17

Page 10

Dropping some groups and adding other groups together

6. Behavior Modification versus ‘Talking Therapies’ (i.e. drop the control group and compare the Behavior Mod group to a group consisting of Psychoanalysis, Client-Centered, and Gestalt combined)

H0: μBM = (μPA + μCC + μG) / 3

Page 11

Computing the Comparison

With a comparison we always end up with two groups, so we could simply do a t test on those two groups. Comparison procedures, however, do something that has more power. They:

1. Recompute MSBetween using the two groups of the comparison.
2. Borrow the MSWithin from the overall F test.
3. Compute F = MSBetween / MSWithin.
4. If you want to test a directional (one-tailed) hypothesis, adjust the p value accordingly.
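These steps can be sketched for the Control vs. Behavior Mod comparison. This is a minimal illustration (variable names are mine); note that MSWithin is the value borrowed from the overall summary table, and an exact p value would additionally require the F distribution with df = (1, 10):

```python
# Comparison procedure: Control vs Behavior Mod.  MS_between is
# recomputed from just the two comparison groups, while MS_within
# (1.533, df = 10) is borrowed from the overall ANOVA.

control = [9, 8, 7]
beh_mod = [6, 6, 4]
ms_within = 15.33 / 10          # borrowed from the overall summary table

both = control + beh_mod
grand = sum(both) / len(both)
ss_between = (len(control) * (sum(control) / len(control) - grand) ** 2
              + len(beh_mod) * (sum(beh_mod) / len(beh_mod) - grand) ** 2)
ms_between = ss_between / 1     # two groups -> df_between = 1

F = ms_between / ms_within      # compare to critical F with df = (1, 10)
print(round(F, 2))
# 6.96
```

The resulting F is evaluated on (1, 10) degrees of freedom, not (1, 4), because MSWithin keeps its df from the overall analysis.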

Page 12

Computing the Comparison

This comparison procedure has more power than just doing a t test on the two groups of the comparison:

1. The df for MSWithin comes from all of the groups, even if the comparison involves fewer than all of the groups.

2. When you combine groups together, the combined group can have more within-group variance, which would hurt the power of a t test.

Page 13

Control Group   Therapy
      9            6
      8            6
      7            4
                  16
                  17
                  16
                   2
                   3
                   1

I've changed the data here to make a point. If this were analyzed using the t test, then the variance within the therapy group (a combination of the data from three therapies) would be large and would hurt the power. But with the comparison technique, MSWithin only looks at the variance within each of the three therapy groups and within the control group.

Page 14

Number of Possible Comparisons

As the number of groups in the experiment grows, the number of possible comparisons gets very large. For example, in an experiment with 4 groups ('A', 'B', 'C', and 'D') some of the comparisons would be:

A vs B, A vs C, A vs D, B vs C, B vs D, C vs D, A vs (B+C), A vs (B+D), A vs (C+D), B vs (A+D), B vs (C+A), B vs (C+D), C vs (A+B), C vs (A+D), C vs (B+D)...etc...(A+B) vs (C+D), (A+C) vs (B+D)...etc....A vs (B+C+D), B vs (A+C+D), C vs (A+B+D), D vs (A+B+C).
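The splits above can be counted mechanically. A small sketch (the enumeration approach is mine, not part of the lecture): assign each of the 4 groups to the left side, the right side, or 'dropped', require both sides to be non-empty, and count each unordered split once:

```python
# Enumerate every possible comparison (subset vs disjoint subset,
# remaining groups dropped) for a 4-group experiment.
from itertools import product

groups = "ABCD"
comparisons = set()
for sides in product(("drop", "left", "right"), repeat=len(groups)):
    left = frozenset(g for g, s in zip(groups, sides) if s == "left")
    right = frozenset(g for g, s in zip(groups, sides) if s == "right")
    if left and right:                          # both sides must be non-empty
        comparisons.add(frozenset((left, right)))   # unordered pair of sets

print(len(comparisons))
# 25
```

With 4 groups there are 25 possible comparisons in all, of which 6 are simple pair-wise comparisons.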

Page 15

Error Rate

A concern is that if H0 is true, and we make lots of comparisons, and each has a .05 chance of making a Type 1 error (rejecting H0 when H0 is true), then the chance of making at least one Type 1 error becomes quite large.

The (rather large and complicated) topic of 'comparisons' provides a variety of procedures for keeping the probability of making a Type 1 error under control when making many comparisons.

Page 16

Definitions

• Error rate per comparison:
p(making a Type 1 error when performing any one specific comparison | H0 true for that comparison)

• Error rate per comparison set:
p(making at least one Type 1 error when you perform a set of comparisons | H0 true for all of them)

• Error rate per experiment:
If you make several sets of comparisons when analyzing the data from an experiment, then this would be: p(making at least one Type 1 error in the analysis of the data from the experiment | H0 true for all of them)

Page 17

Example

We are going to make two independent comparisons (and the null hypothesis happens to be true for both). Each has a .05 chance of making a Type 1 error (i.e. α = .05 per comparison). Thus the error rate per comparison = .05.

Now let's calculate the error rate for that set of two comparisons...

Page 18

Remember, you can only make a Type 1 error when H0 is true; we will assume H0 is true for this example.

We will let 'C' stand for making a correct decision to not reject H0: p(C) = .95.

We will let 'E' stand for incorrectly rejecting H0, thus making a Type 1 error: p(E) = .05.

Page 19

Possible Outcomes

Possible outcome                  Determining probability   Probability of that outcome
C,C (2 correct decisions)         (.95)(.95)                .9025
C,E (1st correct, 2nd an error)   (.95)(.05)                .0475
E,C (1st an error, 2nd correct)   (.05)(.95)                .0475
E,E (2 errors)                    (.05)(.05)                .0025

The probability of making at least one Type 1 error in two comparisons = .0475 + .0475 + .0025 = .0975. Thus the error rate per comparison set = .0975.
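The outcome table can be reproduced by enumerating the four possible decision sequences (a small sketch; names are mine):

```python
# Enumerate the (decision 1, decision 2) outcomes for two independent
# comparisons, each with p(correct) = .95 and p(Type 1 error) = .05.
from itertools import product

p = {"C": 0.95, "E": 0.05}
outcomes = {seq: p[seq[0]] * p[seq[1]] for seq in product("CE", repeat=2)}

# "At least one Type 1 error" = any sequence containing an E
at_least_one_error = sum(prob for seq, prob in outcomes.items() if "E" in seq)
print(round(at_least_one_error, 4))
# 0.0975
```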

Page 20

Formula for Error Rate Per Comparison Set

If:
αPC = error rate per comparison, and
αPCS = error rate per comparison set, and
k = the number of independent comparisons that will be made,

Then: αPCS = 1 – (1 – αPC)^k

Page 21

Examples

If error rate per comparison = .05 and you do three comparisons, then the error rate for that comparison set is αPCS = 1 – (1 – .05)³ = .143.

With ten comparisons, the error rate per comparison set = .40 (a 40% chance of making at least one Type 1 error!)
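The formula can be evaluated directly for both examples (a minimal sketch; the function name is mine):

```python
# Error rate per comparison set for k independent comparisons,
# each with per-comparison error rate alpha_pc.
def error_rate_per_set(alpha_pc, k):
    """P(at least one Type 1 error) across k independent comparisons."""
    return 1 - (1 - alpha_pc) ** k

print(round(error_rate_per_set(0.05, 3), 3))    # 0.143
print(round(error_rate_per_set(0.05, 10), 2))   # 0.4
```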

Page 22

Non-independent Comparisons

There is no simple formula for computing the error rate per comparison set when the comparisons are not independent. We will get to the difference between independent and dependent comparisons shortly. First, I would like to review the various concerns that arise when making several comparisons.

Page 23

Error Rate Concern #1

If H0 is true:

1. The more comparisons you make the more likely it is you’ll make a Type 1 error. This is the point we just covered.

Page 24

Error Rate Concern #2

2. One improbable mean (one that differs from the others quite a bit just due to chance) can lead to many wrong decisions to reject H0. Every comparison involving that mean could lead to the decision to reject H0.

Page 25

Error Rate Concern #3

3. If you have several groups in your experiment, a comparison of the lowest mean and highest mean is likely to lead to a rejection of H0.

The overall F test is not fooled by this: it knows how many groups are in the experiment and looks at how much they all differ from each other. If, however, you were to drop all of the groups except the one with the highest mean and the one with the lowest mean, and do a t test on those two groups, you would probably be able to reject H0 even when the independent variable had no effect.

Page 26

Controlling Error Rate

There are many, many procedures for controlling error rate when making multiple comparisons, and they all take a different approach to the problem. We will look at a few procedures that cover the basic concepts. To differentiate among those procedures we need to determine whether the comparisons are a priori or a posteriori, and whether they are orthogonal (independent) or non-orthogonal (non-independent).

Page 27

a priori and a posteriori

a priori comparisons (also known as ‘planned comparisons’) are those you know you are going to want to do before you gather your data. They are based upon theory, specifically, upon which comparisons will shed light on the hypotheses you are testing.

a posteriori comparisons (also known as ‘post hoc’ or ‘data snooping’ comparisons) are those you run to examine unexpected patterns in your data, or just to snoop out what else your data might tell you.

Page 28

Orthogonality

Conceptual: Comparisons are orthogonal when they lead to analyses that are not redundant. For example, if one comparison allows us to compare group 1 with group 2, that analysis will not in any way predict the results of a comparison that allows us to compare group 3 with group 4. Those two comparisons are ‘orthogonal’ (i.e. non-redundant).

Page 29

Sets of Orthogonal Comparisons

If ‘a’ is the number of groups in your experiment, it is possible to come up with a set of a-1 comparisons that are all orthogonal to each other.

To determine whether or not comparisons are orthogonal to each other we need to express them as 'contrast codes'.

Page 30

Contrast Codes

Contrast codes are sets of constants that add up to zero and that indicate which groups are involved in a comparison (and how much to weight each one).

               CG    BM    PA    CC     G
Comparison 1    1    -1     0     0     0
Comparison 2    0     0     1   -1/2  -1/2
Comparison 3   1/2   1/2    0   -1/2  -1/2

Comparison 1: CG vs. BM
Comparison 2: PA vs. (CC+G)
Comparison 3: (CG+BM) vs. (CC+G)

Page 31

Testing Orthogonality

Two comparisons are orthogonal if the sum of their products (as demonstrated below) equals zero.

               CG    BM    PA    CC     G
Comparison 1    1    -1     0     0     0
Comparison 2    0     0     1   -1/2  -1/2
1 x 2           0     0     0     0     0

The sum of the products = 0+0+0+0+0 = 0, so these two comparisons are orthogonal.

Page 32

Testing Orthogonality

Now let's see if comparisons 1 and 3 are orthogonal...

               CG    BM    PA    CC     G
Comparison 1    1    -1     0     0     0
Comparison 3   1/2   1/2    0   -1/2  -1/2
1 x 3          1/2  -1/2    0     0     0

The sum of the products = 1/2 + (-1/2) + 0 + 0 + 0 = 0, so these two comparisons are orthogonal.

Page 33

Testing Orthogonality

Now let's see if comparisons 2 and 3 are orthogonal...

               CG    BM    PA    CC     G
Comparison 2    0     0     1   -1/2  -1/2
Comparison 3   1/2   1/2    0   -1/2  -1/2
2 x 3           0     0     0    1/4   1/4

The sum of the products = 0 + 0 + 0 + 1/4 + 1/4 = 1/2 ≠ 0, so these two comparisons are not orthogonal.
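The three orthogonality tests can be run mechanically. A sketch using exact fractions (function and variable names are mine):

```python
# Orthogonality checks for the three contrasts above.
# Group order: CG, BM, PA, CC, G.
from fractions import Fraction as Frac

half = Frac(1, 2)
c1 = [1, -1, 0, 0, 0]            # CG vs BM
c2 = [0, 0, 1, -half, -half]     # PA vs (CC+G)
c3 = [half, half, 0, -half, -half]   # (CG+BM) vs (CC+G)

def sum_of_products(a, b):
    """Sum of element-wise products; zero means the contrasts are orthogonal."""
    return sum(x * y for x, y in zip(a, b))

print(sum_of_products(c1, c2))   # 0    -> orthogonal
print(sum_of_products(c1, c3))   # 0    -> orthogonal
print(sum_of_products(c2, c3))   # 1/2  -> not orthogonal
```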

Page 34

Comparisons 2 and 3 are not orthogonal, so this was not a set of orthogonal comparisons (they all need to be orthogonal to each other, tested by looking at each pair of contrasts). However, as there are five groups in the experiment we should be able to come up with a set of four comparisons that are all orthogonal to each other (pairwise). Actually we can come up with many different sets of four orthogonal comparisons, but we can’t come up with five comparisons that are orthogonal to each other if we have five groups in the experiment.

Page 35

Examples of Orthogonal Sets

               CG    BM    PA    CC     G
Comparison 1    1    -1     0     0     0
Comparison 2   1/2   1/2   -1     0     0
Comparison 3   1/3   1/3   1/3   -1     0
Comparison 4   1/4   1/4   1/4   1/4   -1

Check them out, they are all orthogonal to each other (tested as pairs). Note the general pattern here; compare two groups, combine those two groups and compare them to a third group, combine those three groups and compare them to a fourth, and so on.

Page 36

Another Example

               CG    BM    PA    CC     G
Comparison 1    1    -1     0     0     0
Comparison 2    0     0     1    -1     0
Comparison 3   1/2   1/2  -1/2  -1/2    0
Comparison 4   1/4   1/4   1/4   1/4   -1

Check them out; this set of comparisons is also orthogonal (pairwise to each other). Note the pattern: compare 2 groups, compare 2 other groups, compare the first pair with the second, then combine them all and compare them to yet another group. There are many possible sets of orthogonal comparisons, but unless the same comparison happens to appear in both sets (like comparison 4 here), the comparisons in this set will not be orthogonal to the comparisons in a different set.

Page 37

Orthogonal Sets

There are various patterns of orthogonality. After a while you get the hang of creating them.

Remember, the maximum number of comparisons that can all be orthogonal to each other is a-1.

Before going on to the next slide, go back and reread the material on error rate, error rate per comparison and error rate per comparison set, and the three factors that lead to high error rates.

Page 38

Error Rate Concerns (revisited)

Any method to control error rate has to address these three concerns:

1. The error rate per comparison set goes up as the number of comparisons goes up.

2. One weird mean might lead to a lot of mistaken rejections of H0.

3. In an experiment with several groups, a comparison of the largest and smallest means is likely to be significant.

Page 39

Comparison Procedures

We will be looking at three different comparison procedures; they each address the error rate concerns in a different fashion:

1. A priori, orthogonal comparisons
2. Dunn's method for a priori, non-orthogonal comparisons (also called the Bonferroni method)
3. Scheffe's method for a posteriori comparisons

Page 40

1) A priori, orthogonal comparisons

To perform a set of a priori, orthogonal comparisons:

1. You do not have to first reject H0 on the overall F test before doing these comparisons (as we will see, you do need to reject H0 on the overall F test before doing a post hoc comparison).

2. Set the error rate per comparison at your normal significance level (i.e. .05), which will make the error rate per comparison set greater than .05.

3. You can transform the p value of the comparison to test a directional hypothesis.

Page 41

How Error Rate is Controlled

1. The number of comparisons is limited to a-1 by the requirement that they all be orthogonal to each other.

2. One weird mean is not likely to lead to lots of Type 1 errors, as the comparisons are orthogonal and thus aren’t redundant in the questions being asked.

3. As the comparisons are decided upon a priori you can’t wait until after you see the means and then select the biggest and smallest for your comparison.

Page 42

General Strategy for Controlling Error Rate

The other comparison procedures keep error rate per comparison set from getting too large by using a more stringent error rate per comparison.

In other words, they keep you from falsely rejecting H0 over many comparisons by making it harder to reject H0 for each comparison.

Page 43

Dunn’s Method for a priori, Non-Orthogonal Comparisons*

1. You do not have to first reject H0 on the overall F test.

2. Set your significance level for each comparison at α/(# of comparisons). See the next slide for more details.

3. You can transform the F of the comparison to a t test to test a directional hypothesis, or simply adjust the p value.

* Also known as the Bonferroni t

Page 44

Significance Level for Dunn’s Method

You are controlling Type 1 error by using a smaller significance level equal to α/(# of comparisons). The appropriate '# of comparisons' is a matter of some controversy. The common options (in order of ascending conservatism) are:

1. The number of comparisons you will be making within the set of non-orthogonal a priori comparisons.
2. The total number of a priori comparisons you will be making (orthogonal as well as non-orthogonal).
3. The total number of statistical tests you will be performing on the data.

For example, using criterion #1, if you are making 10 a priori, non-orthogonal comparisons then set your significance level for each comparison at .05/10 = .005.
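The adjustment itself is simple to apply in code. A sketch (the p values below are hypothetical, chosen only to show the decision rule):

```python
# Dunn/Bonferroni adjustment using criterion #1: divide alpha by the
# number of non-orthogonal a priori comparisons.
alpha = 0.05
n_comparisons = 10
per_comparison_level = alpha / n_comparisons   # .05/10 = .005

p_values = [0.001, 0.004, 0.03, 0.20]          # hypothetical comparison p's
significant = [p for p in p_values if p <= per_comparison_level]
print(per_comparison_level, significant)
```

Only the comparisons with p ≤ .005 are declared significant, even though .03 would have passed an unadjusted .05 criterion.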

Page 45

How Error Rate is Controlled

1. You can make as many a priori comparisons as you would like, but the more you make the harder it is to reject H0 on each one.

2. One weird mean may appear in several comparisons in a redundant way, but the correction to the error rate per comparison should help control that to some degree.

3. As the comparisons are decided upon a priori you can’t wait until after you see the means and then select the biggest and smallest for your comparison.

Page 46

A Posteriori Comparisons

The general rule for a posteriori comparisons is that you first need to reject H0 on the overall F test before you can do an a posteriori comparison. In other words, first you have to show there is an effect somewhere in the data before you can snoop around looking for where it is.

There are many, many a posteriori comparison procedures available; we will look at one that can do any comparison you want.

Page 47

Scheffe’s Method for a posteriori Comparisons

1. You first have to reject H0 on the overall F test.

2. For each comparison, use (the Fc from the overall F test) × (a – 1) as your Fc value. In our example it would be Fc = (3.48)(5 – 1) = 13.92.
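A sketch of the Scheffe critical value for this example (the critical F of 3.48 is taken from the slide, i.e. a table lookup, not computed here):

```python
# Scheffe critical value: the overall-test critical F times (a - 1).
a = 5                    # number of groups in the experiment
f_crit_overall = 3.48    # critical F with df = (4, 10), alpha = .05 (from table)
scheffe_crit = f_crit_overall * (a - 1)

print(round(scheffe_crit, 2))
# 13.92
```

A comparison's F must exceed 13.92 (rather than 3.48) to be declared significant under this procedure.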

Page 48

How Error Rate is Controlled

1. You can make every possible comparison, but you have set your critical value so high that the error rate per comparison set = your significance level.

2. One weird mean may appear in several comparisons in a redundant way, but the correction to the error rate per comparison controls that.

3. The overall F test must be significant before you do this procedure, and the overall F test is not fooled by the likely big difference between the largest and smallest mean when you have many groups.

Page 49

Advantages and Disadvantages

Advantage of Scheffe’s method: you can do any comparisons you want, and as many as you want.

Disadvantage: Scheffe’s assumes you are going to do every possible comparison, and so it makes the error rate per comparison very low. If you don’t make a lot of comparisons then you have conservatively biased your chances of rejecting H0.

Page 50

Additional Comparison Procedures

• Tukey’s HSD Test: this is used to make all possible pair-wise comparisons among the group means. The error rate per comparison set is your significance level (i.e. .05), while the error rate per comparison is less than your significance level.

• Dunnett’s Test: this is used to compare each treatment group one at a time with the control group. Again, the error rate per comparison set is your significance level, while the error rate per comparison is less than your significance level.

Page 51

On Post-Hoc Comparisons

Some researchers (including myself) believe that data-snooping comparisons (e.g. doing all pair-wise comparisons) are of dubious value, and that analyses should be limited to results that test a priori hypotheses. This is consistent with a certain view of the processes of science.

In this view, post-hoc or data-snooping comparisons should not be interpreted within the context of the theories being examined in a study, but they can serve to help shape where the theory goes next (which then serves as the a priori aspect of the next experiment).

Page 52

On Interpreting Comparisons in Statistical Packages

Statistical programs make it easy to interpret the results of comparisons. The internal manipulations to control error rate are invisible; the programs simply report a p value. If the p value is less than or equal to your general significance level then the comparison is statistically significant using whatever procedure is involved. In other words, if p ≤ .05 then reject H0 for that comparison, no matter which technique you are using.

Page 53

Using SPSS

The one-way ANOVA menu item in SPSS gives you a choice between entering a specific contrast (which will be evaluated as if it were an orthogonal, a priori, comparison...i.e. no control over error rate) or selecting a post hoc procedure. If you select a post hoc procedure then you will be limited to specific types of comparisons, most of the procedures automatically do all pair-wise comparisons.

Page 54

Effect Size of Comparisons

As each comparison reduces the data to two groups we report the difference between the means of those two groups as a measure of effect size. For example, if the comparison involves Group 1 vs (Groups 2 & 3 Combined) then we have:

Mean difference = mean of group 1 – (mean of groups 2 and 3 combined).

Page 55

Standardized Effect Size

The difference between the two means of the comparison can be turned into a standardized effect size using Cohen’s, Hedges’s, or Glass’s formulas. Cohen’s and Hedges’s assume that all of the variances are equal. Conveniently, a pooled estimate of the standard deviation can be obtained by taking the square root of MSwithin.

Hedges's g = (mean difference) / √MSWithin
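For example, Hedges's g for the Control vs. Behavior Mod comparison, using the rounded values from the earlier slides (a sketch; variable names are mine):

```python
# Hedges's g for Control vs Behavior Mod, using the pooled standard
# deviation sqrt(MS_within) from the overall ANOVA.
from math import sqrt

mean_control, mean_beh_mod = 8.0, 5.33
ms_within = 1.53                      # from the summary table

mean_difference = mean_control - mean_beh_mod
g = mean_difference / sqrt(ms_within)

print(round(g, 2))
# 2.16
```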

Page 56

Standardized Effect Size

If the population variances are not equal (an option within the SPSS ANOVA procedure will allow you to test the assumption of homogeneity) then the denominator should be the estimate of the population standard deviation from the control group. Indicate in SPSS that you want the 'Descriptives' option when doing the ANOVA, then look at the 'Std. Deviation' of the group you consider to be the control group.

Glass's g = (mean difference) / (est. σControl)