
1 Psych 5500/6500 The t Test for a Single Group Mean (Part 4): Power Fall, 2008.

Date post: 02-Jan-2016
Upload: cassandra-mckinney
Transcript

Psych 5500/6500

The t Test for a Single Group Mean (Part 4): Power

Fall, 2008


Power

Power is the probability that you will be able to reject H0 when H0 is actually false. Another way of saying that is that it is the probability you will be able to conclude there was an effect in your study (that something caused the population mean to differ from what was predicted by H0) when there really was such an effect.

Beta is the probability that you won’t be able to reject H0 when H0 is actually false.

Power + beta = 1.00 (i.e. if H0 is false you will either make the correct decision and reject it or you won’t)


Power (metaphorically)

Power can be thought of metaphorically as how good of a search you do for an effect. If you don’t know how carefully someone searched for something, it is hard to interpret what it means if they failed to find it. That is why failure to reject H0 is worded as ambiguously as it is: if someone failed to find an effect (i.e. failed to reject H0), is that because there was no effect (i.e. H0 is true), or is that because they didn’t look very hard (i.e. they had a low power experiment)?


Metaphor Continued

Say you have a kitchen drawer full of miscellaneous stuff and you wonder if the bottle opener is in the drawer. A low power search would be to open the drawer and take a quick glance. If you find the bottle opener that’s great, but if you don’t, does that mean it isn’t in the drawer? Say, however, that you spend five minutes rummaging around in the drawer and can’t find the bottle opener. That is a more powerful search, and failure to find the bottle opener might more reasonably be interpreted as evidence that it wasn’t in the drawer.


Influences on Power

We will take a close look at four factors that affect the power of an experiment (and briefly mention two others).

1. Effect size.

2. Variability (i.e. variance/standard deviation)

3. Sample size.

4. One-tail test versus a two-tail test.


Increasing Power

There are things you can do to increase the power of an experiment. The neat thing about these ways of increasing power is that they increase the chances of rejecting H0 only when H0 is false (i.e. they help us to find an effect only when an effect exists). If H0 is true, then these manipulations have no effect on the outcome of the experiment (the probability of rejecting H0 stays at alpha), thus you are not biasing the results of the analysis.


Example

We have a math test whose mean score in the past has been 50. We have a new way of teaching math and we think it might lead to different test scores than the old way. To test this theory we sample 10 students, teach them the new way, and then give them the math test:

H0: μ = 50 (i.e. the mean is the same as it used to be)

Ha: μ ≠ 50 (i.e. the mean is no longer the same)


The Sampling Distribution Assuming H0 is True

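$$\mu_{\bar{Y}} = \mu_Y = 50$$

In the sample, $\text{est. }\sigma_Y = 4.74$, so

$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{10}} = 1.50$$

$$df = N - 1 = 9, \qquad t_c = \pm 2.262$$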
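The quantities on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the critical value 2.262 is taken from the slide (two-tailed, alpha = .05, df = 9) rather than computed.

```python
import math

mu_0 = 50.0        # population mean proposed by H0
est_sigma = 4.74   # estimated population standard deviation (from the sample)
n = 10             # sample size
t_crit = 2.262     # two-tailed critical t for alpha = .05, df = 9 (from the slide)

# Estimated standard error of the mean: est. sigma / sqrt(N)
std_error = est_sigma / math.sqrt(n)
df = n - 1

# Sample means beyond these bounds fall in the 'reject H0' regions
lower_bound = mu_0 - t_crit * std_error
upper_bound = mu_0 + t_crit * std_error

print(f"standard error = {std_error:.2f}, df = {df}")
print(f"reject H0 if the sample mean is below {lower_bound:.2f} "
      f"or above {upper_bound:.2f}")
```

With these inputs the rejection regions begin at roughly 46.61 and 53.39, i.e. about 3.39 points on either side of 50.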


The Sampling Distribution Assuming H0 is True

Upon this we will base our decision regarding whether or not to reject H0.


A Specific Alternative Hypothesis

To look at the power of an experiment we need a specific value for what μ would equal if Ha is true. Let’s examine the power of this test if in reality the effect of the new teaching method is to increase scores on the math test by an average of five.

H0: μ = 50

Ha: μ = 55


The Sampling Distribution if Ha is True

If Ha is correct and μ = 55 then the curve below represents the possible outcomes of the experiment. Note that the standard error, which is based upon the data and not upon a theory, hasn’t changed.


Power

The decision is (always) based upon assuming H0 is true, but if Ha is true then its curve represents reality. In that case the proportion of the Ha curve that is shaded is the probability that H0 will be rejected.

Note: a tiny bit of the Ha curve is in the other rejection region as well.
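The shaded proportion can be approximated numerically. The sketch below assumes the Ha curve can be stood in for by a normal curve centered on 55 with the same standard error (the slides draw a t-shaped curve, so the result is only approximate), and it reuses the critical value 2.262 from the example.

```python
import math
from statistics import NormalDist

mu_0, mu_a = 50.0, 55.0              # means under H0 and under the specific Ha
std_error = 4.74 / math.sqrt(10)     # estimated standard error from the example
t_crit = 2.262                       # two-tailed critical t, alpha = .05, df = 9

# The rejection regions are always set up around H0
lower_bound = mu_0 - t_crit * std_error
upper_bound = mu_0 + t_crit * std_error

# ASSUMPTION: a normal curve centered on mu_a approximates the Ha curve
ha_curve = NormalDist(mu=mu_a, sigma=std_error)
upper_part = 1.0 - ha_curve.cdf(upper_bound)   # Ha curve in the upper rejection region
lower_part = ha_curve.cdf(lower_bound)         # the 'tiny bit' in the lower rejection region

power = upper_part + lower_part
beta = 1.0 - power
print(f"power ~ {power:.3f}, beta ~ {beta:.3f}")
print(f"share of the Ha curve in the lower rejection region: {lower_part:.1e}")
```

Under this approximation power comes out at about .86, and the part of the Ha curve in the lower rejection region is vanishingly small, which is why it is described as a tiny bit.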


Power and Effect Size

The power of an experiment is affected by the size of the effect being tested: the greater the effect size, the more power in the experiment. In our example the new teaching procedure raised math scores on the average by 5. The following graphs show the increase in power that would result if the new procedure had a greater effect and raised math scores on the average by 6.


For More Power: Increase Effect Size

The effect on power of moving from Ha: μ=55 to Ha: μ=56


It is easier to reject H0 when the effect size is greater because it is more likely that the sample mean will be further from what H0 proposes and thus more likely to fall within the ‘reject H0’ region. In terms of our ‘search’ metaphor, it is easier to ‘find’ a big effect than a small one.
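One way to see this is to simulate the experiment many times and count how often H0 is rejected. The sketch below is an illustration only: it assumes the scores are normally distributed with a population standard deviation of 4.74 (the slide only gives that value as a sample estimate), and the function name and defaults are hypothetical.

```python
import math
import random

def simulated_power(mu_a, sigma=4.74, n=10, mu_0=50.0, t_crit=2.262,
                    n_experiments=20000, seed=1):
    """Share of simulated experiments that reject H0 when the true mean is mu_a."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(mu_a, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Unbiased variance estimate (divide by N - 1), then the standard error
        variance = sum((y - mean) ** 2 for y in sample) / (n - 1)
        std_error = math.sqrt(variance / n)
        t_obtained = (mean - mu_0) / std_error
        if abs(t_obtained) > t_crit:      # two-tailed decision rule
            rejections += 1
    return rejections / n_experiments

power_55 = simulated_power(55.0)   # effect of 5 points
power_56 = simulated_power(56.0)   # effect of 6 points
rate_50 = simulated_power(50.0)    # H0 true: this is the type 1 error rate

print(f"power if mu = 55: {power_55:.3f}")
print(f"power if mu = 56: {power_56:.3f}")
print(f"rejection rate if mu = 50: {rate_50:.3f}")
```

The larger effect should be rejected more often, while the rejection rate when H0 is true should stay close to alpha = .05.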


Power and Variance

The standard error of the mean is affected by the standard deviation of the population from which we are sampling. Decreasing the variability of the population will decrease the standard error of the mean, which will give the test more power. In the following curve the standard error has been reduced from 1.50 to 0.95; this would occur if σY = 3.00 rather than 4.74.

$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}}$$


For More Power Decrease Variance

Effect on power of moving from σY=4.74 to σY=3.00

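$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{10}} = 1.50$$

$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}} = \frac{3.00}{\sqrt{10}} = 0.95$$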


When the variance/standard deviation of the population decreases, the stability of the sample mean increases. This means it is less likely that you will get a weird sample mean just due to chance, which in turn makes it easier to draw inferences about the population mean based upon the sample mean.
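The stability of the sample mean can be illustrated by drawing many samples and measuring how much their means vary. This is a sketch under the assumption of normally distributed scores; the function name, the seed, and the number of samples are arbitrary choices of mine.

```python
import math
import random
import statistics

def sd_of_sample_means(sigma, n=10, mu=50.0, n_samples=20000, seed=2):
    """Draw many samples of size n and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(n_samples)]
    return statistics.pstdev(means)

for sigma in (4.74, 3.00):
    observed = sd_of_sample_means(sigma)
    expected = sigma / math.sqrt(10)      # standard error of the mean
    print(f"sigma = {sigma:.2f}: sample means vary with SD ~ {observed:.2f} "
          f"(sigma / sqrt(N) = {expected:.2f})")
```

The spread of the simulated sample means should sit close to 1.50 for the larger standard deviation and close to 0.95 for the smaller one.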


How to Decrease Variance

1. Sample from a more homogeneous population. Doing so, however, limits the population to which you can generalize the results.

2. Use a different experimental design that has a decrease of variance built into it. We will see our first example of that when we get to the ‘t test for dependent groups’ (and it will also be a major part of what we cover next semester).


Power and N

Increasing N will also decrease the standard error of the mean. In addition, increasing N also will decrease the tc values and thus increase power.

In the following curve I increased N from 10 to 22. I chose this value of N so that I could reuse the previous slide (increasing N from 10 to 22 has roughly the same effect as decreasing the standard deviation from 4.74 to 3.00).

$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}}$$


For More Power Increase N

Note: if N=22 then df=21 and tc would be 2.080

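$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{10}} = 1.50$$

$$\text{est. }\sigma_{\bar{Y}} = \frac{\text{est. }\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{22}} \approx 1.01$$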


As with decreasing variance, increasing N will make the sample mean more stable and thus make it easier to make inferences about the true value of the population mean.

Important caveat here: it is very possible to increase N to the point where even the most trivial effect will be statistically significant.
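The caveat can be made concrete with a rough calculation. The sketch below uses a z (normal) approximation in place of the t test, which is reasonable only because the sample sizes are large; the effect of 0.1 points is a hypothetical stand-in for a trivial effect, and the function name is mine.

```python
import math
from statistics import NormalDist

def large_sample_power(effect, sigma, n, alpha=0.05):
    """Two-tailed power using a z (normal) approximation, reasonable for large N."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = effect / (sigma / math.sqrt(n))   # effect measured in standard errors
    return (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)

trivial_effect = 0.1   # hypothetical: one tenth of a point on the math test
sigma = 4.74           # standard deviation from the running example
powers = {n: large_sample_power(trivial_effect, sigma, n)
          for n in (100, 1000, 10000, 100000)}

for n, power in powers.items():
    print(f"N = {n:>6}: power ~ {power:.3f}")
```

With N = 100 the test barely does better than alpha, but with N = 100,000 an effect of a tenth of a point is almost certain to be declared statistically significant.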


Power and One-Tail Tests

If you perform a one-tail test and your prediction about the direction of the effect is correct, then you will have a more powerful test. Of course, you must have a priori justification for making the test one-tailed.


For More Power Use a 1-Tail Test


Other Influences on Power

5. Significance level (and thus alpha). Although I didn’t create another graphic for it, it is obvious if you look back at the graphic for the 1-tail vs. 2-tail test that if I had used a smaller significance level (e.g. .01 rather than .05) then less of the Ha curve would have fallen in the reject H0 region, and thus power would have been less. Making it harder to reject H0 hurts power.

6. Violations of the assumptions underlying a test can also influence power. We will cover that in the lecture on assumptions.
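The effect of the number of tails and of the significance level can be compared side by side for the running example (H0: μ = 50, Ha: μ = 55, N = 10). This is a rough sketch: it assumes a normal curve is an adequate stand-in for the Ha curve, and the critical values for df = 9 are taken from a standard t table rather than from the slides (only 2.262 appears there).

```python
import math
from statistics import NormalDist

mu_0, mu_a = 50.0, 55.0
std_error = 4.74 / math.sqrt(10)

# ASSUMPTION: a normal curve centered on mu_a approximates the Ha curve
ha_curve = NormalDist(mu=mu_a, sigma=std_error)

def approx_power(t_crit, two_tailed):
    """Proportion of the Ha curve that falls in the 'reject H0' region(s)."""
    upper = 1.0 - ha_curve.cdf(mu_0 + t_crit * std_error)
    lower = ha_curve.cdf(mu_0 - t_crit * std_error) if two_tailed else 0.0
    return upper + lower

# Critical t values for df = 9 (standard t table)
two_tail_05 = approx_power(2.262, two_tailed=True)    # alpha = .05, two-tail
one_tail_05 = approx_power(1.833, two_tailed=False)   # alpha = .05, one-tail (upper)
two_tail_01 = approx_power(3.250, two_tailed=True)    # alpha = .01, two-tail

print(f"two-tail, alpha = .05: power ~ {two_tail_05:.2f}")
print(f"one-tail, alpha = .05: power ~ {one_tail_05:.2f}")
print(f"two-tail, alpha = .01: power ~ {two_tail_01:.2f}")
```

The ordering is the point here: the correctly predicted one-tail test has the most power, and tightening alpha to .01 costs a good deal of it.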


Estimating Power

With the advent of better software (we will be using GPower 3) it has become easier to estimate the power of an experiment and this has taken on an increasingly important role in experimental psychology. We will look at two of the contexts in which power estimates have become important:

1. A priori estimates of power.

2. Post hoc estimates of power.


A Priori Estimates of Power

A priori estimates of power are used to help design a study that has a specific, desired, level of power. This is useful for two reasons:

1. There is little reason to run an experiment that has low power (what is the use of looking for an effect if there is little chance you will find it even if it is there?).

2. You will probably have to include an a priori estimate of power to obtain a grant to fund your research (what is the point of giving money to a project that has little chance of finding an effect?).


A Priori: GPower 3

With the GPower 3 ‘a priori’ function you enter:

1. The level of power you want.

2. The anticipated value for Cohen’s d.

3. Alpha.

4. Whether it is a 1-tail or a 2-tail test.

GPower then computes what N would give you that level of power. It will also provide a graph showing the relationship between N and power given the values entered above, which can be useful in seeing how much power would be gained or lost if you were to increase or decrease N.
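To give a feel for what such an a priori computation involves, here is a rough sketch in plain Python. It is not GPower's algorithm: the critical t is found by numerically integrating the t density, power uses a textbook normal approximation to the noncentral t distribution (so it is rough for very small N), and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_cdf(t, df):
    """CDF of Student's t for t >= 0, by Simpson's rule on the density."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))
    steps = 2 * max(50, int(t * 10))          # even number of intervals
    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 + total * h / 3

def t_critical(alpha, df, tails=2):
    """Critical t found by bisection on t_cdf (searches between 0 and 100)."""
    target = 1.0 - alpha / tails
    low, high = 0.0, 100.0
    for _ in range(30):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def approx_power(d, n, alpha=0.05, tails=2):
    """Approximate power of the one-sample t test for effect size d and sample size n."""
    df = n - 1
    delta = d * math.sqrt(n)                  # noncentrality parameter
    t_crit = t_critical(alpha, df, tails)
    def nct_cdf(t):
        # APPROXIMATION: normal approximation to the noncentral t CDF
        x = (t * (1 - 1 / (4 * df)) - delta) / math.sqrt(1 + t * t / (2 * df))
        return NormalDist().cdf(x)
    power = 1.0 - nct_cdf(t_crit)
    if tails == 2:
        power += nct_cdf(-t_crit)
    return power

def sample_size_for(d, target_power=0.80, alpha=0.05, tails=2, max_n=10000):
    """Smallest N whose approximate power reaches the target (None if not found)."""
    for n in range(2, max_n + 1):
        if approx_power(d, n, alpha, tails) >= target_power:
            return n
    return None

print("N for d = 0.7:", sample_size_for(0.7))
print("N for d = 0.2:", sample_size_for(0.2))
```

Because of the approximations the answers may differ from GPower's by a participant or so, but they should land close to 20 for d = 0.7 and close to 200 for d = 0.2. Calling `approx_power` over a range of N values traces out the kind of N versus power curve that GPower plots.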


[Figure from GPower: power (1-β err prob) plotted against total sample size. t tests - Means: Difference from constant (one sample case). Tail(s) = Two, α err prob = 0.05, Effect size d = 0.7.]

From GPower. Large effect size of d = 0.7: note an N of around 18 would give you power = .80 (an often recommended level) and that you gain little from making N greater than 40.


[Figure from GPower: power (1-β err prob) plotted against total sample size. t tests - Means: Difference from constant (one sample case). Tail(s) = Two, α err prob = 0.05, Effect size d = 0.2.]

Much smaller effect size (d = .2): note N must be around 200 to obtain a power of 0.80.


Anticipated Value for d

How do you arrive at the anticipated value for Cohen’s d in your upcoming study?

1. Look at prior, similar studies (your own pilot study, similar experiments run by others, etc.). If those studies report values for d then use them to anticipate the value of d in your study. If those studies don’t report d but do report the raw effect size and the standard deviation of the variable then you can plug those into GPower 3 and it will compute d for you.
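The conversion from a raw effect size and a standard deviation to d is a single division. The sketch below uses the standard definition of Cohen's d for a single group mean; the function name is hypothetical and the numbers are borrowed from the math test example purely for illustration.

```python
def cohens_d(raw_effect, std_dev):
    """Cohen's d for a single group: raw effect size divided by the standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return raw_effect / std_dev

# Illustration: an effect of 5 points (55 versus 50) with a standard deviation of 4.74
d_example = cohens_d(raw_effect=55 - 50, std_dev=4.74)
print(f"d = {d_example:.2f}")
```

Here d comes out at roughly 1.05, i.e. the new teaching method would shift scores by about one standard deviation.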


Anticipated Value for d

How do you arrive at the anticipated value for Cohen’s d in your upcoming study?

2. If prior, similar studies are not available then make your best guess, based upon your knowledge of the literature, about whether your study will find a ‘small’, ‘medium’, or ‘large’ effect, then use Cohen’s guidelines for what value of d you would expect to obtain: d=.2 would be a small effect size, d=.5 would be a medium effect size, and d=.8 would be a large effect size. This, in my view, is the most useful thing you can do with Cohen’s guidelines.


Post Hoc: GPower 3

The GPower 3 post hoc analysis is designed to help you compute the power of an experiment after it has been completed. You enter values for the effect size (d) that was found in your experiment (or you can enter the raw effect size and standard deviation and GPower 3 will compute d for you), the N of your experiment, the alpha you used, and whether it was a 1-tail or a 2-tail test. GPower will then compute the power of your experiment.


Using Post Hoc Power Analysis

Knowing post hoc the power of your experiment can be useful in interpreting a decision to not reject H0. If you had a powerful search and failed to find an effect (i.e. failed to reject H0) that probably means that the effect wasn’t there (i.e. it probably means H0 is true).

If you don’t reject H0 you will still want to decide for yourself whether it’s probably the case that H0 is true or if it’s probably the case that H0 is false but you simply failed to reject H0 in this experiment. This will influence where you go next (replicate the same experiment but with more power or change directions because you think H0 is actually true).


When I first introduced null hypothesis testing I said that you can either:

– ‘reject H0’, or

– ‘not reject H0’,

but that you can’t:

– ‘accept H0’ (conclude that H0 is true)

A type 1 error is associated with rejecting H0, and the probability of rejecting H0 when H0 is actually true is equal to alpha=.05. We, as scientists, have decided we can live with that.

A type 2 error is associated with not rejecting H0, and the probability of not rejecting H0 when H0 is actually false is equal to beta. If we don’t know what beta is, or if we do and it is greater than .05, then there is too much uncertainty to make a strong statement such as ‘I conclude H0 is true’.


While I’ve never heard this said, it would seem to me that if power ≥ .95, which would make beta ≤ .05, then it would be reasonable to say the two possible results of the experiment would be:

1. reject H0

2. accept H0

In other words, if power ≥ .95 and we don’t reject H0 then it is possible to state (within acceptable probability of error) that we have shown that H0 is true.

