Two-Sample Hypothesis Testing

Date post: 06-Jan-2016
Upload: pearly
Page 1: Two-Sample Hypothesis Testing

Two-Sample Hypothesis Testing

Page 2: Two-Sample Hypothesis Testing

Suppose you want to know if two populations have the same mean or, equivalently, if the difference between the population means is zero.

The population standard deviations are known to be $\sigma_1$ and $\sigma_2$.

You have independent samples from the two populations. Their sizes are n1 and n2.

The sample mean for the sample from the first population is $\bar{X}_1$. The mean of $\bar{X}_1$ equals the mean of the first population, $\mu_1$, and the variance of $\bar{X}_1$ is $\sigma_1^2 / n_1$.

Similarly, the sample mean for the sample from the 2nd population is $\bar{X}_2$. The mean of $\bar{X}_2$ equals the mean of the 2nd population, $\mu_2$, and the variance of $\bar{X}_2$ is $\sigma_2^2 / n_2$.

Page 3: Two-Sample Hypothesis Testing

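$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2.$$

Since $\bar{X}_1$ and $\bar{X}_2$ are independent,

$$V(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},$$

and the standard deviation of $\bar{X}_1 - \bar{X}_2$ is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

Also, $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed.

So we have a standard normal distribution:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

We’ll use this formula to test whether the population means are equal.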

Page 4: Two-Sample Hypothesis Testing

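Example

Suppose from a large class, we sample 4 grades: 64, 66, 89, 77. From another large class, we sample 3 grades: 56, 71, 53. We assume that the class grades are normally distributed, and that the population variances for the two classes are both 96. Test at the 5% level $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.

Averaging the grades in the first sample, we have $\bar{X}_1 = 74$. Averaging the grades in the second sample, we have $\bar{X}_2 = 60$.

Then,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(74 - 60) - (0)}{\sqrt{\dfrac{96}{4} + \dfrac{96}{3}}} = \frac{14}{\sqrt{24 + 32}} = \frac{14}{\sqrt{56}} = \frac{14}{7.4833} = 1.871$$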
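The arithmetic in this example can be checked with a short script. This is only a sketch using the Python standard library; the function name `two_sample_z` and its parameters are illustrative choices, not part of the original slides.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 when the population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - delta0) / standard_error

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]
xbar1 = sum(class1) / len(class1)  # 74.0
xbar2 = sum(class2) / len(class2)  # 60.0

# Both population variances are assumed to be 96, as stated in the example.
z = two_sample_z(xbar1, xbar2, 96, 96, len(class1), len(class2))
z_crit = NormalDist().inv_cdf(0.975)  # upper critical value, two-tailed 5% test

print(round(z, 3), round(z_crit, 2), abs(z) > z_crit)  # 1.871 1.96 False
```

The final `False` says the statistic does not fall in either critical region.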

Page 5: Two-Sample Hypothesis Testing

As we’ve found before, the Z-values for a two-tailed 5% test are 1.96 and -1.96, as indicated below.

[Figure: standard normal (Z) curve centered at 0. The acceptance region runs from -1.96 to 1.96, with area .475 on each side of 0; each tail is a critical region with area .025.]

Since our Z-statistic, 1.87, is in the acceptance region, we accept $H_0: \mu_1 - \mu_2 = 0$, concluding that the population means are equal.

Page 6: Two-Sample Hypothesis Testing

What do you do if you don’t know the population variances in this formula?

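$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Replace the population variances with the sample variances and the Z distribution with the t distribution.

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The number of degrees of freedom is the integer part of this very messy formula:

$$\text{dof} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$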

Page 7: Two-Sample Hypothesis Testing

Example

Consider the same example as the last one but without the information on the population variances. Again test at the 5% level $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.

We need to determine the sample means and sample variances. As before, the sample means are 74 and 60.

Class 1 ($X_1$)    Class 2 ($X_2$)
64                 56
66                 71
89                 53
77
Sum: 296           Sum: 180

$$\bar{X}_1 = \frac{296}{4} = 74, \qquad \bar{X}_2 = \frac{180}{3} = 60$$

Page 8: Two-Sample Hypothesis Testing

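Remember: the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

So we subtract the sample mean from each of the grades.

Class 1                          Class 2
$X_1$    $X_1 - \bar{X}_1$       $X_2$    $X_2 - \bar{X}_2$
64       -10                     56       -4
66       -8                      71       11
89       15                      53       -7
77       3
Sum: 296                         Sum: 180

$$\bar{X}_1 = \frac{296}{4} = 74, \qquad \bar{X}_2 = \frac{180}{3} = 60$$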

Page 9: Two-Sample Hypothesis Testing

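Then we square those differences and add them up.

Class 1                                                   Class 2
$X_1$    $X_1 - \bar{X}_1$    $(X_1 - \bar{X}_1)^2$       $X_2$    $X_2 - \bar{X}_2$    $(X_2 - \bar{X}_2)^2$
64       -10                  100                         56       -4                   16
66       -8                   64                          71       11                   121
89       15                   225                         53       -7                   49
77       3                    9
Sum: 296                      398                         Sum: 180                      186

$$\bar{X}_1 = \frac{296}{4} = 74, \qquad \bar{X}_2 = \frac{180}{3} = 60, \qquad \text{sample variance: } s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$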

Page 10: Two-Sample Hypothesis Testing

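Then we divide that sum by n-1 to get the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Class 1                                                   Class 2
$X_1$    $X_1 - \bar{X}_1$    $(X_1 - \bar{X}_1)^2$       $X_2$    $X_2 - \bar{X}_2$    $(X_2 - \bar{X}_2)^2$
64       -10                  100                         56       -4                   16
66       -8                   64                          71       11                   121
89       15                   225                         53       -7                   49
77       3                    9
Sum: 296                      398                         Sum: 180                      186

$$\bar{X}_1 = \frac{296}{4} = 74, \qquad s_1^2 = \frac{398}{3} = 132.67$$

$$\bar{X}_2 = \frac{180}{3} = 60, \qquad s_2^2 = \frac{186}{2} = 93.0$$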
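The three steps on these slides (subtract the mean, square and sum, divide by n-1) can be sketched in a few lines of Python. The helper name `sample_variance` is an illustrative choice; only the grades come from the example.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]           # step 1: subtract the mean
    sum_of_squares = sum(d ** 2 for d in deviations)  # step 2: square and add up
    return sum_of_squares / (n - 1)                   # step 3: divide by n - 1

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]

s1_sq = sample_variance(class1)  # 398 / 3
s2_sq = sample_variance(class2)  # 186 / 2

print(round(s1_sq, 2), round(s2_sq, 2))  # 132.67 93.0
```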

Page 11: Two-Sample Hypothesis Testing

What are the dof & critical t value?

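Since we have $s_1^2 = 132.67$, $s_2^2 = 93.0$, $n_1 = 4$, and $n_2 = 3$, our very messy dof formula yields

$$\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{132.67}{4} + \dfrac{93.0}{3}\right)^2}{\dfrac{\left(132.67/4\right)^2}{3} + \dfrac{\left(93.0/3\right)^2}{2}} = 4.860$$

So the degrees of freedom is the integer part of 4.86, or 4.

For a 5% two-tailed test and 4 dof, the t value is 2.7764.

[Figure: $t_4$ distribution centered at 0. The area between -2.7764 and 2.7764 is 0.95; each tail has area 0.025.]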
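As a check on the degrees-of-freedom arithmetic, here is a small sketch of the formula in Python. The function name `unequal_variance_dof` is hypothetical; the inputs are the sample variances and sizes from the example.

```python
import math

def unequal_variance_dof(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom for the two-sample t test with unknown, unequal variances."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

dof_raw = unequal_variance_dof(132.67, 4, 93.0, 3)
dof = math.floor(dof_raw)  # the slides take the integer part

print(round(dof_raw, 2), dof)  # 4.86 4
```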

Page 12: Two-Sample Hypothesis Testing

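Next we need to compute our test statistic.

$n_1 = 4$; $\bar{X}_1 = 74$; $s_1^2 = 132.67$

$n_2 = 3$; $\bar{X}_2 = 60$; $s_2^2 = 93.0$

Then,

$$t_4 = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(74 - 60) - (0)}{\sqrt{\dfrac{132.67}{4} + \dfrac{93}{3}}} = 1.748$$

[Figure: $t_4$ distribution centered at 0. The area between -2.7764 and 2.7764 is 0.95; each tail has area 0.025.]

Since our t-value, 1.748, is in the acceptance region, we accept $H_0: \mu_1 = \mu_2$.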
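The test statistic for the unknown-variance case can be reproduced as follows. This is a sketch: `unequal_variance_t` is an illustrative name, and the critical value 2.7764 is simply copied from the slide rather than computed.

```python
import math

def unequal_variance_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2, delta0=0.0):
    """t statistic for H0: mu1 - mu2 = delta0 using the two sample variances."""
    standard_error = math.sqrt(s1_sq / n1 + s2_sq / n2)
    return ((xbar1 - xbar2) - delta0) / standard_error

t_stat = unequal_variance_t(74, 132.67, 4, 60, 93.0, 3)

T_CRIT_4_DOF = 2.7764  # two-tailed 5% critical value for 4 dof, as quoted on the slide
in_critical_region = abs(t_stat) > T_CRIT_4_DOF

print(round(t_stat, 3), in_critical_region)  # 1.748 False
```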

Page 13: Two-Sample Hypothesis Testing

Sometimes we don’t know the population variances, but we believe that they are equal.

So we need to compute an estimate of the common variance, which we do by pooling our information from the two samples.

We denote the pooled sample variance by $s_p^2$.

$s_p^2$ is a weighted average of the two sample variances, with more weight put on the sample variance that was based on the larger sample.

If the two samples are the same size, $s_p^2$ is just the sum of the two sample variances, divided by two.

In general,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Page 14: Two-Sample Hypothesis Testing

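Let’s return for a moment to the statistic that we used to compare population means when the population variances were known.

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Since the population variances $\sigma_1^2$ and $\sigma_2^2$ are believed to be equal, let's denote them both by $\sigma^2$.

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}}$$

Then we can factor out the $\sigma^2$

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

and replace the $\sigma^2$ by $s_p^2$ and the Z by t. The number of degrees of freedom is $n_1 + n_2 - 2$.

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$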

Page 15: Two-Sample Hypothesis Testing

Let’s do the previous example again, but this time assume that the unknown population variances are believed to be equal. We had:

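$n_1 = 4$; $\bar{X}_1 = 74$; $s_1^2 = 132.67$; $\quad n_2 = 3$; $\bar{X}_2 = 60$; $s_2^2 = 93.0$

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(3)(132.67) + (2)(93.0)}{4 + 3 - 2} = 116.8$$

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(74 - 60) - (0)}{\sqrt{116.8 \left(\dfrac{1}{4} + \dfrac{1}{3}\right)}} = 1.70$$

The number of degrees of freedom is $n_1 + n_2 - 2 = 5$, and we are doing a 2-tailed test at the 5% level.

[Figure: $t_5$ distribution centered at 0. The acceptance region runs from -2.571 to 2.571; each tail is a critical region with area .025.]

Since our t-statistic, 1.70, is in the acceptance region, we accept $H_0: \mu_1 = \mu_2$.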
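The pooled-variance calculation can be sketched in Python as below. The function names are illustrative, and the critical value 2.571 for 5 degrees of freedom is taken from the slide, not computed.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of the sample variances, weighted by n - 1."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2, delta0=0.0):
    """t statistic with n1 + n2 - 2 dof, assuming equal population variances."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    standard_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / standard_error

sp_sq = pooled_variance(132.67, 4, 93.0, 3)
t_pooled = pooled_t(74, 132.67, 4, 60, 93.0, 3)

T_CRIT_5_DOF = 2.571  # two-tailed 5% critical value for 5 dof, as quoted on the slide
print(round(sp_sq, 1), round(t_pooled, 2), abs(t_pooled) > T_CRIT_5_DOF)  # 116.8 1.7 False
```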

Page 16: Two-Sample Hypothesis Testing

In the previous three hypothesis tests, we tested whether 2 populations have the same mean, when we had 2 independent samples.

We can’t use those tests, however, if the 2 samples are not independent.

For example, suppose you are looking at the weights of people, before and after a fitness program.

Since the weights are for the same group of people, the before and after weights are not independent of each other.

In this type of situation, we can use a hypothesis test based on matched-pairs samples.

Page 17: Two-Sample Hypothesis Testing

The test statistic is

$$t_{n-1} = \frac{\bar{D} - \delta}{s_D / \sqrt{n}}$$

The hypotheses are

$$H_0: \delta = 0 \quad \text{and} \quad H_1: \delta \neq 0,$$

where $\delta$ is the population mean difference,

$\bar{D}$ is the sample mean difference,

$s_D$ is the sample standard deviation of the differences,

and n is the number of pairs of observations.

Page 18: Two-Sample Hypothesis Testing

Example

Before and after a fitness program, the following sample of weights is observed. Test at the 5% level whether the program causes a weight change, i.e.: $H_0: \delta = 0$ versus $H_1: \delta \neq 0$, where $\delta$ is the population mean difference.

person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160
2         195       197
3         155       150
4         183       180
5         169       163

Page 19: Two-Sample Hypothesis Testing

First we calculate the weight differences.

person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8
2         195       197      2
3         155       150      -5
4         183       180      -3
5         169       163      -6

Page 20: Two-Sample Hypothesis Testing

Then we add up the differences and determine the mean.

person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8
2         195       197      2
3         155       150      -5
4         183       180      -3
5         169       163      -6
Sum                          -20

$$\bar{D} = \frac{-20}{5} = -4$$

Page 21: Two-Sample Hypothesis Testing

person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8
2         195       197      2
3         155       150      -5
4         183       180      -3
5         169       163      -6
Sum                          -20

$$\bar{D} = \frac{-20}{5} = -4$$

Next we need to calculate the sample standard deviation for the weight differences.

The sample standard deviation is

$$s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}$$

Page 22: Two-Sample Hypothesis Testing

We subtract the mean difference from each of the D values.

person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8         -4
2         195       197      2          6
3         155       150      -5         -1
4         183       180      -3         1
5         169       163      -6         -2
Sum                          -20

$$\bar{D} = \frac{-20}{5} = -4$$

Page 23: Two-Sample Hypothesis Testing

We square the values in that column, and add up the squares.

person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8         -4               16
2         195       197      2          6                36
3         155       150      -5         -1               1
4         183       180      -3         1                1
5         169       163      -6         -2               4
Sum                          -20                         58

$$\bar{D} = \frac{-20}{5} = -4$$

Page 24: Two-Sample Hypothesis Testing

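person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8         -4               16
2         195       197      2          6                36
3         155       150      -5         -1               1
4         183       180      -3         1                1
5         169       163      -6         -2               4
Sum                          -20                         58

$$\bar{D} = \frac{-20}{5} = -4$$

Then since

$$s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}},$$

we divide by n-1 = 4, and take the square root.

$$s_D = \sqrt{\frac{58}{4}} = \sqrt{14.5} = 3.81$$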

Page 25: Two-Sample Hypothesis Testing

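person    Before    After    D = A-B    $D - \bar{D}$    $(D - \bar{D})^2$
1         168       160      -8         -4               16
2         195       197      2          6                36
3         155       150      -5         -1               1
4         183       180      -3         1                1
5         169       163      -6         -2               4
Sum                          -20                         58

$$\bar{D} = \frac{-20}{5} = -4, \qquad s_D = \sqrt{\frac{58}{4}} = \sqrt{14.5} = 3.81$$

Next we assemble our statistic. The population mean difference $\delta$ is 0 under the null hypothesis, so

$$t_{n-1} = \frac{\bar{D} - \delta}{s_D / \sqrt{n}} = \frac{-4 - 0}{3.81 / \sqrt{5}} = -2.35$$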

Page 26: Two-Sample Hypothesis Testing

Since we had 5 people and 5 pairs of weights, n=5, and the number of degrees of freedom is n-1 = 4.

We’re doing a 2-tailed t-test at the 5% level, so the critical region looks like this:

[Figure: $t_4$ distribution centered at 0. The acceptance region runs from -2.776 to 2.776; each tail is a critical region with area .025.]

Since our t-statistic, -2.35, is in the acceptance region, we accept the null hypothesis that the program would cause no average weight change for the population as a whole.
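The whole matched-pairs calculation can be replayed in a few lines of Python. This is a sketch; the variable names are illustrative, and the critical value 2.776 is the one quoted on the slide.

```python
import math

before = [168, 195, 155, 183, 169]
after = [160, 197, 150, 180, 163]

# D = After - Before for each person
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

d_bar = sum(differences) / n                               # -20 / 5 = -4
sum_sq = sum((d - d_bar) ** 2 for d in differences)        # 58
s_d = math.sqrt(sum_sq / (n - 1))                          # sqrt(14.5)

# Under H0 the population mean difference is 0.
t_paired = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRIT_4_DOF = 2.776  # two-tailed 5% critical value for n - 1 = 4 dof, from the slide
print(differences, round(s_d, 2), round(t_paired, 2), abs(t_paired) > T_CRIT_4_DOF)
# [-8, 2, -5, -3, -6] 3.81 -2.35 False
```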

Page 27: Two-Sample Hypothesis Testing

Hypothesis tests on the difference between 2 population proportions, using independent samples

If you look at the statistics we have used in our hypothesis tests, you will notice that they have a common form:

$$\frac{(\text{point estimate}) - (\text{mean of the point estimate})}{\text{std dev, or estimate of the std dev, of the point estimate}}$$

In our hypothesis tests on the difference between 2 population proportions, we are going to use that same form.

$H_0: \pi_1 - \pi_2 = 0$ versus $H_1: \pi_1 - \pi_2 \neq 0$

The point estimate is $p_1 - p_2$, which is the difference in the sample proportions.

The mean of the point estimate is $\pi_1 - \pi_2$, which is the difference in the population proportions.

Page 28: Two-Sample Hypothesis Testing

We still need to determine the standard deviation, or an estimate of the standard deviation, of our point estimate.

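We start with $V(p_1 - p_2)$.

Under our assumption that the samples are independent,

$$V(p_1 - p_2) = V(p_1) + V(p_2) = \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}.$$

According to the null hypothesis, $\pi_1 = \pi_2$, so we'll call them both $\pi$.

$$\text{So, } V(p_1 - p_2) = \frac{\pi(1 - \pi)}{n_1} + \frac{\pi(1 - \pi)}{n_2} = \pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$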

Page 29: Two-Sample Hypothesis Testing

We have

$$V(p_1 - p_2) = \pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$

but we don't know what $\pi$ is.

We need to estimate the hypothetically common value of $\pi$.

Let $X_1$ be the number of "successes" in the 1st sample, which is of size $n_1$, and $X_2$ be the number of "successes" in the 2nd sample, which is of size $n_2$.

Our estimate of the common value for $\pi$ will be the proportion of successes in the combined sample, or

$$p = \frac{X_1 + X_2}{n_1 + n_2}.$$

So our estimate of $V(p_1 - p_2)$ is $p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$, and our estimate of the standard deviation of $p_1 - p_2$ is the square root of that expression.

Page 30: Two-Sample Hypothesis Testing

Assembling the pieces, we have

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } p = \frac{X_1 + X_2}{n_1 + n_2}.$$

Page 31: Two-Sample Hypothesis Testing

Suppose the proportions of Democrats in samples of 100 and 225 from 2 states are 33% and 20%. Test at the 5% level the hypothesis that the proportions of Democrats in the populations of the 2 states are equal: $H_0: \pi_1 - \pi_2 = 0$ versus $H_1: \pi_1 - \pi_2 \neq 0$.

The number of Democrats in the first sample is (.33)(100) = 33, and the number in the second sample is (.20)(225) = 45.

So the proportion in the combined sample is

$$p = \frac{33 + 45}{100 + 225} = \frac{78}{325} = 0.24, \quad \text{and} \quad 1 - p = .76.$$

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.33 - .20) - (0)}{\sqrt{(.24)(.76)\left(\dfrac{1}{100} + \dfrac{1}{225}\right)}} = 2.53$$

Page 32: Two-Sample Hypothesis Testing

We’re doing a 2-tailed Z-test at the 5% level, so the critical region looks like this:

[Figure: standard normal (Z) curve centered at 0. The acceptance region runs from -1.96 to 1.96; each tail is a critical region with area .025.]

Since our Z-statistic, 2.53, is in the critical region, we reject the null hypothesis and accept the alternative that the proportions of Democrats in the 2 states are different.
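The two-proportion test in this example can be checked with the sketch below, which uses only the Python standard library. The function name `two_proportion_z` is an illustrative choice; the counts 33 and 45 come from the example.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0, using the pooled proportion of successes."""
    p1 = x1 / n1
    p2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error, p_pooled

z_prop, p_pooled = two_proportion_z(33, 100, 45, 225)
z_crit = NormalDist().inv_cdf(0.975)  # upper critical value, two-tailed 5% test

print(round(p_pooled, 2), round(z_prop, 2), abs(z_prop) > z_crit)  # 0.24 2.53 True
```

Here the final `True` matches the slide's decision to reject the null hypothesis.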

Page 33: Two-Sample Hypothesis Testing

Sometimes you want to test whether two populations have the same variance, using independent samples from each.

If the populations are normally distributed, we can use the F-statistic to perform the test.

Page 34: Two-Sample Hypothesis Testing

The F-statistic is

$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_1^2}{s_2^2},$$

where $s_1^2 = \dfrac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$ is the sample variance for the first sample,

$s_2^2 = \dfrac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$ is the sample variance for the second sample,

and $s_1^2$ is the larger of the two sample variances.

This F-statistic has $n_1 - 1$ degrees of freedom for the numerator, and $n_2 - 1$ degrees of freedom for the denominator.

Notice that, because $s_1^2$ is larger than $s_2^2$, this F-statistic will always be greater than 1.

So our critical region will always just be the upper tail.

Page 35: Two-Sample Hypothesis Testing

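The distribution of our F-statistic,

$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_1^2}{s_2^2},$$

with the tail for the critical region looks like this:

[Figure: density f(F) of the $F_{n_1 - 1,\, n_2 - 1}$ distribution. The acceptance region is to the left; the critical region is the upper tail.]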

Page 36: Two-Sample Hypothesis Testing

Two-sided versus one-sided tests for equality of variance

While you are always using the upper tail of the F-test on tests of equality of variance, the size of the critical region you sketch varies with whether you have a two-sided or a one-sided test.

Let’s see why this is true.

Page 37: Two-Sample Hypothesis Testing

For a two-sided test, we have $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$.

While, for our samples, the sample variance from the first group was greater, $s_1^2 > s_2^2$, our alternative hypothesis indicates that we think that the population variance could have been larger or smaller for the first population: $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$.

Our sketch of the critical region is based on the situation in which the variance is greater for the first group, but we admit that, if we had information for the entire population, we might find that the situation is reversed. So there is an implicit second sketch of an F-statistic in which the sample variance of the second group is in the numerator.

Thus, for each of the sketches, the sketch we draw and the implicit sketch, the area of the critical region is α/2, half of the test level α. So, for example, if you are doing a two-sided test at the 5% level, your sketch will show a tail area of 0.025.

Page 38: Two-Sample Hypothesis Testing

What if we are performing a one-sided test?

$H_0: \sigma_1^2 \leq \sigma_2^2$ versus $H_1: \sigma_1^2 > \sigma_2^2$

Now we are looking at a situation in which the sample variance is again larger for the first group. This time however, we want to know if, in fact, the population variance is really larger for the first group. So we have the one-sided alternative shown above.

Keep in mind that, as usual with one-sided tests, the null hypothesis is the devil’s advocate view. Here the devil’s advocate is saying: nah, the population variance for the first group isn’t really any larger than for the second group.

For a one-sided test with level α, your critical region will have area α.

For example, if you are performing a one-sided test at the 5% level, the critical region will have area 0.05.

Page 39: Two-Sample Hypothesis Testing

Example: You are looking at test results for two groups of students. There are 25 students in the first group, for which you have calculated the sample variance to be 15. There are 30 students in the second group, for which you have calculated the sample variance to be 10. Test at the 10% level whether the population variances are the same.

There are 25-1 = 24 degrees of freedom in the numerator and 30-1 = 29 degrees of freedom in the denominator. This is a two-sided test at the 10% level, so the critical region has area 0.05.

$$F_{24,\,29} = \frac{s_1^2}{s_2^2} = \frac{15}{10} = 1.5$$

[Figure: density f(F) of the $F_{24,\,29}$ distribution. The acceptance region lies below 1.90; the critical region is the upper tail above 1.90, with area 0.05.]

Because 1.5 is in the acceptance region, you cannot reject the null hypothesis and you conclude that the variances of the two populations are the same.
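The F-statistic and its degrees of freedom for this example can be sketched as follows. The helper `variance_ratio_f` is a hypothetical name; it puts the larger sample variance in the numerator, as the slides describe, and the critical value 1.90 is copied from the slide rather than computed.

```python
def variance_ratio_f(s1_sq, n1, s2_sq, n2):
    """Return (F, numerator dof, denominator dof) with the larger variance on top."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1

f_stat, dof_num, dof_den = variance_ratio_f(15, 25, 10, 30)

# Two-sided test at the 10% level: upper-tail area 0.05.
F_CRIT_24_29 = 1.90  # critical value quoted on the slide for F(24, 29)
in_critical_region = f_stat > F_CRIT_24_29

print(f_stat, dof_num, dof_den, in_critical_region)  # 1.5 24 29 False
```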

Page 40: Two-Sample Hypothesis Testing

In the two sections we have just completed, we did 9 different types of hypothesis tests.

1. population mean - 1 sample - known population variance
2. population mean - 1 sample - unknown population variance
3. population proportion - 1 sample
4. difference in population means - 2 independent samples - known population variances
5. difference in population means - 2 independent samples - unknown population variances
6. difference in population means - 2 independent samples - unknown population variances that are believed to be equal
7. difference in population means - 2 dependent samples
8. difference in population proportions - 2 independent samples
9. difference in population variances - 2 independent samples

The statistics for these tests are compiled on a summary sheet which is available at my web site.

