
Inference about Two Populations

Date post: 04-Jan-2016
Upload: gareth-patton
Description:
Inference about Two Populations. Chapter 13. 12.1 Introduction. Variety of techniques are presented whose objective is to compare two populations. We are interested in: The difference between two means. The ratio of two variances. The difference between two proportions. - PowerPoint PPT Presentation
Transcript
Page 1: Inference about  Two Populations

1

Inference about Two Populations

Chapter 13

Page 2: Inference about  Two Populations

2

13.1 Introduction

• A variety of techniques are presented whose objective is to compare two populations.

• We are interested in:
  – The difference between two means.
  – The ratio of two variances.
  – The difference between two proportions.

Page 3: Inference about  Two Populations

3

13.2 Inference about the Difference between Two Means: Independent Samples

• Two random samples are drawn from the two populations of interest.

• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

Page 4: Inference about  Two Populations

4

The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.

2. $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are sufficiently large (greater than 30).

3. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.

4. The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

Page 5: Inference about  Two Populations

5

Making an inference about $\mu_1 - \mu_2$

• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.

Page 6: Inference about  Two Populations

6

Making an inference about $\mu_1 - \mu_2$

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• In practice, the Z statistic is rarely used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known.

• Instead, we construct a t statistic using the sample variances ($s_1^2$ and $s_2^2$) in place of $\sigma_1^2$ and $\sigma_2^2$.

Page 7: Inference about  Two Populations

7

Making an inference about $\mu_1 - \mu_2$

• Two cases are considered when producing the t-statistic:
  – The two unknown population variances are equal.
  – The two unknown population variances are not equal.

Page 8: Inference about  Two Populations

8

Inference about $\mu_1 - \mu_2$: Equal variances

• Calculate the pooled variance estimate by:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

  This is the pooled variance estimator.

Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,

$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$
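The pooled variance formula can be checked with a few lines of Python. This is a minimal sketch; the function name `pooled_variance` is made up for illustration and does not come from the slides.

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Average of the two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Values from the slide's example: s1^2 = 25, s2^2 = 30, n1 = 10, n2 = 15.
sp_sq = pooled_variance(25, 30, 10, 15)
print(round(sp_sq, 5))  # 28.04348 (the slide shows the truncated value 28.04347)
```

Because the larger sample has more degrees of freedom, the result lies closer to 30 than to 25.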


Page 10: Inference about  Two Populations

10

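Inference about $\mu_1 - \mu_2$: Equal variances

• Construct the t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

• Perform a hypothesis test:
  H0: $\mu_1 - \mu_2 = 0$
  H1: $\mu_1 - \mu_2 > 0$ (or $< 0$, or $\neq 0$)

• Or build a confidence interval:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

  where $1 - \alpha$ is the confidence level.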

Page 11: Inference about  Two Populations

11

Inference about $\mu_1 - \mu_2$: Unequal variances

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Page 12: Inference about  Two Populations

12

Inference about $\mu_1 - \mu_2$: Unequal variances

Conduct a hypothesis test as needed, or build a confidence interval.

Confidence interval:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $1 - \alpha$ is the confidence level.

Page 13: Inference about  Two Populations

13

Which case to use: Equal variance or unequal variance?

• Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal variances t-test.

• This is so because, for any two given samples:

  The number of degrees of freedom for the equal variances case ≥ The number of degrees of freedom for the unequal variances case

Page 14: Inference about  Two Populations

14

Page 15: Inference about  Two Populations

15

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.1
  – Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
  – A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
  – For each person the number of calories consumed at lunch was recorded.

Page 16: Inference about  Two Populations

16

Example: Making an inference about $\mu_1 - \mu_2$

Consumers   Non-consumers
   568          705
   498          819
   589          706
   681          509
   540          613
   646          582
   636          601
   739          608
   539          787
   596          573
   607          428
   529          754
   637          741
   617          628
   633          537
   555          748
   ...          ...

Solution:
• The data are interval.
• The parameter to be tested is the difference between two means.
• The claim to be tested is: The mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

Page 17: Inference about  Two Populations

17

Example: Making an inference about $\mu_1 - \mu_2$

• The hypotheses are:

  H0: $\mu_1 - \mu_2 = 0$
  H1: $\mu_1 - \mu_2 < 0$

  – To check whether the population variances are equal, we use computer output (Xm13-01) to find the sample variances. We have $s_1^2 = 4103$ and $s_2^2 = 10{,}670$.

  – It appears that the variances are unequal.

Page 18: Inference about  Two Populations

18

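Example: Making an inference about $\mu_1 - \mu_2$

• Compute: Manually
  – From the data we have:

$$\bar{x}_1 = 604.02, \quad \bar{x}_2 = 633.23, \quad s_1^2 = 4{,}103, \quad s_2^2 = 10{,}670$$

$$\text{d.f.} = \frac{\left(\dfrac{4103}{43} + \dfrac{10670}{107}\right)^2}{\dfrac{\left(4103/43\right)^2}{43 - 1} + \dfrac{\left(10670/107\right)^2}{107 - 1}} = 122.6 \approx 123$$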

Page 19: Inference about  Two Populations

19

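Example: Making an inference about $\mu_1 - \mu_2$

• Compute: Manually
  – The rejection region is $t < -t_{\alpha} = -t_{.05,123} \approx -1.658$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(604.02 - 633.23) - (0)}{\sqrt{\dfrac{4103}{43} + \dfrac{10670}{107}}} = -2.09$$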
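The unequal-variances calculation for Example 13.1 can be reproduced as follows. This is a sketch only: the helper name `welch_t` is hypothetical (the procedure is commonly known as Welch's t-test, a name the slides do not use), and the inputs are the rounded summary statistics quoted on the slides.

```python
import math


def welch_t(x1, x2, s1_sq, s2_sq, n1, n2, diff0=0.0):
    """Return (t statistic, degrees of freedom) for the unequal-variances case."""
    v1, v2 = s1_sq / n1, s2_sq / n2          # per-sample variance of the mean
    t = ((x1 - x2) - diff0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Example 13.1: consumers (sample 1) vs. non-consumers (sample 2).
t, df = welch_t(604.02, 633.23, 4103, 10670, 43, 107)
print(round(t, 2), round(df, 1))  # -2.09 122.6
```

The degrees of freedom come out fractional (about 122.6) and are rounded to 123 before looking up the critical value.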

Page 20: Inference about  Two Populations

20

Example: Making an inference about $\mu_1 - \mu_2$

Xm13-01

t-Test: Two-Sample Assuming Unequal Variances

                               Consumers   Nonconsumers
Mean                           604.02      633.23
Variance                       4102.98     10669.77
Observations                   43          107
Hypothesized Mean Difference   0
df                             123
t Stat                         -2.09
P(T<=t) one-tail               0.0193
t Critical one-tail            1.6573
P(T<=t) two-tail               0.0386
t Critical two-tail            1.9794

-2.09 < -1.6573 and .0193 < .05

At the 5% significance level there is sufficient evidence to reject the null hypothesis.

Page 21: Inference about  Two Populations

21

Example: Making an inference about $\mu_1 - \mu_2$

• Compute: Manually
  The confidence interval estimator for the difference between two means is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.65 = [-56.86, -1.56]$$
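The interval above can be recomputed as a quick check. In this sketch the critical value 1.9796 is taken from the slide and passed in as a plain number, and the function name `unequal_var_ci` is invented for illustration.

```python
import math


def unequal_var_ci(x1, x2, s1_sq, s2_sq, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 without pooling the variances."""
    half_width = t_crit * math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width


lcl, ucl = unequal_var_ci(604.02, 633.23, 4103, 10670, 43, 107, t_crit=1.9796)
print(round(lcl, 2), round(ucl, 2))  # -56.86 -1.56
```

Both limits are negative, which agrees with the one-tail test's rejection of the null hypothesis.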

Page 22: Inference about  Two Populations

22

Page 23: Inference about  Two Populations

23

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.2
  – An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
  – The operations manager would like to know whether the assembly time under the two methods differs.

Page 24: Inference about  Two Populations

24

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.2
  – Two samples are randomly and independently selected:
    • A sample of 25 workers assembled the chair using Method A.
    • A sample of 25 workers assembled the chair using Method B.
    • The assembly times were recorded.
  – Do the assembly times of the two methods differ?

Page 25: Inference about  Two Populations

25

Example: Making an inference about $\mu_1 - \mu_2$

Assembly times in minutes

Method A   Method B
  6.8        5.2
  5.0        6.7
  7.9        5.7
  5.2        6.6
  7.6        8.5
  5.0        6.5
  5.9        5.9
  5.2        6.7
  6.5        6.6
  ...        ...

Solution

• The data are interval.

• The parameter of interest is the difference between two population means.

• The claim to be tested is whether a difference between the two methods exists.

Page 26: Inference about  Two Populations

26

Example: Making an inference about $\mu_1 - \mu_2$

• Compute: Manually

  – The hypotheses are:

    H0: $\mu_1 - \mu_2 = 0$
    H1: $\mu_1 - \mu_2 \neq 0$

  – To check whether the two unknown population variances are equal, we calculate $s_1^2$ and $s_2^2$ (Xm13-02).

  – We have $s_1^2 = 0.8478$ and $s_2^2 = 1.3031$.

  – The two population variances appear to be equal.

Page 27: Inference about  Two Populations

27

Example: Making an inference about $\mu_1 - \mu_2$

• Compute: Manually

  – To calculate the t-statistic we have:

$$\bar{x}_1 = 6.288, \quad \bar{x}_2 = 6.016, \quad s_1^2 = 0.8478, \quad s_2^2 = 1.3031$$

$$s_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$

$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$
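The equal-variances calculation for Example 13.2 can be sketched in Python as below. The helper `pooled_t` is a hypothetical name, and it uses the unrounded sample variances (0.8478 and 1.3031) rather than the three-decimal values substituted on the slide.

```python
import math


def pooled_t(x1, x2, s1_sq, s2_sq, n1, n2, diff0=0.0):
    """Return (t statistic, degrees of freedom, pooled variance)."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = ((x1 - x2) - diff0) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, df, sp_sq


# Example 13.2: Method A (sample 1) vs. Method B (sample 2).
t, df, sp_sq = pooled_t(6.288, 6.016, 0.8478, 1.3031, 25, 25)
print(round(t, 2), df, round(sp_sq, 3))  # 0.93 48 1.075
```

The pooled variance prints as 1.075 rather than the slide's 1.076 because the slide rounds the sample variances before pooling; the t statistic is 0.93 either way.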

Page 28: Inference about  Two Populations

28

Example: Making an inference about $\mu_1 - \mu_2$

• For $\alpha = 0.05$, the rejection region is $t < -t_{\alpha/2} = -t_{.025,48} = -2.009$ or $t > t_{\alpha/2} = t_{.025,48} = 2.009$.

• The test: Since $-2.009 < t = 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.

[Figure: t distribution with rejection regions below -2.009 and above 2.009; the observed value 0.93 lies between them]

Page 29: Inference about  Two Populations

29

Example: Making an inference about $\mu_1 - \mu_2$

Xm13-02

t-Test: Two-Sample Assuming Equal Variances

                               Method A   Method B
Mean                           6.29       6.02
Variance                       0.8478     1.3031
Observations                   25         25
Pooled Variance                1.08
Hypothesized Mean Difference   0
df                             48
t Stat                         0.93
P(T<=t) one-tail               0.1792
t Critical one-tail            1.6772
P(T<=t) two-tail               0.3584
t Critical two-tail            2.0106

-2.0106 < .93 < +2.0106 and .3584 > .05

Page 30: Inference about  Two Populations

30

Example: Making an inference about $\mu_1 - \mu_2$

• Conclusion: There is no evidence to infer at the 5% significance level that the two assembly methods are different in terms of assembly time.

Page 31: Inference about  Two Populations

31

Example: Making an inference about $\mu_1 - \mu_2$

A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176, 0.8616]$$

Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.

Notice: zero is included in the confidence interval.
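A short sketch of the same interval in Python, assuming the pooled variance 1.075 and the critical value 2.0106 quoted on the slide (both passed in as plain numbers; `pooled_ci` is an illustrative name):

```python
import math


def pooled_ci(x1, x2, sp_sq, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 using a pooled variance estimate."""
    half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = x1 - x2
    return diff - half_width, diff + half_width


lcl, ucl = pooled_ci(6.288, 6.016, 1.075, 25, 25, t_crit=2.0106)
print(round(lcl, 4), round(ucl, 4))  # -0.3176 0.8616
print(lcl < 0 < ucl)                 # True: zero lies inside the interval
```

An interval that contains zero is consistent with the two-tail test failing to reject the null hypothesis at the same significance level.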

Page 32: Inference about  Two Populations

32

Checking the required conditions for the equal variances case (Example 13.2)

The data appear to be approximately normal.

[Figure: frequency histograms of assembly times, one panel for Design A and one for Design B]

Page 33: Inference about  Two Populations

33

13.4 Matched Pairs Experiment

• What is a matched pairs experiment?

• Why are matched pairs experiments needed?

• How do we deal with data produced in this way?

The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

Page 34: Inference about  Two Populations

34

Page 35: Inference about  Two Populations

35

13.4 Matched Pairs Experiment

Example 13.3
  – To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
  – In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
  – Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one. The data are stored in file Xm13-03.
  – Can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?

Page 36: Inference about  Two Populations

36

13.4 Matched Pairs Experiment

• Solution
  – Compare two populations of interval data.
  – The parameter tested is $\mu_1 - \mu_2$, where
    $\mu_1$ = the mean of the highest salary offered to Finance MBAs
    $\mu_2$ = the mean of the highest salary offered to Marketing MBAs
  – H0: $\mu_1 - \mu_2 = 0$
    H1: $\mu_1 - \mu_2 > 0$

Finance   Marketing
61,228    73,361
51,836    36,956
20,620    63,627
73,356    71,069
84,186    40,203
...       ...

Page 37: Inference about  Two Populations

37

13.4 Matched Pairs Experiment

• Solution – continued

From the data we have:

$$\bar{x}_1 = 65{,}624, \quad \bar{x}_2 = 60{,}423, \quad s_1^2 = 360{,}433{,}294, \quad s_2^2 = 262{,}228{,}559$$

• Let us assume equal variances.

Equal Variances
                               Finance     Marketing
Mean                           65624       60423
Variance                       360433294   262228559
Observations                   25          25
Pooled Variance                311330926
Hypothesized Mean Difference   0
df                             48
t Stat                         1.04
P(T<=t) one-tail               0.1513
t Critical one-tail            1.6772
P(T<=t) two-tail               0.3026
t Critical two-tail            2.0106

There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.

Page 38: Inference about  Two Populations

38

The effect of a large sample variability

• Question
  – The difference between the sample means is 65,624 – 60,423 = 5,201.
  – So, why could we not reject H0 and favor H1 ($\mu_1 - \mu_2 > 0$)?

Page 39: Inference about  Two Populations

39

The effect of a large sample variability

• Answer:
  – $s_p^2$ is large (because the sample variances are large): $s_p^2 = 311{,}330{,}926$.
  – A large variance reduces the value of the t statistic, and it becomes more difficult to reject H0.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Page 40: Inference about  Two Populations

40

Reducing the variability

The values each sample consists of might markedly vary...

[Figure: the range of observations in sample A and the range of observations in sample B]

Page 41: Inference about  Two Populations

41

Reducing the variability

...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.

[Figure: the range of the differences, plotted on an axis starting at 0]

Page 42: Inference about  Two Populations

42

The matched pairs experiment

• Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$.

• This formulation has the benefit of a smaller variability.

Group 1   Group 2   Difference
  10        12         -2
  15        11         +4

Mean1 = 12.5    Mean2 = 11.5
Mean1 – Mean2 = 1    Mean of the differences = 1
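The identity used here (the difference of the means equals the mean of the differences) can be confirmed with the four numbers in the small table. The variable names in this sketch are illustrative only.

```python
group1 = [10, 15]
group2 = [12, 11]

# Pair the observations row by row and take the differences.
differences = [a - b for a, b in zip(group1, group2)]   # [-2, 4]

mean1 = sum(group1) / len(group1)                 # 12.5
mean2 = sum(group2) / len(group2)                 # 11.5
mean_diff = sum(differences) / len(differences)

print(mean1 - mean2, mean_diff)  # 1.0 1.0
```

The two quantities always agree when the samples are paired one to one; what changes between the two formulations is the variance used in the test statistic.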

Page 43: Inference about  Two Populations

43

The matched pairs experiment

• Example 13.4
  – It was suspected that salary offers were affected by students' GPA (which caused $s_1^2$ and $s_2^2$ to increase).
  – To reduce this variability, the following procedure was used:
    • 25 ranges of GPAs were predetermined.
    • Students from each major were randomly selected, one from each GPA range.
    • The highest salary offer for each student was recorded.
  – From the data presented, can we conclude that Finance majors are offered higher salaries?

Page 44: Inference about  Two Populations

44

The matched pairs hypothesis test

• Solution (by hand)
  – The parameter tested is $\mu_D$ ($= \mu_1 - \mu_2$), where population 1 is Finance and population 2 is Marketing.
  – The hypotheses:
    H0: $\mu_D = 0$
    H1: $\mu_D > 0$
  – The t statistic:

$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}}$$

    Degrees of freedom = $n_D - 1$

  – The rejection region is $t > t_{.05,25-1} = 1.711$.

Page 45: Inference about  Two Populations

45

The matched pairs hypothesis test

• Solution
  – From the data (Xm13-04) calculate:

GPA Group   Finance   Marketing   Difference
    1        95171     89329         5842
    2        88009     92705        -4696
    3        98089     99205        -1116
    4       106322     99003         7319
    5        74566     74825         -259
    6        87089     77038        10051
    7        88664     78272        10392
    8        71200     59462        11738
    9        69367     51555        17812
   10        82618     81591         1027
  ...          ...       ...          ...

Difference
Mean                 5065
Standard Error       1329
Median               3285
Mode                 #N/A
Standard Deviation   6647
Sample Variance      44181217
Kurtosis             -0.6594
Skewness             0.3597
Range                23533
Minimum              -5721
Maximum              17812
Sum                  126613
Count                25

Page 46: Inference about  Two Populations

46

The matched pairs hypothesis test

• Solution
  – Calculate t:

$$\bar{x}_D = 5{,}065, \qquad s_D = 6{,}647$$

$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = \frac{5065 - 0}{6647 / \sqrt{25}} = 3.81$$
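The paired t statistic can be recomputed from the summary statistics of the differences. This is a minimal sketch; `paired_t` is a hypothetical helper name.

```python
import math


def paired_t(mean_d, sd_d, n, mu0=0.0):
    """Return (t statistic, degrees of freedom) for a matched pairs test."""
    standard_error = sd_d / math.sqrt(n)
    return (mean_d - mu0) / standard_error, n - 1


# Example 13.4: mean and standard deviation of the 25 salary differences.
t, df = paired_t(5065, 6647, 25)
print(round(t, 2), df)  # 3.81 24
```

The standard error here is 6647 / 5 = 1329.4, far smaller than the one in the independent-samples test on the earlier salary data, which is why the same kind of mean difference now yields a large t.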

Page 47: Inference about  Two Populations

47

The matched pairs hypothesis test

Xm13-04

t-Test: Paired Two Sample for Means

                               Finance     Marketing
Mean                           65438       60374
Variance                       444981810   469441785
Observations                   25          25
Pearson Correlation            0.9520
Hypothesized Mean Difference   0
df                             24
t Stat                         3.81
P(T<=t) one-tail               0.0004
t Critical one-tail            1.7109
P(T<=t) two-tail               0.0009
t Critical two-tail            2.0639

3.81 > 1.7109 and .0004 < .05

Page 48: Inference about  Two Populations

48

The matched pairs hypothesis test

Conclusion: There is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.

Page 49: Inference about  Two Populations

49

The matched pairs mean difference estimation

Confidence interval estimator of $\mu_D$:

$$\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\,\frac{s_D}{\sqrt{n_D}}$$

Example 13.5

The 95% confidence interval of the mean difference in Example 13.4 is

$$5065 \pm 2.064\,\frac{6647}{\sqrt{25}} = 5{,}065 \pm 2{,}744$$
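The interval in Example 13.5 can be sketched as follows, assuming the critical value 2.064 quoted on the slide (passed in as a plain number; `paired_ci` is an illustrative name).

```python
import math


def paired_ci(mean_d, sd_d, n, t_crit):
    """Confidence interval for the mean of the paired differences."""
    half_width = t_crit * sd_d / math.sqrt(n)
    return mean_d - half_width, mean_d + half_width, half_width


lcl, ucl, half_width = paired_ci(5065, 6647, 25, t_crit=2.064)
print(round(half_width), round(lcl), round(ucl))  # 2744 2321 7809
```

With these rounded inputs the upper limit comes out at about 7,809; software output computed from unrounded statistics can differ by a unit in the last digit.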

Page 50: Inference about  Two Populations

50

The matched pairs mean difference estimation

Using Data Analysis Plus (Xm13-04): first calculate the differences (Finance minus Marketing for each GPA group), then run the confidence interval procedure in Data Analysis Plus.

t-Estimate: Mean

                     Difference
Mean                 5065
Standard Deviation   6647
LCL                  2321
UCL                  7808

Page 51: Inference about  Two Populations

51

Checking the required conditions for the paired observations case

• The validity of the results depends on the normality of the differences.

[Figure: histogram of the differences; horizontal axis "Difference" from 0 to 20000, vertical axis "Frequency"]

Page 52: Inference about  Two Populations

52

13.5 Inference about the ratio of two variances

• In this section we draw inference about the ratio of two population variances.

• This question is interesting because:
  – Variances can be used to evaluate the consistency of processes.
  – The relationship between population variances determines which of the equal-variances or unequal-variances t-test and estimator of the difference between means should be applied.

Page 53: Inference about  Two Populations

53

Parameter and Statistic

• Parameter to be tested is $\sigma_1^2 / \sigma_2^2$.

• Statistic used is

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

• Sampling distribution of $s_1^2 / s_2^2$
  – The statistic $[s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

Page 54: Inference about  Two Populations

54

Parameter and Statistic

  – Our null hypothesis is always

    H0: $\sigma_1^2 / \sigma_2^2 = 1$

  – Under this null hypothesis the F statistic becomes

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = \frac{s_1^2}{s_2^2}$$

Page 55: Inference about  Two Populations

55

Page 56: Inference about  Two Populations

56

Testing the ratio of two population variances

Example 13.6 (revisiting Example 13.1)

(see Xm13-01) Before testing the average calorie consumption at lunch in relation to the inclusion of high-fiber cereal at breakfast, the ratio of the two population variances has to be tested. The data are the calorie intakes at lunch of consumers and non-consumers from Example 13.1.

The hypotheses are:

H0: $\sigma_1^2 / \sigma_2^2 = 1$
H1: $\sigma_1^2 / \sigma_2^2 \neq 1$

Page 57: Inference about  Two Populations

57

Testing the ratio of two population variances

• Solving by hand

  – The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$, where

$$F_{\alpha/2,\nu_1,\nu_2} = F_{.025,42,106} \approx F_{.025,40,120} = 1.61$$

$$\frac{1}{F_{\alpha/2,\nu_2,\nu_1}} = \frac{1}{F_{.025,106,42}} \approx \frac{1}{F_{.025,120,40}} = \frac{1}{1.72} = .58$$

  – The F statistic value is $F = s_1^2 / s_2^2 = .3845$.

  – Conclusion: Because .3845 < .58, we reject the null hypothesis in favor of the alternative hypothesis, and conclude that there is sufficient evidence at the 5% significance level that the population variances differ.
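The decision rule can be written out as a small Python check. This sketch takes the two table values (1.61 and 1.72) from the slide as given constants; `f_test_two_tail` is a made-up name.

```python
def f_test_two_tail(s1_sq, s2_sq, f_upper, f_upper_reversed):
    """Two-tail F test for H0: sigma1^2 / sigma2^2 = 1.

    f_upper          : F critical value with (nu1, nu2) degrees of freedom
    f_upper_reversed : F critical value with (nu2, nu1) degrees of freedom
    """
    f_stat = s1_sq / s2_sq
    lower_crit = 1 / f_upper_reversed
    reject = f_stat > f_upper or f_stat < lower_crit
    return f_stat, lower_crit, reject


f_stat, lower_crit, reject = f_test_two_tail(4103, 10670, 1.61, 1.72)
print(round(f_stat, 4), round(lower_crit, 2), reject)  # 0.3845 0.58 True
```

The lower critical value is obtained by inverting the upper critical value with the degrees of freedom swapped, which is why only upper-tail F tables are needed.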

Page 58: Inference about  Two Populations

58

Testing the ratio of two population variances

Example 13.6 (revisiting Example 13.1), Excel output (see Xm13-01)

F-Test Two-Sample for Variances

                      Consumers   Nonconsumers
Mean                  604         633
Variance              4103        10670
Observations          43          107
df                    42          106
F                     0.3845
P(F<=f) one-tail      0.0004
F Critical one-tail   0.6371

Page 59: Inference about  Two Populations

59

Estimating the Ratio of Two Population Variances

• From the statistic $F = [s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following confidence interval:

$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$

  where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

Page 60: Inference about  Two Populations

60

Estimating the Ratio of Two Population Variances

• Example 13.7
  – Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 13.1.
  – Solution
    • We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately) and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately).
    • LCL = $(s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}]$ = (4102.98/10,669.77)[1/1.61] = .2388
    • UCL = $(s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}]$ = (4102.98/10,669.77)[1.72] = .6614
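The two confidence limits in Example 13.7 can be reproduced with the sketch below. The approximate table values 1.61 and 1.72 are taken from the slide, and `variance_ratio_ci` is an illustrative name.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper, f_upper_reversed):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_upper          : F critical value with (nu1, nu2) degrees of freedom
    f_upper_reversed : F critical value with (nu2, nu1) degrees of freedom
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio * f_upper_reversed


lcl, ucl = variance_ratio_ci(4102.98, 10669.77, 1.61, 1.72)
print(round(lcl, 4), round(ucl, 4))  # 0.2388 0.6614
```

The whole interval lies below 1, in line with the F test's conclusion that the two population variances differ.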

Page 61: Inference about  Two Populations

61

13.6 Inference about the difference between two population proportions

• In this section we deal with two populations whose data are nominal.
• For nominal data we compare the population proportions of the occurrence of a certain event.
• Examples
  – Comparing the effectiveness of a new drug versus an older one
  – Comparing market share before and after an advertising campaign
  – Comparing defective rates between two machines

Page 62: Inference about  Two Populations

62

Parameter and Statistic

• Parameter
  – When the data are nominal, we can only count the occurrences of a certain event in the two populations, and calculate proportions.
  – The parameter is therefore $p_1 - p_2$.

• Statistic
  – An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$ (the difference between the sample proportions).

Page 63: Inference about  Two Populations

63

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

• Two random samples are drawn from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.

                       Sample 1                       Sample 2
Sample size            $n_1$                          $n_2$
Number of successes    $x_1$                          $x_2$
Sample proportion      $\hat{p}_1 = x_1 / n_1$        $\hat{p}_2 = x_2 / n_2$

Page 64: Inference about  Two Populations

64

Sampling distribution of $\hat{p}_1 - \hat{p}_2$

• The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$, $n_2(1 - p_2)$ are all greater than or equal to 5.

• The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.

• The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.

Page 65: Inference about  Two Populations

65

The z-statistic

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

Because $p_1$ and $p_2$ are unknown, the standard error must be estimated using the sample proportions. The method depends on the null hypothesis.

Page 66: Inference about  Two Populations

66

Testing $p_1 - p_2$

• There are two cases to consider:

Case 1: H0: $p_1 - p_2 = 0$

  Calculate the pooled proportion

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

  Then

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Case 2: H0: $p_1 - p_2 = D$ (D is not equal to 0)

  Do not pool the data:

$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$

  Then

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

Page 67: Inference about  Two Populations

67

Testing $p_1 - p_2$ (Case 1)

• Example 13.8
  – The marketing manager needs to decide which of two new packaging designs to adopt, to help improve sales of his company's soap.
  – A study is performed in two supermarkets:
    • Brightly-colored packaging is distributed in supermarket 1.
    • Simple packaging is distributed in supermarket 2.
  – The first design is more expensive; therefore, to be financially viable it has to outsell the second design.

Page 68: Inference about  Two Populations

68

Testing $p_1 - p_2$ (Case 1)

• Summary of the experiment results
  – Supermarket 1: 180 purchasers of Johnson Brothers soap out of a total of 904
  – Supermarket 2: 155 purchasers of Johnson Brothers soap out of a total of 1,038
  – Use a 5% significance level and perform a test to find which type of packaging to use.

Page 69: Inference about  Two Populations

69

Testing $p_1 - p_2$ (Case 1)

• Solution
  – The problem objective is to compare the population of sales of the two packaging designs.
    Population 1: purchases at supermarket 1
    Population 2: purchases at supermarket 2
  – The data are nominal (Johnson Brothers or other soap).
  – The hypotheses are
    H0: $p_1 - p_2 = 0$
    H1: $p_1 - p_2 > 0$
  – We identify this application as case 1.

Page 70: Inference about  Two Populations

70

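Testing $p_1 - p_2$ (Case 1)

• Compute: Manually

  – For a 5% significance level the rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.

  – The sample proportions are

$$\hat{p}_1 = \frac{180}{904} = .1991 \quad \text{and} \quad \hat{p}_2 = \frac{155}{1{,}038} = .1493$$

  – The pooled proportion is

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{180 + 155}{904 + 1{,}038} = .1725$$

  – The z statistic becomes

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.1991 - .1493}{\sqrt{.1725(1 - .1725)\left(\dfrac{1}{904} + \dfrac{1}{1{,}038}\right)}} = 2.90$$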
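The Case 1 calculation for Example 13.8 can be sketched in Python using the raw counts. The function name `pooled_z` is hypothetical.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err, p_pool


z, p_pool = pooled_z(180, 904, 155, 1038)
print(f"{z:.2f} {p_pool:.4f}")  # 2.90 0.1725
print(z > 1.645)                # True: z falls in the rejection region
```

Pooling is appropriate here because the null hypothesis states that both populations share a single proportion, so both samples estimate the same quantity.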

Page 71: Inference about  Two Populations

71

Testing $p_1 - p_2$ (Case 1)

• Excel (Data Analysis Plus), Xm13-08

z-Test: Two Proportions

                          Supermarket 1   Supermarket 2
Sample Proportions        0.1991          0.1493
Observations              904             1038
Hypothesized Difference   0
z Stat                    2.90
P(Z<=z) one tail          0.0019
z Critical one-tail       1.6449
P(Z<=z) two-tail          0.0038
z Critical two-tail       1.96

Conclusion: There is sufficient evidence to conclude at the 5% significance level that the brightly-colored design will outsell the simple design.

Page 72: Inference about  Two Populations

72

Testing $p_1 - p_2$ (Case 2)

• Example 13.9 (Revisit Example 13.8)
  – Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
  – A study is performed in two supermarkets.
  – For the brightly-colored design to be financially viable it has to outsell the simple design by at least 3%.

Page 73: Inference about  Two Populations

73

Testing $p_1 - p_2$ (Case 2)

• Summary of the experiment results
  – Supermarket 1: 180 purchasers of Johnson Brothers' soap out of a total of 904
  – Supermarket 2: 155 purchasers of Johnson Brothers' soap out of a total of 1,038
  – Use a 5% significance level and perform a test to find which type of packaging to use.

Page 74: Inference about  Two Populations

74

Testing $p_1 - p_2$ (Case 2)

• Solution
  – The hypotheses to test are
    H0: $p_1 - p_2 = .03$
    H1: $p_1 - p_2 > .03$
  – We identify this application as case 2 (the hypothesized difference is not equal to zero).

Page 75: Inference about  Two Populations

75

Testing $p_1 - p_2$ (Case 2)

• Compute: Manually

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\dfrac{180}{904} - \dfrac{155}{1{,}038}\right) - .03}{\sqrt{\dfrac{.1991(1 - .1991)}{904} + \dfrac{.1493(1 - .1493)}{1{,}038}}} = 1.15$$

The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.

Conclusion: Since 1.15 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that the brightly-colored design will outsell the simple design by 3% or more.
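The Case 2 statistic can be recomputed from the raw counts with the sketch below (`unpooled_z` is a made-up helper name). Computed without intermediate rounding, the value is about 1.14, the figure that statistical software reports for these data; the hand calculation above rounds to 1.15. The conclusion is the same either way.

```python
import math


def unpooled_z(x1, n1, x2, n2, d0):
    """z statistic for H0: p1 - p2 = d0 with d0 != 0 (no pooling)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    std_err = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d0) / std_err


z = unpooled_z(180, 904, 155, 1038, d0=0.03)
print(f"{z:.2f}")  # 1.14
print(z > 1.645)   # False: z is outside the rejection region
```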

Page 76: Inference about  Two Populations

76

Testing $p_1 - p_2$ (Case 2)

• Using Excel (Data Analysis Plus), Xm13-08

z-Test: Two Proportions

                          Supermarket 1   Supermarket 2
Sample Proportions        0.1991          0.1493
Observations              904             1038
Hypothesized Difference   0.03
z Stat                    1.14
P(Z<=z) one tail          0.1261
z Critical one-tail       1.6449
P(Z<=z) two-tail          0.2522
z Critical two-tail       1.96

Page 77: Inference about  Two Populations

77

Estimating $p_1 - p_2$

• Estimating the cost of life saved
  – Two drugs are used to treat heart attack victims:
    • Streptokinase (available since 1959, costs $460)
    • t-PA (genetically engineered, costs $2900)
  – The maker of t-PA claims that its drug outperforms Streptokinase.
  – An experiment was conducted in 15 countries.
    • 20,500 patients were given t-PA.
    • 20,500 patients were given Streptokinase.
    • The number of deaths from heart attacks was recorded.

Page 78: Inference about  Two Populations

78

Estimating $p_1 - p_2$

• Experiment results
  – A total of 1497 patients treated with Streptokinase died.
  – A total of 1292 patients treated with t-PA died.

• Estimate the cost per life saved by using t-PA instead of Streptokinase.

Page 79: Inference about  Two Populations

79

Estimating $p_1 - p_2$

• Solution
  – The problem objective: Compare the outcomes of two treatments.
  – The data are nominal (a patient lived or died).
  – The parameter to be estimated is $p_1 - p_2$.
    • $p_1$ = death rate with Streptokinase
    • $p_2$ = death rate with t-PA

Page 80: Inference about  Two Populations

80

Estimating $p_1 - p_2$

• Compute: Manually
  – Sample proportions:

$$\hat{p}_1 = \frac{1497}{20500} = .0730, \qquad \hat{p}_2 = \frac{1292}{20500} = .0630$$

  – The 95% confidence interval estimate is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (.0730 - .0630) \pm 1.96\sqrt{\frac{.0730(1 - .0730)}{20500} + \frac{.0630(1 - .0630)}{20500}} = .0100 \pm .0049$$

    LCL = .0051, UCL = .0149
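The interval can be checked from the raw death counts with the following sketch. The function name `proportion_diff_ci` is illustrative, and 1.96 is the usual two-sided 95% normal critical value.

```python
import math


def proportion_diff_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    std_err = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * std_err, diff + z_crit * std_err


# Sample 1: Streptokinase deaths; sample 2: t-PA deaths.
lcl, ucl = proportion_diff_ci(1497, 20500, 1292, 20500)
print(round(lcl, 4), round(ucl, 4))  # 0.0051 0.0149
```

Unlike the Case 1 test, no pooled proportion is used here, because an interval estimate does not assume that the two population proportions are equal.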

Page 81: Inference about  Two Populations

81

Estimating $p_1 - p_2$

• Interpretation
  – We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
  – The difference in cost per treatment is $2,900 – $460 = $2,440.
  – The cost per life saved by switching to t-PA is estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
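The cost arithmetic can be written out as below. The reasoning, stated here as an assumption about how the slide's figures are obtained: if a fraction g of treated patients is saved by the more expensive drug, then 1/g patients must be treated per life saved, each at the extra cost. The function name is made up.

```python
def cost_per_life_saved(extra_cost_per_patient, survival_gain):
    """Extra spending needed to save one life, given the gain in survival rate."""
    return extra_cost_per_patient / survival_gain


extra_cost = 2900 - 460   # t-PA costs $2,440 more per patient than Streptokinase

# Use the two ends of the 95% confidence interval for p1 - p2.
low = cost_per_life_saved(extra_cost, 0.0149)
high = cost_per_life_saved(extra_cost, 0.0051)
print(round(low), round(high))  # 163758 478431
```

The larger survival gain gives the smaller cost, so the upper confidence limit maps to the lower end of the cost range.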

