Post on 02-Jan-2016
Inference about Two Populations
Chapter 13
13.1 Introduction
• A variety of techniques is presented for comparing two populations.
• We are interested in:
  – The difference between two means.
  – The ratio of two variances.
  – The difference between two proportions.
13.2 Inference about the Difference between Two Means: Independent Samples
• Two random samples are drawn from the two populations of interest.
• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.
The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
2. $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal but the sample sizes are sufficiently large (greater than 30).
3. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
4. The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
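Properties 3 and 4 can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `simulate_diff_of_means` and the population parameters are hypothetical, chosen only for illustration, and the populations are assumed normal (property 1).

```python
import random
import statistics

def simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=10_000, seed=1):
    """Draw `reps` pairs of independent normal samples and return the
    simulated values of xbar1 - xbar2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

# Hypothetical population parameters, chosen only for illustration
mu1, sigma1, n1 = 50.0, 10.0, 40
mu2, sigma2, n2 = 45.0, 6.0, 35

diffs = simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2)
theory_mean = mu1 - mu2                        # property 3
theory_var = sigma1**2 / n1 + sigma2**2 / n2   # property 4
print(f"mean: simulated {statistics.fmean(diffs):.3f}, theory {theory_mean:.3f}")
print(f"var:  simulated {statistics.variance(diffs):.3f}, theory {theory_var:.3f}")
```

The simulated mean and variance should land close to $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$; increasing `reps` tightens the agreement.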
Making an inference about $\mu_1 - \mu_2$
• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.
Making an inference about $\mu_1 - \mu_2$ when the population variances are unknown
• When $\sigma_1^2$ and $\sigma_2^2$ are unknown, they are estimated by the sample variances $s_1^2$ and $s_2^2$, and the standardized statistic becomes a t-statistic rather than Z. Two cases are considered when producing the t-statistic:
  – The two unknown population variances are equal.
  – The two unknown population variances are not equal.
Inference about $\mu_1 - \mu_2$: Equal variances
• Calculate the pooled variance estimate by:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Example: $s_1^2 = 25$, $s_2^2 = 30$, $n_1 = 10$, $n_2 = 15$. Then
$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$
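The pooled estimate weights each sample variance by its degrees of freedom, $n - 1$. A minimal Python sketch of the formula, reproducing the example above (the function name `pooled_variance` is our own):

```python
def pooled_variance(s1_sq: float, s2_sq: float, n1: int, n2: int) -> float:
    """Pooled estimate of the common population variance.

    Each sample variance is weighted by its degrees of freedom (n - 1),
    and the total is divided by n1 + n2 - 2.
    """
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# The example above: s1^2 = 25, s2^2 = 30, n1 = 10, n2 = 15
sp_sq = pooled_variance(25, 30, 10, 15)
print(f"pooled variance = {sp_sq:.5f}")
```

The result is $645/23$, which the slide reports truncated as 28.04347.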
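• Construct the t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad d.f. = n_1 + n_2 - 2$$
• Perform a hypothesis test:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$, or $\mu_1 - \mu_2 < 0$, or $\mu_1 - \mu_2 \neq 0$
• Build a confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $1 - \alpha$ is the confidence level.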
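The equal-variances test statistic and confidence interval can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and to stay within the standard library the critical value $t_{\alpha/2}$ is passed in from a t table rather than computed. The usage line plugs in the summary statistics reported for Example 13.2 (assembly times).

```python
import math

def pooled_t_test(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Equal-variances t statistic for H0: mu1 - mu2 = delta0.
    Returns (t, degrees of freedom, standard error)."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = ((xbar1 - xbar2) - delta0) / se
    return t, n1 + n2 - 2, se

def pooled_ci(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """(1 - alpha) confidence interval for mu1 - mu2.
    t_crit = t_{alpha/2, n1+n2-2}, looked up from a t table (assumption)."""
    _, _, se = pooled_t_test(xbar1, xbar2, s1_sq, s2_sq, n1, n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se

# Summary statistics from Example 13.2 (assembly times, minutes)
t, df, _ = pooled_t_test(6.288, 6.016, 0.8478, 1.3031, 25, 25)
lo, hi = pooled_ci(6.288, 6.016, 0.8478, 1.3031, 25, 25, t_crit=2.0106)
print(f"t = {t:.2f}, df = {df}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

Because the code does not round $s_p^2$ to three decimals, its interval endpoints can differ from the hand calculation in the fourth decimal place.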
Inference about $\mu_1 - \mu_2$: Unequal variances
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad d.f. = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Conduct a hypothesis test as needed, or build a confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $1 - \alpha$ is the confidence level.
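The unequal-variances statistic and its degrees of freedom (the Welch–Satterthwaite approximation) can be sketched in Python. The function name is hypothetical; the usage line plugs in the summary statistics reported for Example 13.1.

```python
import math

def welch_t_test(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Unequal-variances t statistic and its approximate degrees of freedom."""
    a = s1_sq / n1  # estimated variance of xbar1
    b = s2_sq / n2  # estimated variance of xbar2
    t = ((xbar1 - xbar2) - delta0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df  # df is usually not an integer; tables use the nearest value

# Summary statistics from Example 13.1 (calories at lunch)
t, df = welch_t_test(604.02, 633.23, 4103, 10670, 43, 107)
print(f"t = {t:.2f}, d.f. = {df:.1f}")
```

The degrees of freedom come out near 122.6, which the hand calculation rounds to 123 before looking up the critical value.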
Example: Making an inference about $\mu_1 - \mu_2$
• Example 13.1
  – Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
  – A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
  – For each person, the number of calories consumed at lunch was recorded.
Calories consumed at lunch (partial listing)
Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
...         ...
Solution:
• The parameter to be tested is the difference between two means.
• The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).
• The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) < 0$
  – Are the population variances equal? We have $s_1^2 = 4103$ and $s_2^2 = 10{,}670$.
  – It appears that the variances are unequal.
• Compute: Manually
  – From the data we have:
$$\bar{x}_1 = 604.02, \quad \bar{x}_2 = 633.23, \quad s_1^2 = 4103, \quad s_2^2 = 10{,}670$$
$$d.f. = \frac{\left(\dfrac{4103}{43} + \dfrac{10670}{107}\right)^2}{\dfrac{(4103/43)^2}{43 - 1} + \dfrac{(10670/107)^2}{107 - 1}} = 122.6 \approx 123$$
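• Compute: Manually
  – The rejection region is $t < -t_{\alpha,\nu} = -t_{.05,123} \approx -1.658$.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(604.02 - 633.23) - (0)}{\sqrt{\dfrac{4103}{43} + \dfrac{10670}{107}}} = -2.09$$
  – Because $-2.09 < -1.658$, we reject H0: there is sufficient evidence at the 5% significance level to infer that consumers of high-fiber cereal eat, on average, fewer calories at lunch than non-consumers.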
• Compute: Manually
The confidence interval estimator for the difference between two means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.65 = (-56.86, -1.56)$$
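The unequal-variances confidence interval can be sketched the same way. The function name is hypothetical, and as above the critical value (here 1.9796, the value used in the hand calculation) is supplied from a t table rather than computed.

```python
import math

def unequal_var_ci(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 without assuming equal variances.
    t_crit = t_{alpha/2, nu}, looked up for the approximate d.f. (assumption)."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se

# Summary statistics from Example 13.1
lo, hi = unequal_var_ci(604.02, 633.23, 4103, 10670, 43, 107, t_crit=1.9796)
print(f"CI: ({lo:.2f}, {hi:.2f})")
```

The whole interval lies below zero, which matches the one-tailed test's conclusion that consumers eat fewer calories at lunch.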
• Example 13.2
  – An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
  – The operations manager would like to know whether the assembly times under the two methods differ.
• Example 13.2 (continued)
  – Two samples are randomly and independently selected:
    • A sample of 25 workers assembled the chair using Method A.
    • A sample of 25 workers assembled the chair using Method B.
    • The assembly times were recorded.
  – Do the assembly times of the two methods differ?
Assembly times in minutes (partial listing)
Method A   Method B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
...        ...
Solution
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two methods exists.
• Compute: Manually
  – The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) \neq 0$
  – Are the population variances equal? We have $s_1^2 = 0.8478$ and $s_2^2 = 1.3031$.
  – The two population variances appear to be equal.
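  – To calculate the t-statistic we have:
$$\bar{x}_1 = 6.288, \quad \bar{x}_2 = 6.016, \quad s_1^2 = 0.8478, \quad s_2^2 = 1.3031$$
$$s_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$
$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad d.f. = 25 + 25 - 2 = 48$$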
• For $\alpha = 0.05$, the rejection region is $t < -t_{\alpha/2,\nu} = -t_{.025,48} \approx -2.011$ or $t > t_{.025,48} \approx 2.011$.
• The test: Since $-2.011 < 0.93 < 2.011$, there is insufficient evidence to reject the null hypothesis.
• Conclusion: There is insufficient evidence to infer at the 5% significance level that the two assembly methods differ in terms of assembly time.
A 95% confidence interval for $\mu_1 - \mu_2$:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176, 0.8616]$$
Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.
Notice: zero is included in the confidence interval, which agrees with the two-tailed test's failure to reject H0 at the 5% significance level.
13.4 Matched Pairs Experiment
• What is a matched pairs experiment?
• Why are matched pairs experiments needed?
• How do we deal with data produced in this way?
The matched pairs experiment
• Note that $\mu_D = \mu_1 - \mu_2$.
• Analyzing the within-pair differences has the benefit of smaller variability.
Group 1   Group 2   Difference
10        12        −2
15        11        +4
Mean1 = 12.5, Mean2 = 11.5: Mean1 − Mean2 = 1 = mean of the differences
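The small table illustrates why $\mu_D = \mu_1 - \mu_2$: the mean of the within-pair differences always equals the difference of the group means, so the matched pairs design changes the variability of the data being analyzed, not the parameter. A minimal Python check using the two pairs from the table:

```python
from statistics import mean

# The two pairs from the table above: (Group 1, Group 2)
group1 = [10, 15]
group2 = [12, 11]

# Within-pair differences D = Group 1 - Group 2
diffs = [a - b for a, b in zip(group1, group2)]

print("differences:", diffs)
print("Mean1 - Mean2:", mean(group1) - mean(group2))
print("mean of differences:", mean(diffs))
```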
• Example 13.4
  – It was suspected that salary offers were affected by students' GPA.
  – To reduce the variability in salary offers caused by differences in GPA, the following procedure was used:
    • 25 ranges of GPAs were predetermined.
    • Students from each major were randomly selected, one from each GPA range.
    • The highest salary offer for each student was recorded.
  – From the data presented, can we conclude that Finance majors are offered higher salaries than Marketing majors?
The matched pairs hypothesis test
• Solution (by hand)
  – The parameter tested is $\mu_D$ ($= \mu_1 - \mu_2$), with Finance as population 1 and Marketing as population 2.
  – The hypotheses:
H0: $\mu_D = 0$
H1: $\mu_D > 0$
  – The t statistic:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}, \qquad \text{degrees of freedom} = n_D - 1$$
  – The rejection region is $t > t_{.05,\,25-1} = 1.711$.
• Solution
  – Calculate t, using $\bar{x}_D = 5{,}065$ and $s_D = 6{,}647$:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}} = \frac{5065 - 0}{6647/\sqrt{25}} = 3.81$$
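The matched pairs statistic is a one-sample t-test applied to the differences. A minimal Python sketch from the summary statistics above (the function name is hypothetical; the critical value 1.711 is taken from the slide rather than computed):

```python
import math

def paired_t(xbar_d, s_d, n_d, mu_d0=0.0):
    """t statistic for the mean of paired differences; d.f. = n_d - 1."""
    t = (xbar_d - mu_d0) / (s_d / math.sqrt(n_d))
    return t, n_d - 1

t, df = paired_t(5065, 6647, 25)
t_crit = 1.711  # t_{.05, 24}, from a t table
print(f"t = {t:.2f} with {df} d.f.; reject H0 at 5%: {t > t_crit}")
```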
Conclusion: Because $3.81 > 1.711$, there is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.
The matched pairs mean difference estimation
Confidence interval estimator of $\mu_D$:
$$\bar{x}_D \pm t_{\alpha/2,\,n-1}\frac{s_D}{\sqrt{n}}$$
Example 13.5: The 95% confidence interval of the mean difference in Example 13.4 is
$$5065 \pm 2.064\frac{6647}{\sqrt{25}} = 5065 \pm 2{,}744$$
that is, from 2,321 to 7,809.
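The interval estimator can be sketched in Python as below. The function name is hypothetical, and $t_{.025,24} = 2.064$ is supplied from a t table, as on the slide.

```python
import math

def paired_ci(xbar_d, s_d, n_d, t_crit):
    """Confidence interval for mu_D; t_crit = t_{alpha/2, n_d - 1} (from a table)."""
    margin = t_crit * s_d / math.sqrt(n_d)
    return xbar_d - margin, xbar_d + margin

lo, hi = paired_ci(5065, 6647, 25, t_crit=2.064)
print(f"95% CI for mu_D: ({lo:.0f}, {hi:.0f})")
```

Zero lies outside the interval, which is consistent with rejecting H0 in Example 13.4.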
13.5 Inference about the ratio of two variances
• In this section we draw inferences about the ratio of two population variances.
• This question is interesting because it determines whether the equal-variances or the unequal-variances t-test should be applied.
Parameter and Statistic
• The parameter to be tested is $\sigma_1^2/\sigma_2^2$.
• The statistic used is
$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
• Sampling distribution of F: when both populations are normal, it follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
  – Our null hypothesis is always
H0: $\sigma_1^2/\sigma_2^2 = 1$
  – Under this null hypothesis the F statistic becomes
$$F = \frac{s_1^2}{s_2^2}$$
Testing the ratio of two population variances
Example 13.6 (revisiting Example 13.1): Calories intake at lunch (see Xm13-01)
In order to perform a test regarding the average consumption of calories at people's lunch in relation to the inclusion of high-fiber cereal in their breakfast, the variance ratio of the two samples has to be tested first. The data are those listed for Example 13.1.
The hypotheses are:
H0: $\sigma_1^2/\sigma_2^2 = 1$
H1: $\sigma_1^2/\sigma_2^2 \neq 1$
• Solving by hand
  – The rejection region is
$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}}$$
$$F_{\alpha/2,\nu_1,\nu_2} = F_{.025,42,106} \approx F_{.025,40,120} = 1.61$$
$$\frac{1}{F_{\alpha/2,\nu_2,\nu_1}} = \frac{1}{F_{.025,106,42}} \approx \frac{1}{F_{.025,120,40}} = \frac{1}{1.72} = .58$$
  – The F statistic value is $F = s_1^2/s_2^2 = 4103/10670 = .3845$.
  – Conclusion: Because $.3845 < .58$, we reject the null hypothesis and conclude that there is sufficient evidence at the 5% significance level that the population variances differ.
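The two-tailed decision rule can be sketched in Python. The function name is hypothetical, and the critical values are the F-table approximations used above ($F_{.025,40,120} = 1.61$ and $F_{.025,120,40} = 1.72$), passed in rather than computed. The lower critical value uses the identity $F_{1-\alpha/2,\nu_1,\nu_2} = 1/F_{\alpha/2,\nu_2,\nu_1}$.

```python
def variance_ratio_test(s1_sq, s2_sq, f_upper, f_upper_swapped):
    """Two-tailed F test of H0: sigma1^2 / sigma2^2 = 1.

    f_upper         = F_{alpha/2, nu1, nu2}  (from an F table)
    f_upper_swapped = F_{alpha/2, nu2, nu1}  (degrees of freedom reversed)
    """
    f = s1_sq / s2_sq
    lower = 1 / f_upper_swapped  # lower critical value via the reciprocal identity
    reject = f < lower or f > f_upper
    return f, lower, f_upper, reject

# Example 13.6, with the table approximations used on the slide
f, lower, upper, reject = variance_ratio_test(4103, 10670, 1.61, 1.72)
print(f"F = {f:.4f}; reject if F < {lower:.2f} or F > {upper:.2f}: {reject}")
```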
Example 13.6 (revisiting Example 13.1)
F-Test Two-Sample for Variances
                      Consumers   Nonconsumers
Mean                  604         633
Variance              4103        10670
Observations          43          107
df                    42          106
F                     0.3845
P(F<=f) one-tail      0.0004
F Critical one-tail   0.6371
13.6 Inference about the difference between two population proportions
• In this section we deal with two populations whose data are nominal.
• For nominal data we compare the population proportions of the occurrence of a certain event.
• Examples
  – Comparing the effectiveness of a new drug versus an older one
  – Comparing market share before and after an advertising campaign
  – Comparing defective rates between two machines
Parameter and Statistic
• Parameter: the parameter is the difference between the two population proportions, $p_1 - p_2$.
• Statistic: an unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$.
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Two random samples are drawn from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.
Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$
Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$
• The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, $n_2(1 - p_2)$ are all greater than or equal to 5.
• The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
• The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.
The z-statistic
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
Because $p_1$ and $p_2$ are unknown, the standard error must be estimated using the sample proportions. The method depends on the null hypothesis.
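Testing $p_1 - p_2$
• There are two cases to consider:
Case 1: H0: $p_1 - p_2 = 0$
Under this null hypothesis the two population proportions are equal, so both samples estimate the same proportion. Calculate the pooled proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Case 2: H0: $p_1 - p_2 = D$ ($D$ is not equal to 0)
Do not pool the data. Compute
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$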
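Both cases can be combined in one function that pools only when the hypothesized difference is zero. A minimal Python sketch (the function name is hypothetical), applied to the soap-packaging counts used in Examples 13.8 and 13.9:

```python
import math

def z_two_proportions(x1, n1, x2, n2, d0=0.0):
    """z statistic for H0: p1 - p2 = d0.

    Case 1 (d0 == 0): pool the samples, since H0 says p1 = p2.
    Case 2 (d0 != 0): use each sample proportion separately.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if d0 == 0:
        p_pool = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d0) / se

z1 = z_two_proportions(180, 904, 155, 1038)        # Example 13.8 (case 1)
z2 = z_two_proportions(180, 904, 155, 1038, 0.03)  # Example 13.9 (case 2)
print(f"case 1: z = {z1:.3f}; case 2: z = {z2:.3f}")
```

Because the code does not round the sample proportions to four decimals, the case 2 value comes out near 1.145 rather than the 1.15 from the hand calculation; neither value exceeds 1.645, so the conclusion is the same.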
Testing $p_1 - p_2$ (Case 1)
• Example 13.8
  – The marketing manager needs to decide which of two new packaging designs to adopt to help improve sales of his company's soap.
  – A study is performed in two supermarkets:
    • Brightly-colored packaging is distributed in supermarket 1.
    • Simple packaging is distributed in supermarket 2.
  – The first design is more expensive; therefore, to be financially viable it has to outsell the second design.
• Summary of the experiment results
  – Supermarket 1: 180 purchasers of Johnson Brothers soap out of a total of 904
  – Supermarket 2: 155 purchasers of Johnson Brothers soap out of a total of 1,038
  – Use a 5% significance level and perform a test to find which type of packaging to use.
• Solution
  – The problem objective is to compare the populations of sales of the two packaging designs.
  – The data are nominal (Johnson Brothers or other soap).
  – The hypotheses are
H0: $p_1 - p_2 = 0$
H1: $p_1 - p_2 > 0$
  – We identify this application as case 1.
Population 1: purchases at supermarket 1
Population 2: purchases at supermarket 2
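• Compute: Manually
  – For a 5% significance level the rejection region is $z > z_\alpha = z_{.05} = 1.645$.
  – The sample proportions are $\hat{p}_1 = 180/904 = .1991$ and $\hat{p}_2 = 155/1{,}038 = .1493$.
  – The pooled proportion is
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{180 + 155}{904 + 1{,}038} = .1725$$
  – The z statistic becomes
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.1991 - .1493) - 0}{\sqrt{.1725(1 - .1725)\left(\dfrac{1}{904} + \dfrac{1}{1{,}038}\right)}} = 2.90$$
  – Because $2.90 > 1.645$, we reject H0: there is sufficient evidence at the 5% significance level to infer that the brightly-colored design outsells the simple design.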
Testing $p_1 - p_2$ (Case 2)
• Example 13.9 (revisiting Example 13.8)
  – Management needs to decide which of two new packaging designs to adopt to help improve sales of a certain soap.
  – A study is performed in two supermarkets.
  – For the brightly-colored design to be financially viable, it has to outsell the simple design by at least 3%.
• Summary of the experiment results
  – Supermarket 1: 180 purchasers of Johnson Brothers' soap out of a total of 904
  – Supermarket 2: 155 purchasers of Johnson Brothers' soap out of a total of 1,038
  – Use a 5% significance level and perform a test to find which type of packaging to use.
• Solution
  – The hypotheses to test are
H0: $p_1 - p_2 = .03$
H1: $p_1 - p_2 > .03$
  – We identify this application as case 2 (the hypothesized difference is not equal to zero).
• Compute: Manually
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\dfrac{180}{904} - \dfrac{155}{1{,}038}\right) - .03}{\sqrt{\dfrac{.1991(1 - .1991)}{904} + \dfrac{.1493(1 - .1493)}{1{,}038}}} = 1.15$$
The rejection region is $z > z_\alpha = z_{.05} = 1.645$.
Conclusion: Since $1.15 < 1.645$, do not reject the null hypothesis. There is insufficient evidence to infer that the brightly-colored design will outsell the simple design by 3% or more.
Find the 95% C.I. for $p_1 - p_2$
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
where, for 95% confidence, $z_{\alpha/2} = z_{.025} = 1.96$.
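The interval estimator can be sketched in Python with the unpooled standard error. The function name is hypothetical, and applying it to the Example 13.8 counts is our own illustration; the slides do not compute this interval.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error.
    z_crit = z_{alpha/2}; 1.96 gives a 95% interval."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Illustration with the soap-packaging counts of Example 13.8
lo, hi = diff_proportions_ci(180, 904, 155, 1038)
print(f"95% CI for p1 - p2: ({lo:.4f}, {hi:.4f})")
```

The unpooled standard error is used here because an interval estimate does not assume $p_1 = p_2$, which is the same reasoning that rules out pooling in case 2.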