Date post: | 22-May-2018 |
Upload: | doankhuong |
Chapter 10: Estimation and Hypothesis Testing Two Populations
10.1 Inferences about the difference between two population means for independent samples: σ1 and σ2 known
10.2 Inferences about the difference between two population means for independent samples: σ1 and σ2 unknown but equal
10.3 Inferences about the difference between two population means for independent samples: σ1 and σ2 unknown and unequal
10.4 Inferences about the difference between two population means for paired samples
10.5 Inferences about the difference between two population proportions for large and independent samples
Dr. Yingfu (Frank) Li, STAT 3308
Introduction
Statistical inference – estimation & hypothesis test
Inference about the difference between two population means
  Independent samples
    Known variances (10.1)
    Unknown variances
      Equal variances (10.2)
      Unequal variances (10.3)
  Paired samples (10.4)
Inference about the difference between two population proportions (10.5)
What do we need first for statistical inference? A statistic with a known distribution, exactly the same idea as in the one-sample case
10.1 Inference about μ1 – μ2 for independent samples: σ1 & σ2 known
Independent versus Dependent Samples
Mean, Standard Deviation, and Sampling Distribution of X̄1 – X̄2
Interval Estimation of μ1 – μ2
Hypothesis Testing About μ1 – μ2
Independent versus Dependent Samples
Two samples drawn from two populations are independent if the selection of one sample from one population does not affect the selection of the second sample from the second population. Otherwise, the samples are dependent.
Examples 10-1 & 10-2
Suppose we want to estimate the difference between the mean salaries of all male and all female executives. To do so, we draw two samples, one from the population of male executives and another from the population of female executives. These two samples are independent because they are drawn from two different populations, and the samples have no effect on each other.
Suppose we want to estimate the difference between the mean weights of all participants before and after a weight loss program. To accomplish this, suppose we take a sample of 40 participants and measure their weights before and after the completion of this program. Note that these two samples include the same 40 participants. This is an example of two dependent samples. Such samples are also called paired or matched samples.
Mean, Standard Deviation, and Sampling Distribution of X̄1 – X̄2
Three conditions
1. The two samples are independent
2. The standard deviations σ1 and σ2 of the two populations are known
3. At least one of the following conditions is fulfilled:
   Both samples are large (i.e., n1 ≥ 30 and n2 ≥ 30)
   Or both populations from which the samples are drawn are normally distributed
Then the sampling distribution of X̄1 – X̄2 is (approximately) normal
First sample: \(\bar{X}_1\) has mean \(\mu_1\) and standard deviation \(\sigma_1/\sqrt{n_1}\)
Second sample: \(\bar{X}_2\) has mean \(\mu_2\) and standard deviation \(\sigma_2/\sqrt{n_2}\)
Difference of independent sample means
Mean
\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \]
Standard deviation
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
Sampling Distribution of X̄1 – X̄2
The distribution of X̄1 – X̄2 is (approximately) normal, centered at \(\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2\), if
1. Both independent samples are from normal distributions, or
2. Both independent samples are large samples
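Why the standard deviation of the difference has a sum under the square root: a short derivation, assuming only that the two samples are independent and that each sample mean has variance σ²/n. The minus sign in the difference disappears because variances of independent quantities add.

```latex
\begin{aligned}
E(\bar{X}_1 - \bar{X}_2) &= E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 \\
\operatorname{Var}(\bar{X}_1 - \bar{X}_2) &= \operatorname{Var}(\bar{X}_1) + (-1)^2 \operatorname{Var}(\bar{X}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \\
\sigma_{\bar{x}_1 - \bar{x}_2} &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\end{aligned}
```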
Confidence Interval for μ1 – μ2
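Recall that in the one-sample case, the (1 – α)100% confidence interval for μ is
\[ \bar{x} \pm z_{\alpha/2}\, \sigma_{\bar{x}} \]
where \(\bar{x}\) is the point estimate and \(\sigma_{\bar{x}}\) is the standard error
Same idea to get the confidence interval for μ1 – μ2
1. Point estimate of μ1 – μ2: \(\bar{x}_1 - \bar{x}_2\)
2. Critical value \(z_{\alpha/2}\) such that \(P(Z \le z_{\alpha/2}) = 1 - \alpha/2\)
3. Margin of error \(E = z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2}\)
4. Confidence interval: \((\bar{x}_1 - \bar{x}_2) \pm E\)
In the quiz or test, I will check if you follow these four steps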
Example 10-3
According to Kaiser Family Foundation surveys in 2014 and 2012, the average premium for employer-sponsored health insurance for family coverage was $16,834 in 2014 and $15,745 in 2012 (www.kff.org). Suppose that these averages were based on random samples of 250 and 200 employees who had such employer-sponsored health insurance plans for 2014 and 2012, respectively. Further assume that the population standard deviations for 2014 and 2012 were $2160 and $1990, respectively.
Let μ1 and μ2 be the population means of such annual premiums for 2014 and 2012, respectively. What is the point estimate of μ1 – μ2?
Construct a 97% confidence interval for μ1 – μ2.
Example 10-3: Solution
Point estimate of μ1 – μ2 = \(\bar{x}_1 - \bar{x}_2\) = $16,834 – $15,745 = $1089
97% confidence interval for μ1 – μ2
1. Point estimate: $1089
2. Critical value z such that P(Z < z) = 0.985, so z = 2.17
3. Margin of error E = z × \(\sigma_{\bar{x}_1 - \bar{x}_2}\), where
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(2160)^2}{250} + \frac{(1990)^2}{200}} = \$196.1196064 \]
   So E = 2.17 × 196.1196064 = 425.58
4. Thus, the 97% confidence interval for μ1 – μ2 is 1089 ± 425.58
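The four steps of Example 10-3 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_interval_two_means` is made up for illustration and is not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence):
    """Pieces of the z confidence interval for mu1 - mu2 (sigma1, sigma2 known)."""
    # Step 1: point estimate of mu1 - mu2.
    point_estimate = xbar1 - xbar2
    # Standard deviation of xbar1 - xbar2 for independent samples.
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Step 2: critical value z with P(Z < z) = 1 - alpha/2.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: margin of error.
    margin = z * std_error
    return point_estimate, std_error, z, margin


# Numbers from Example 10-3 (2014 vs. 2012 premiums).
point, se, z, margin = z_interval_two_means(16834, 15745, 2160, 1990, 250, 200, 0.97)
print(f"point estimate = {point}, standard error = {se:.4f}")
print(f"z = {z:.2f}, E = {margin:.2f}")
# Step 4: the interval itself.
print(f"97% CI: ({point - margin:.2f}, {point + margin:.2f})")
```

The margin of error printed here can differ from 425.58 in the second decimal place because the code keeps the unrounded critical value instead of the table value 2.17.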
Hypothesis Testing About μ1 – μ2
Testing an alternative hypothesis that the means of two populations are different is equivalent to μ1 ≠ μ2, which is the same as μ1 - μ2 ≠ 0.
Testing an alternative hypothesis that the mean of the first population is greater than the mean of the second population is equivalent to μ1 > μ2, which is the same as μ1 - μ2 > 0.
Testing an alternative hypothesis that the mean of the first population is less than the mean of the second population is equivalent to μ1 < μ2, which is the same as μ1 - μ2 < 0.
What type of tests? Left-tailed, right-tailed or two-tailed
Treat μ1 - μ2 as one parameter and use the same idea in chapter 9
Hypothesis Testing about μ1 – μ2
Recall the hypothesis test for the one-sample case
  Set up hypotheses H0 & H1
  Identify the type of test: left-, right-, or two-tailed
  Pick a distribution & find the test statistic \(z^* = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}\)
  Find the p-value, or the critical value z for the rejection region
  Draw a conclusion
H.T. about μ1 – μ2 with H1: μ1 – μ2 >, <, or ≠ 0 (right-, left-, or two-tailed)
  Set up hypotheses H0 & H1
  Identify the type of test: left-, right-, or two-tailed
  Pick a distribution & find the test statistic \(z^* = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}\)
  Find the p-value, or the critical value z for the rejection region
  Draw a conclusion
Example 10-4
Refer to Example 10-3 about the average annual premiums for employer-sponsored health insurance for family coverage in 2014 and 2012. Test at the 1% significance level whether the population means for the two years are different.
Solution
Hypotheses: H0: μ1 – μ2 = 0 vs. H1: μ1 – μ2 ≠ 0. Two-tailed test
Distribution Z & test statistic
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(2160)^2}{250} + \frac{(1990)^2}{200}} = \$196.1196064 \]
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(16{,}834 - 15{,}745) - 0}{196.1196064} = 5.55 \]
P-value = 2·P(Z > 5.55) ≈ 0, or critical values = ±2.58
Since p-value ≈ 0 < 0.01 = α (equivalently, the test statistic 5.55 is in the rejection region), we reject H0, which indicates…
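The test in Example 10-4 can be reproduced the same way. The sketch below is one possible way to code it with the standard library; `z_test_two_means` is a hypothetical helper name, and the p-value is computed as two tail areas because the test is two-tailed.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, null_diff=0.0):
    """z statistic and two-tailed p-value for H0: mu1 - mu2 = null_diff."""
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((xbar1 - xbar2) - null_diff) / std_error
    # Two-tailed p-value: twice the area beyond |z| under the standard normal curve.
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


alpha = 0.01
z, p_value = z_test_two_means(16834, 15745, 2160, 1990, 250, 200)
# Critical value for a two-tailed test at level alpha.
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.2e}, critical values = ±{critical:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Both routes agree here: the p-value is far below α, and |z| is far beyond the critical value.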
10.2 Inference about μ1 – μ2 for independent samples: σ1 & σ2 unknown but equal
Interval Estimation of μ1 – μ2
Hypothesis Testing About μ1 – μ2
Pooled standard deviation for two samples is computed as
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
where n1 and n2 are the sizes of the two samples and \(s_1^2\) and \(s_2^2\) are the variances of the two samples, respectively. Here \(s_p\) is an estimator of σ = σ1 = σ2.
Estimator of the standard deviation of \(\bar{x}_1 - \bar{x}_2\) is
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
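Inference about μ1 – μ2 for independent samples: σ1 & σ2 unknown but equal
If σ1 & σ2 are unknown, but σ1 = σ2 = σ, then we can prove under conditions that
\[ \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} \sim t_{n_1 + n_2 - 2} \]
where
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Exactly the same idea as in the one-sample case to obtain
  Confidence interval (point estimate ± t_critical × std error): \((\bar{x}_1 - \bar{x}_2) \pm t\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}\)
  Hypothesis testing: \(t^* = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}\)
When the sample sizes are large, \(T_{n_1 + n_2 - 2} \approx Z\), the standard normal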
Example 10-5
A consumer agency wanted to estimate the difference in the mean amounts of caffeine in two brands of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee that showed the mean amount of caffeine in these jars to be 80 milligrams per jar with a standard deviation of 5 milligrams. Another sample of 12 one-pound jars of Brand II coffee gave a mean amount of caffeine equal to 77 milligrams per jar with a standard deviation of 6 milligrams.
Construct a 95% confidence interval for the difference between the mean amounts of caffeine in one-pound jars of these two brands of coffee. Assume that the two populations are normally distributed and that the standard deviations of the two populations are equal.
Example 10-5: Solution
Estimated pooled standard deviation
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1)(5)^2 + (12 - 1)(6)^2}{15 + 12 - 2}} = 5.46260011 \]
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (5.46260011)\sqrt{\frac{1}{15} + \frac{1}{12}} = 2.11565593 \]
95% confidence interval for μ1 – μ2
Point estimate of μ1 – μ2 is 80 – 77 = 3
Critical value from T15+12-2 = T25 is 2.060 (right-tail area = 0.025)
Margin of error E = \(t\, s_{\bar{x}_1 - \bar{x}_2}\) = 2.060 × 2.1157 = 4.36
95% confidence interval for μ1 – μ2 is 3 ± 4.36
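The arithmetic of Example 10-5 can be sketched in Python as follows. The helper names are hypothetical. Because the Python standard library has no t distribution, the critical value 2.060 is passed in as read from the t table rather than computed.

```python
from math import sqrt


def pooled_std(s1, s2, n1, n2):
    """Pooled standard deviation s_p, assuming sigma1 = sigma2."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_critical):
    """Confidence interval for mu1 - mu2 using the pooled standard deviation."""
    sp = pooled_std(s1, s2, n1, n2)
    # Estimated standard deviation of xbar1 - xbar2.
    std_error = sp * sqrt(1 / n1 + 1 / n2)
    margin = t_critical * std_error
    point = xbar1 - xbar2
    return sp, std_error, point - margin, point + margin


# Example 10-5: caffeine in Brand I vs. Brand II coffee; t_critical from the T25 table.
sp, se, low, high = pooled_t_interval(80, 77, 5, 6, 15, 12, t_critical=2.060)
print(f"s_p = {sp:.8f}, standard error = {se:.8f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The interval contains 0, so at the 95% level the data do not rule out equal mean caffeine amounts for the two brands.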
Example 10-6
A sample of 14 cans of Brand I diet soda gave the mean number of calories of 23 per can with a standard deviation of 3 calories. Another sample of 16 cans of Brand II diet soda gave the mean number of calories of 25 per can with a standard deviation of 4 calories.
At the 1% significance level, can you conclude that the mean number of calories per can is different for these two brands of diet soda? Assume that the calories per can of diet soda are normally distributed for each of the two brands and that the standard deviations for the two populations are equal.
Example 10-6: Solution
Hypotheses: H0: μ1 – μ2 = 0 vs. H1: μ1 – μ2 ≠ 0, two-tailed test
Calculate the test statistic
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(14 - 1)(3)^2 + (16 - 1)(4)^2}{14 + 16 - 2}} = 3.57071421 \]
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (3.57071421)\sqrt{\frac{1}{14} + \frac{1}{16}} = 1.30674760 \]
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(23 - 25) - 0}{1.30674760} = -1.531 \]
P-value = 2·P(T28 < –1.531) = 2 × 0.0685 ≈ 0.137. Critical value for rejection region: |t| > 2.763
Since p-value ≈ 0.137 > α = 0.01, we fail to reject H0.
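For Example 10-6, the test statistic and the critical-value decision can be sketched like this. The function name is made up for illustration, and the critical value 2.763 is the T28 table value quoted in the solution, not something the code computes.

```python
from math import sqrt


def pooled_t_statistic(xbar1, xbar2, s1, s2, n1, n2, null_diff=0.0):
    """t statistic and degrees of freedom for H0: mu1 - mu2 = null_diff (equal sigmas)."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    std_error = sp * sqrt(1 / n1 + 1 / n2)
    t = ((xbar1 - xbar2) - null_diff) / std_error
    return t, df


# Example 10-6: calories in Brand I vs. Brand II diet soda.
t, df = pooled_t_statistic(23, 25, 3, 4, 14, 16)
t_critical = 2.763  # T28 table value for a two-tailed test at alpha = 0.01
# Two-tailed rejection region: |t| > t_critical.
reject_h0 = abs(t) > t_critical

print(f"t = {t:.3f} with df = {df}")
print("reject H0" if reject_h0 else "fail to reject H0")
```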
Example 10-7
A sample of 40 children from New York State showed that the mean time they spend watching television is 28.50 hours per week with a standard deviation of 4 hours. Another sample of 35 children from California showed that the mean time spent by them watching television is 23.25 hours per week with a standard deviation of 5 hours.
Using a 2.5% significance level, can you conclude that the mean time spent watching television by children in New York State is greater than that for children in California? Assume that the standard deviations for the two populations are equal.
Example 10-7: Solution
H0: μ1 – μ2 = 0 vs. H1: μ1 – μ2 > 0, right-tailed test
Test statistic
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(40 - 1)(4)^2 + (35 - 1)(5)^2}{40 + 35 - 2}} = 4.49352655 \]
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (4.49352655)\sqrt{\frac{1}{40} + \frac{1}{35}} = 1.04004930 \]
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(28.50 - 23.25) - 0}{1.04004930} = 5.048 \]
P-value ≈ P(Z > 5.048) ≈ 0 (2.2 × 10^-7). If the t distribution is used, then p-value ≈ P(T73 > 5.048) = 1.58 × 10^-6
Critical value from Z is 1.96 and from T73 is 1.993
Conclusion?
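10.3 Inference about μ1 – μ2 for independent samples: σ1 & σ2 unknown and unequal
If σ1 & σ2 are unknown, but σ1 ≠ σ2, then we can prove under conditions that
\[ \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} \sim t_{df} \quad \text{(approximately)} \]
where
\[ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
and
\[ df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \]
After obtaining the distribution, we apply exactly the same idea to get the C.I. and do the H.T. for μ1 – μ2
C.I. for μ1 – μ2: point estimate ± t_critical × std_error
Hypothesis testing for μ1 – μ2: the observed test statistic \(t^* = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}\)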
Examples 10-8 & 10-9
Example 10-8: According to Example 10-5 of Section 10.2.1, a sample of 15 one-pound jars of coffee of Brand I showed that the mean amount of caffeine in these jars is 80 milligrams per jar with a standard deviation of 5 milligrams. Another sample of 12 one-pound coffee jars of Brand II gave a mean amount of caffeine equal to 77 milligrams per jar with a standard deviation of 6 milligrams. Construct a 95% confidence interval for the difference between the mean amounts of caffeine in one-pound coffee jars of these two brands. Assume that the two populations are normally distributed and that the standard deviations of the two populations are not equal.
Example 10-9: According to Example 10-6 of Section 10.2.2, a sample of 14 cans of Brand I diet soda gave the mean number of calories per can of 23 with a standard deviation of 3 calories. Another sample of 16 cans of Brand II diet soda gave the mean number of calories of 25 per can with a standard deviation of 4 calories. Test at the 1% significance level whether the mean numbers of calories per can of diet soda are different for these two brands. Assume that the calories per can of diet soda are normally distributed for each of these two brands and that the standard deviations for the two populations are not equal.
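As a starting point for Examples 10-8 and 10-9, the standard error and degrees of freedom for the unequal-variance case can be computed as below. This is a sketch under two assumptions that should be checked against the course formula sheet: the degrees of freedom follow the usual Welch-Satterthwaite approximation, and the result is rounded down to a whole number. The function name is hypothetical. The critical t value still has to be read from a table or calculator.

```python
from math import floor, sqrt


def unequal_var_se_and_df(s1, s2, n1, n2):
    """Standard error of xbar1 - xbar2 and approximate df when sigma1 != sigma2."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    std_error = sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed here), rounded down.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return std_error, floor(df)


# Example 10-8: coffee data (s1 = 5, n1 = 15; s2 = 6, n2 = 12).
se_coffee, df_coffee = unequal_var_se_and_df(5, 6, 15, 12)
# Example 10-9: diet soda data (s1 = 3, n1 = 14; s2 = 4, n2 = 16).
se_soda, df_soda = unequal_var_se_and_df(3, 4, 14, 16)
# Observed test statistic for Example 10-9 under H0: mu1 - mu2 = 0.
t_soda = ((23 - 25) - 0) / se_soda

print(f"Example 10-8: standard error = {se_coffee:.4f}, df = {df_coffee}")
print(f"Example 10-9: standard error = {se_soda:.4f}, df = {df_soda}, t = {t_soda:.3f}")
```

Compared with the pooled approach of Section 10.2, the degrees of freedom drop from 25 to 21 for the coffee data and from 28 to 27 for the soda data, so the critical values are slightly larger.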
10.4 Inference about μ1 – μ2 for Paired Samples
Paired or matched samples
  Every pair of values from the two samples comes from the same subject
  Attention goes to the difference
Examples
  Weight loss for members of a health club
  Drug for reducing cholesterol of a group of patients
Essential inference is on the differences di = Xi – Yi, i = 1, 2, …, n
  notation: d, \(\bar{d}\), \(\mu_d\), \(s_d\), n
Reduce the problem to the one-sample case
Distribution: \(\dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t_{n-1}\) for n ≥ 30 or a normal population of differences
Example 10-10
A researcher wanted to find the effect of a special diet on systolic blood pressure. She selected a sample of seven adults and put them on this dietary plan for 3 months. The following table gives the systolic blood pressures (in mm Hg) of these seven adults before and after the completion of this plan.
Let μd be the mean reduction in the systolic blood pressure due to this special dietary plan for the population of all adults. Construct a 95% confidence interval for μd. Assume that the population of paired differences is (approximately) normally distributed.
Example 10-10: Solution
1. \(\bar{d}\) = 5
2. t = 2.447
3. E = \(t\, s_d/\sqrt{n}\) = 9.98
4. C.I. for μd: 5 ± 9.98
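Because a paired problem reduces to the one-sample case, the interval in Example 10-10 needs only the summary statistics of the differences. The sketch below uses d̄ = 5 and s_d = 10.7858, the values these slides report for the blood pressure data, with n = 7 and the table value t = 2.447. The function name is hypothetical, and the raw before/after table is not used.

```python
from math import sqrt


def paired_t_interval(d_bar, s_d, n, t_critical):
    """Confidence interval for mu_d from the summary statistics of the paired differences."""
    # Standard error of the mean difference, as in the one-sample case.
    std_error = s_d / sqrt(n)
    margin = t_critical * std_error
    return std_error, margin, d_bar - margin, d_bar + margin


# Example 10-10: seven adults, differences in systolic blood pressure.
se, margin, low, high = paired_t_interval(d_bar=5, s_d=10.7858, n=7, t_critical=2.447)
# The same standard error gives the test statistic for H0: mu_d = 0.
t_stat = (5 - 0) / se

print(f"standard error = {se:.4f}, E = {margin:.2f}")
print(f"95% CI for mu_d: ({low:.2f}, {high:.2f})")
print(f"t statistic for H0: mu_d = 0 is {t_stat:.3f}")
```

The last line reuses the same standard error to give the test statistic for the two-tailed test of μd = 0 on this data.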
Example 10-11
A company wanted to know if attending a course on “how to be a successful salesperson” can increase the average sales of its employees. The company sent six of its salespersons to attend this course. The following table gives the 1-week sales of these salespersons before and after they attended this course.
Using the 1% significance level, can you conclude that the mean weekly sales for all salespersons increase as a result of attending this course? Assume that the population of paired differences has a normal distribution.
Example 10-11: Solution
1. H0: μd ≥ 0, H1: μd < 0. Left-tailed test.
2. t = –3.87
3. p-value = P(T5 ≤ –3.87) = 0.006, or the critical value from T5 is –3.365
4. p-value < α = 0.01, reject H0
Example 10-12
Refer to Example 10–10. The table that gives the blood pressures of seven adults before and after the completion of a special dietary plan is reproduced here. Let μd be the mean of the differences between the systolic blood pressures before and after completing this special dietary plan for the population of all adults. Using the 5% significance level, can we conclude that the mean of the paired differences μd is different from zero? Assume that the population of paired differences is (approximately) normally distributed.
\(\bar{d}\) = 5, \(s_d\) = 10.7858
1. H0: μd = 0, H1: μd ≠ 0. Two-tailed test.
2. t = 1.226
3. p-value = 2P(T6 > 1.226) = 0.266
4. p-value > α = 0.05, fail to reject H0
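10.5 Inference about p1 – p2 for large and independent samples
Recall inference about a population proportion p
  Use the distribution of \(\hat{p}\): approximately normal with mean p and standard deviation \(\sqrt{pq/n}\) for a large sample
  C.I. & H.T. for p
Now inference about p1 – p2 for two independent large samples
  Find the distribution first: \(\hat{p}_1 - \hat{p}_2\) is approximately normal with mean \(p_1 - p_2\) and standard deviation \(\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}\)
Exactly the same idea for inference
  Confidence interval: \((\hat{p}_1 - \hat{p}_2) \pm z \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}\)
  Hypothesis testing
  Testing p1 – p2 = 0:
\[ z^* = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \;\&\; \bar{q} = 1 - \bar{p} \]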
Example 10 – 13
A researcher wanted to estimate the difference between the percentages of users of two toothpastes who will never switch to another toothpaste. In a sample of 500 users of Toothpaste A taken by this researcher, 100 said that they will never switch to another toothpaste. In another sample of 400 users of Toothpaste B taken by the same researcher, 68 said that they will never switch to another toothpaste. Construct a 97% confidence interval for the difference between the proportions of all users of the two toothpastes who will never switch.
Summary statistics: n1 = 500 and x1 = 100; n2 = 400 and x2 = 68
1. Point estimate of p1 – p2: 100/500 – 68/400 = 0.20 – 0.17 = 0.03
2. 97% leads to z = 2.17
3. Margin of error: E = 2.17 × 0.02593742 = 0.056, where 0.02593742 is the estimated standard deviation of \(\hat{p}_1 - \hat{p}_2\)
4. C.I. for p1 – p2: 0.03 ± 0.056
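The interval in Example 10-13 can be reproduced with the standard library. A minimal sketch; the function name `two_proportion_interval` is an illustrative choice, and the standard error uses the unpooled sample proportions, as is usual for a confidence interval.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence):
    """Point estimate, standard error and margin of error for p1 - p2 (large samples)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error: each sample proportion keeps its own variance term.
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p1_hat - p2_hat, std_error, z * std_error


# Example 10-13: users of Toothpaste A and B who will never switch.
diff, se, margin = two_proportion_interval(100, 500, 68, 400, 0.97)
print(f"point estimate = {diff:.2f}, standard error = {se:.8f}")
print(f"97% CI: {diff:.2f} ± {margin:.3f}")
```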
Example 10 – 14
Reconsider Example 10–13 about the percentages of users of two toothpastes who will never switch to another toothpaste. At the 1% significance level, can we conclude that the proportion of users of Toothpaste A who will never switch to another toothpaste is higher than the proportion of users of Toothpaste B who will never switch to another toothpaste?
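Summary statistics: n1 = 500 & x1 = 100; n2 = 400 & x2 = 68
1. H0: p1 – p2 ≤ 0, H1: p1 – p2 > 0. Right-tailed test
2. Under H0, p1 = p2, but what is the common value? Estimate it with the pooled proportion \(\bar{p} = (x_1 + x_2)/(n_1 + n_2) = 168/900\). So the observed test statistic is
\[ z^* = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 1.15 \]
3. P-value = P(Z > 1.15) = 0.1251
4. Since p-value > α = 0.01, fail to reject H0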
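The pooled test in Example 10-14 can be sketched as follows. The function name is hypothetical. Unlike a confidence interval, the standard error here uses the pooled proportion, because under H0 the two population proportions are the same.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    std_error = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat - 0) / std_error, p_bar


# Example 10-14: is the proportion for Toothpaste A higher than for Toothpaste B?
z, p_bar = pooled_two_proportion_z(100, 500, 68, 400)
# Right-tailed test: p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
alpha = 0.01

print(f"pooled proportion = {p_bar:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The printed p-value can differ slightly from 0.1251 because the code does not round z to 1.15 before looking up the tail area.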
Example 10-15
The International Organization of Motor Vehicle Manufacturers (OICA), a group that defends the interests of the vehicle manufacturers, released the findings of a study it conducted during 2015 that included responses to a question that asked people if they can live without a car (USA TODAY, September 17, 2015). Suppose recently a sample of 1000 adults from California and another sample of 1200 adults from New York State were selected and these adults were asked if they can live without a vehicle. Sixty-four percent of these adults from California and 61% from New York State said that they cannot live without a vehicle. Test if the proportion of adults from California is different from the proportion of adults from New York State who will say that they cannot live without a vehicle. Use a 1% significance level.
Solution
\[ \bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = 0.624, \qquad s_{\hat{p}_1 - \hat{p}_2} = 0.02073991 \]
1. H0: p1 – p2 = 0 vs. H1: p1 – p2 ≠ 0
2. z* = (0.64 – 0.61)/0.02073991 = 1.45
3. P-value = 2·P(Z > 1.45) = 0.1470, or rejection region |z| > 2.58
4. Conclusion
Technology Instruction
TI-83 & 84
Excel
Minitab?
Summary
Discussed confidence intervals and hypothesis testing for the parameters of two populations in different cases
Key ideas are the same as in Chapter 9, i.e., obtaining the distribution of a statistic, which is often used to estimate the population parameter(s)
Computation seems complicated, but it is doable. In the tests or quizzes, I will adjust the numbers and make sure the calculation is minimal