Date post: | 29-Dec-2015 |
Category: |
Documents |
Upload: | carmel-ellen-mason |
Introduction to Probability and Statistics
Thirteenth Edition
Chapter 8
Large and Small-Sample Estimation
Large sample estimation
[Figure: a population of size N, from which $\binom{N}{n}$ different samples (Sample number 1, Sample number 2, Sample number 3, ...) can be drawn. Each sample has size n.]
Populations are described by their probability distributions and parameters. For quantitative populations, the location and shape are described by $\mu$ and $\sigma$.
For a binomial population, the location and shape are determined by $p$.
If the values of parameters are unknown, we make inferences about them using sample information.
Types of Inference
• Estimation:
– Estimating or predicting the value of the parameter
– “What is (are) the most likely value(s) of $\mu$ or $p$?”
• Hypothesis Testing:
– Deciding about the value of a parameter based on some preconceived idea.
– “Did the sample come from a population with $\mu = 5$ or $p = 0.2$?”
• Examples:
– A consumer wants to estimate the average price of similar homes in her city before putting her home on the market.
Estimation: Estimate $\mu$, the average home price.
– A manufacturer wants to know if a new type of steel is more resistant to high temperatures than an old type was.
Hypothesis test: Is the new average resistance equal to the old average resistance?
Types of Inference
• Whether you are estimating parameters or testing hypotheses, statistical methods are important because they provide:
– Methods for making the inference
– A numerical measure of the goodness or reliability of the inference
Parameters to be estimated:
• An unknown population mean $\mu$
• An unknown population proportion $p$
An estimator is a rule, usually a formula, that tells you how to calculate the estimate based on the sample.
• Point estimation: A single number is calculated to estimate the parameter.
• Interval estimation: Two numbers are calculated to create an interval within which the parameter is expected to lie.
A. Point Estimators
Since an estimator is calculated from sample values, it varies from sample to sample according to its sampling distribution.
An estimator is unbiased if the mean of its sampling distribution equals the parameter of interest. It does not systematically overestimate or underestimate the target parameter.
Properties
Of all the unbiased estimators, we prefer the estimator whose sampling distribution has the smallest spread or variability.
Measuring the Goodness of an Estimator
o The distance between an estimate and the true value of the parameter is the error of estimation. Think of it as the distance between the bullet and the bull’s-eye.
o When the sample sizes are large, our unbiased estimators will have approximately normal distributions, because of the Central Limit Theorem.
The Margin of Error
Margin of error: The maximum error of estimation, that is, the maximum likely difference between the estimator and the true value of the parameter, calculated as:
$$\text{Margin of error} = z_{\alpha/2} \times (\text{std error of the estimator})$$
Common values of $z_{\alpha/2}$: 1.645, 1.96, 2.33, 2.575
Definition: Margin of Error
For a population mean, the margin of error, denoted by E, is the maximum likely difference observed between sample mean $\bar{x}$ and true population mean $\mu$. It is also called the maximum error of the estimate.
$$\bar{x} - E < \mu < \bar{x} + E$$
Here $\bar{x} - E$ is the lower limit and $\bar{x} + E$ is the upper limit.
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Estimating Means and Proportions
•For a quantitative population,
Point estimator of population mean $\mu$: $\bar{x}$
Margin of error ($n \ge 30$): $\pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$
•For a binomial population,
Point estimator of population proportion $p$: $\hat{p} = x/n$
Margin of error ($n \ge 30$): $\pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
In both cases $z_{\alpha/2}$ is one of 1.645, 1.96, 2.33, or 2.575, and the quantity it multiplies is the standard error (SE) of the estimator.
Example
• A homeowner randomly samples 64 homes similar to her own and finds that the average selling price is $252,000 with a standard deviation of $15,000. Estimate the average selling price for all similar homes in the city.
Point estimator of $\mu$: $\bar{x} = 252{,}000$
Margin of error: $\pm 1.96 \dfrac{s}{\sqrt{n}} = \pm 1.96 \dfrac{15{,}000}{\sqrt{64}} = \pm 3675$
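The arithmetic in the home-price example can be checked with a few lines of Python. This is only a sketch: the function name `margin_of_error_mean` is made up for illustration, and the default z = 1.96 is the 95% value used on the slide.

```python
import math

def margin_of_error_mean(s, n, z=1.96):
    """Large-sample margin of error for a mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

# Home-price example: n = 64 homes, x-bar = 252,000, s = 15,000
x_bar = 252_000
moe = margin_of_error_mean(s=15_000, n=64)
print(f"Point estimate: {x_bar}, margin of error: +/- {moe:.0f}")
```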
Example
• A quality control technician wants to estimate the proportion of soda cans that are underfilled. He randomly samples 200 cans of soda and finds 10 underfilled cans.
$n = 200$, $p$ = proportion of underfilled cans
Point estimator of $p$: $\hat{p} = x/n = 10/200 = .05$
Margin of error: $\pm 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \pm 1.96 \sqrt{\dfrac{(.05)(.95)}{200}} = \pm .03$
Example
A random sample of n = 500 observations from a binomial population produced x = 450 successes. Estimate the binomial proportion p and calculate the 90% margin of error
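One possible way to work this exercise, sketched in Python. The helper name `margin_of_error_proportion` is hypothetical, and z = 1.645 is taken as the 90% value from the z table used in these slides.

```python
import math

def margin_of_error_proportion(x, n, z):
    """Return (p_hat, margin of error) for a binomial proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    return p_hat, z * math.sqrt(p_hat * q_hat / n)

# n = 500 observations, x = 450 successes, 90% margin of error
p_hat, moe = margin_of_error_proportion(450, 500, z=1.645)
print(f"p-hat = {p_hat:.2f}, 90% margin of error = +/- {moe:.3f}")
```

Under these assumptions the point estimate is .90 with a margin of error of roughly .022.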
• Create an interval (a, b) so that you are fairly sure that the parameter lies between these two values.
• “Fairly sure” means “with high probability”, measured using the confidence coefficient, $1-\alpha$. Usually, $1-\alpha$ = .90, .95, .98, or .99.
• Suppose $1-\alpha = .95$ and that the estimator has a normal distribution. Then 95% of all estimates fall within Parameter $\pm$ 1.96 SE.
• Since we don’t know the value of the parameter, consider Estimator $\pm$ 1.96 SE, which has a variable center.
• Only if the estimator falls in the tail areas will the interval fail to enclose the parameter. This happens only 5% of the time.
[Figure: four intervals Estimator $\pm$ 1.96 SE from different samples; three enclose the parameter (“Worked”) and one does not (“Failed”).]
TO CHANGE THE CONFIDENCE LEVEL
• To change to a general confidence level, $1-\alpha$, pick a value of z that puts area $1-\alpha$ in the center of the z distribution.
Tail area $\alpha/2$    $z_{\alpha/2}$
.05      1.645
.025     1.96
.01      2.33
.005     2.575
100(1-$\alpha$)% Confidence Interval: Estimator $\pm z_{\alpha/2}$ SE
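The tabled z values can be reproduced from the confidence level with the standard library. This is a sketch, and the function name `z_critical` is an assumption for illustration. The table rounds its entries, so the computed values differ slightly in the third decimal.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value that leaves area (1 - confidence)/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    tail = (1 - level) / 2
    print(f"1 - alpha = {level:.2f}  tail area = {tail:.3f}  z = {z_critical(level):.3f}")
```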
1. CONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS
•For a quantitative population,
Confidence interval for a population mean $\mu$:
$$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$$
•For a binomial population,
Confidence interval for a population proportion $p$:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
• A random sample of n = 50 males showed a mean average daily intake of dairy products equal to 756 grams with a standard deviation of 35 grams. Find a 95% confidence interval for the population average $\mu$.
$$\bar{x} \pm z_{.05/2} \frac{s}{\sqrt{n}} \;\Rightarrow\; 756 \pm 1.96 \frac{35}{\sqrt{50}} \;\Rightarrow\; 756 \pm 9.70$$
or $746.30 < \mu < 765.70$ grams.
• Find a 99% confidence interval for $\mu$, the population average daily intake of dairy products for men.
$$\bar{x} \pm z_{.01/2} \frac{s}{\sqrt{n}} \;\Rightarrow\; 756 \pm 2.575 \frac{35}{\sqrt{50}} \;\Rightarrow\; 756 \pm 12.75$$
or $743.25 < \mu < 768.75$ grams.
The interval must be wider to provide for the increased confidence that it does indeed enclose the true value of $\mu$.
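Both dairy-intake intervals can be recomputed with a short Python sketch. The function name `mean_ci` is illustrative only, and the z values 1.96 and 2.575 are the tabled values for 95% and 99% confidence.

```python
import math

def mean_ci(x_bar, s, n, z):
    """Large-sample confidence interval for a mean: x_bar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

lo95, hi95 = mean_ci(756, 35, 50, z=1.96)
lo99, hi99 = mean_ci(756, 35, 50, z=2.575)
print(f"95% CI: {lo95:.2f} to {hi95:.2f} grams")
print(f"99% CI: {lo99:.2f} to {hi99:.2f} grams")
```

The 99% interval comes out wider than the 95% interval, as the slide notes.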
• Of a random sample of n = 150 college students, 104 of the students said that they had played on a soccer team during their K-12 years. Estimate the proportion of college students who played soccer in their youth with a 98% confidence interval.
$$\hat{p} \pm z_{.02/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \;\Rightarrow\; \frac{104}{150} \pm 2.33 \sqrt{\frac{.69(.31)}{150}} \;\Rightarrow\; .69 \pm .09$$
or $.60 < p < .78$.
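A sketch of the same 98% interval in Python; `proportion_ci` is a hypothetical helper name. The code keeps full precision for $\hat{p}$, whereas the slide rounds $\hat{p}$ to .69 and the half-width to .09, so the endpoints differ slightly in the second decimal.

```python
import math

def proportion_ci(x, n, z):
    """Large-sample confidence interval for a binomial proportion."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = proportion_ci(104, 150, z=2.33)
print(f"98% CI for p: {lo:.3f} to {hi:.3f}")
```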
2. ESTIMATING THE DIFFERENCE BETWEEN TWO MEANS
Sometimes we are interested in comparing the means of two populations.
•The average growth of plants fed using two different nutrients.
•The average scores for students taught with two different teaching methods.
To make this comparison,
A random sample of size $n_1$ drawn from population 1 with mean $\mu_1$ and variance $\sigma_1^2$.
A random sample of size $n_2$ drawn from population 2 with mean $\mu_2$ and variance $\sigma_2^2$.
•We compare the two averages by making inferences about $\mu_1-\mu_2$, the difference in the two population averages.
•If the two population averages are the same, then $\mu_1-\mu_2 = 0$.
•The best estimate of $\mu_1-\mu_2$ is the difference in the two sample means, $\bar{x}_1 - \bar{x}_2$.
ESTIMATING THE DIFFERENCE BETWEEN TWO MEANS (CONT’D)
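THE SAMPLING DISTRIBUTION OF $\bar{x}_1 - \bar{x}_2$
Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
Expected Value:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$
Standard Deviation/Standard Error:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where: $\sigma_1$ = standard deviation of population 1
$\sigma_2$ = standard deviation of population 2
$n_1$ = sample size from population 1
$n_2$ = sample size from population 2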
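INTERVAL ESTIMATE OF $\mu_1 - \mu_2$: LARGE-SAMPLE CASE ($n_1 > 30$ AND $n_2 > 30$)
Interval Estimate with $\sigma_1$ and $\sigma_2$ Known
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2}$$
where: $1 - \alpha$ is the confidence coefficient and
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Interval Estimate with $\sigma_1$ and $\sigma_2$ Unknown
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\, s_{\bar{x}_1 - \bar{x}_2}$$
where:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$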
EXAMPLE
• Compare the average daily intake of dairy products of men and women using a 95% confidence interval.
Avg Daily Intakes    Men    Women
Sample size          50     50
Sample mean          756    762
Sample Std Dev       35     30
$$(\bar{x}_1 - \bar{x}_2) \pm z_{.05/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \;\Rightarrow\; (756 - 762) \pm 1.96 \sqrt{\frac{35^2}{50} + \frac{30^2}{50}} \;\Rightarrow\; -6 \pm 12.78$$
or $-18.78 < \mu_1 - \mu_2 < 6.78$.
EXAMPLE (CONT’D)
$$-18.78 < \mu_1 - \mu_2 < 6.78$$
• Could you conclude, based on this confidence interval, that there is a difference in the average daily intake of dairy products for men and women?
• The confidence interval contains the value $\mu_1-\mu_2 = 0$. Therefore, it is possible that $\mu_1 = \mu_2$. You would not want to conclude that there is a difference in average daily intake of dairy products for men and women.
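The men-versus-women comparison can be sketched in Python as follows. The function name `two_mean_ci` is an assumption for illustration, and the check for 0 mirrors the reasoning used to interpret the interval.

```python
import math

def two_mean_ci(x1, s1, n1, x2, s2, n2, z):
    """Large-sample CI for mu1 - mu2 using the sample standard deviations."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

lo, hi = two_mean_ci(756, 35, 50, 762, 30, 50, z=1.96)
contains_zero = lo < 0 < hi
print(f"95% CI for mu1 - mu2: {lo:.2f} to {hi:.2f}")
print("0 is inside the interval" if contains_zero else "0 is outside the interval")
```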
3. Estimating the Difference between Two Proportions
Sometimes we are interested in comparing the proportion of “successes” in two binomial populations.
•The germination rates of untreated seeds and seeds treated with a fungicide.
•The proportion of male and female voters who favor a particular candidate for governor.
To make this comparison,
A random sample of size $n_1$ drawn from binomial population 1 with parameter $p_1$.
A random sample of size $n_2$ drawn from binomial population 2 with parameter $p_2$.
•We compare the two proportions by making inferences about $p_1-p_2$, the difference in the two population proportions.
•If the two population proportions are the same, then $p_1-p_2 = 0$.
•The best estimate of $p_1-p_2$ is the difference in the two sample proportions,
$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
Estimating the Difference between Two Proportions (cont’d)
The Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Expected Value/mean:
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
• Standard Deviation/Standard Error:
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
• Distribution Form: If the sample sizes are large ($n_1p_1$, $n_1q_1$, $n_2p_2$, and $n_2q_2$ are all greater than 5), the sampling distribution of $\hat{p}_1 - \hat{p}_2$ can be approximated by a normal probability distribution.
Interval Estimate of $p_1 - p_2$: Large-Sample Case
Example
• Compare the proportion of male and female college students who said that they had played on a soccer team during their K-12 years using a 99% confidence interval.
Youth Soccer     Male    Female
Sample size      80      70
Played soccer    65      39
$$(\hat{p}_1 - \hat{p}_2) \pm z_{.01/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \;\Rightarrow\; \left(\frac{65}{80} - \frac{39}{70}\right) \pm 2.575 \sqrt{\frac{.81(.19)}{80} + \frac{.56(.44)}{70}} \;\Rightarrow\; .26 \pm .19$$
or $.07 < p_1 - p_2 < .45$.
Example (cont’d)
$$.07 < p_1 - p_2 < .45$$
• Could you conclude, based on this confidence interval, that there is a difference in the proportion of male and female college students who said that they had played on a soccer team during their K-12 years?
• The confidence interval does not contain the value $p_1-p_2 = 0$. Therefore, it is not likely that $p_1 = p_2$. You would conclude that there is a difference in the proportions for males and females.
A higher proportion of males than females played soccer in their youth.
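The soccer comparison can be reproduced with a short Python sketch; `two_proportion_ci` is a hypothetical helper name, and z = 2.575 is the tabled 99% value. Full-precision sample proportions are used, so intermediate values differ slightly from the rounded .81 and .56 on the slide.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z):
    """Large-sample CI for p1 - p2 from success counts x1, x2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = two_proportion_ci(65, 80, 39, 70, z=2.575)
print(f"99% CI for p1 - p2: {lo:.2f} to {hi:.2f}")
print("0 is inside the interval" if lo < 0 < hi else "0 is outside the interval")
```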
Confidence intervals are by their nature two-sided since they produce upper and lower bounds for the parameter.
One-sided bounds can be constructed simply by using a value of z that puts $\alpha$ rather than $\alpha/2$ in the tail of the z distribution.
LCB: Estimator $- z_{\alpha} \times$ (Std Error of Estimator)
UCB: Estimator $+ z_{\alpha} \times$ (Std Error of Estimator)
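As an illustration only, the sketch below computes 95% one-sided bounds by reusing the dairy-intake numbers from the earlier example (mean 756, s = 35, n = 50). The slides do not work this case, and the function names are made up. With $\alpha$ = .05 in a single tail, the tabled value is $z_{.05}$ = 1.645.

```python
import math

def lower_confidence_bound(estimate, se, z_alpha):
    """LCB: estimator minus z_alpha standard errors."""
    return estimate - z_alpha * se

def upper_confidence_bound(estimate, se, z_alpha):
    """UCB: estimator plus z_alpha standard errors."""
    return estimate + z_alpha * se

se = 35 / math.sqrt(50)  # standard error of the sample mean
lcb = lower_confidence_bound(756, se, z_alpha=1.645)
ucb = upper_confidence_bound(756, se, z_alpha=1.645)
print(f"95% LCB: {lcb:.2f}   95% UCB: {ucb:.2f}")
```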
The total amount of relevant information in a sample is controlled by two factors:
• The sampling plan or experimental design: the procedure for collecting the information
• The sample size n: the amount of information you collect.
In a statistical estimation problem, the accuracy of the estimation is measured by the margin of error or the width of the confidence interval.
1. Determine the size of the margin of error, E, that you are willing to tolerate.
2. Choose the sample size by solving for n or $n = n_1 = n_2$ in the inequality: 1.96 SE $\le$ E, where SE is a function of the sample size n.
3. For quantitative populations, estimate the population standard deviation using a previously calculated value of s or the range approximation $\sigma \approx$ Range / 4.
4. For binomial populations, use the conservative approach and approximate p using the value $p \approx .5$.
A producer of PVC pipe wants to survey wholesalers who buy his product in order to estimate the proportion who plan to increase their purchases next year. What sample size is required if he wants his estimate to be within 0.04 of the actual proportion with probability equal to 0.95?
$$1.96 \sqrt{\frac{pq}{n}} \le .04 \;\Rightarrow\; 1.96 \sqrt{\frac{.5(.5)}{n}} \le .04$$
$$\sqrt{n} \ge \frac{1.96 \sqrt{.5(.5)}}{.04} = 24.5 \;\Rightarrow\; n \ge 24.5^2 = 600.25$$
Since n must be a whole number, he should survey at least 601 wholesalers.
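The sample-size calculation can be sketched in Python. The function name `sample_size_proportion` is illustrative, and p = .5 is the conservative choice from step 4. The result is rounded up because n must be a whole number satisfying n ≥ 600.25.

```python
import math

def sample_size_proportion(margin, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    n_exact = (z * math.sqrt(p * (1 - p)) / margin) ** 2
    return math.ceil(n_exact)

n_required = sample_size_proportion(margin=0.04)
print(f"Required sample size: {n_required}")
```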
4. Estimating the Variance
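The sample variance is defined by
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$
It is an unbiased estimator of $\sigma^2$. Recall that
$$Var(X) = E(X^2) - [E(X)]^2$$
so that $E(x_i^2) = \sigma^2 + \mu^2$ and $E(\bar{x}^2) = \dfrac{\sigma^2}{n} + \mu^2$. Then
$$E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = E\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] = \sum_{i=1}^{n} E(x_i^2) - nE(\bar{x}^2) = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2$$
$$E(s^2) = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \sigma^2$$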
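The unbiasedness of the sample variance, $E(s^2) = \sigma^2$, can be checked exactly on a toy population by enumerating every possible sample. This is a sketch under stated assumptions: the three-value population is made up, draws are independent (with replacement), and exact fractions avoid rounding error.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """s^2 with divisor n - 1, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

population = [2, 4, 9]  # hypothetical population, each value equally likely
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

# All equally likely ordered samples of size 2 drawn with replacement
samples = list(product(population, repeat=2))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print(f"sigma^2 = {sigma2}, average of s^2 over all samples = {expected_s2}")
```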
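Analysis of Sample Variance
If $s^2$ is the variance of a random sample of size n from a normal population, a 100(1-$\alpha$)% confidence interval for $\sigma^2$ is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are $\chi^2$ values with (n-1) degrees of freedom, the subscript giving the area to the right of the value.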
Small sample estimation
Take a sample of 15 patrons from our library:
Sample mean           41.64
Standard deviation    40.13
Number of cases       15
Find the 95 percent confidence interval. The t value, from the table, for 14 degrees of freedom is 2.145.
[Table: Values of t]
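The slide stops at the t value, so the following Python sketch completes the interval using the usual small-sample form $\bar{x} \pm t \cdot s/\sqrt{n}$. The function name `t_interval` is an assumption. The t value 2.145 is passed in from the table rather than computed, since the standard library has no t distribution.

```python
import math

def t_interval(x_bar, s, n, t_value):
    """Small-sample CI for a mean: x_bar +/- t * s / sqrt(n)."""
    half_width = t_value * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Library patrons: n = 15, mean 41.64, s = 40.13, t(.025, 14 df) = 2.145
lo, hi = t_interval(41.64, 40.13, 15, t_value=2.145)
print(f"95% CI: {lo:.2f} to {hi:.2f}")
```

Under these assumptions the interval runs from roughly 19.41 to 63.87.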
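Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case ($n_1 < 30$ and/or $n_2 < 30$)
Interval Estimate with $\sigma^2$ Known
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2}$$
where:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Interval Estimate with $\sigma^2$ Unknown
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2;\, df}\, s_{\bar{x}_1 - \bar{x}_2}$$
If $\sigma_1^2 = \sigma_2^2$:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad s^2 = s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad df = n_1 + n_2 - 2$$
If $\sigma_1^2 \ne \sigma_2^2$:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$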
Example: Specific Motors
Specific Motors of Detroit has developed a new automobile known as the M car. 12 M cars and 8 J cars (from Japan) were road tested to compare miles-per-gallon (mpg) performance. It is assumed that both populations have equal variances. The sample statistics are:
                      Sample #1 (M Cars)         Sample #2 (J Cars)
Sample Size           $n_1$ = 12 cars            $n_2$ = 8 cars
Mean                  $\bar{x}_1$ = 29.8 mpg     $\bar{x}_2$ = 27.3 mpg
Standard Deviation    $s_1$ = 2.56 mpg           $s_2$ = 1.81 mpg
Point Estimate of the Difference Between Two Population Means
$\mu_1$ = mean miles-per-gallon for the population of M cars
$\mu_2$ = mean miles-per-gallon for the population of J cars
Point estimate of $\mu_1 - \mu_2$ = $\bar{x}_1 - \bar{x}_2$ = 29.8 - 27.3 = 2.5 mpg.
95% Confidence Interval Estimate of the Difference Between Two Population Means: Small-Sample Case
We will make the following assumptions:
• The miles per gallon rating must be normally distributed for both the M car and the J car.
• The variance in the miles per gallon rating must be the same for both the M car and the J car.
Using the t distribution with $n_1 + n_2 - 2 = 18$ degrees of freedom, the appropriate t value is $t_{.025} = 2.101$.
We will use a weighted average of the two sample variances as the pooled estimator of $\sigma^2$.
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11(2.56)^2 + 7(1.81)^2}{12 + 8 - 2} = 5.28$$
$$\bar{x}_1 - \bar{x}_2 \pm t_{.025} \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 2.5 \pm 2.101 \sqrt{5.28 \left(\frac{1}{12} + \frac{1}{8}\right)} = 2.5 \pm 2.2$$
or .3 to 4.7 miles per gallon.
We are 95% confident that the difference between the mean mpg ratings of the two car types is from .3 to 4.7 mpg (with the M car having the higher mpg).
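The pooled-variance calculation for the Specific Motors example can be sketched in Python. The function names are illustrative, and the t value 2.101 (18 degrees of freedom) is taken from the table as on the slide.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances (equal-variance assumption)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_value):
    """Small-sample CI for mu1 - mu2 using the pooled variance."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    half_width = t_value * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = x1 - x2
    return diff - half_width, diff + half_width

sp2 = pooled_variance(2.56, 12, 1.81, 8)
lo, hi = pooled_t_interval(29.8, 2.56, 12, 27.3, 1.81, 8, t_value=2.101)
print(f"pooled variance = {sp2:.2f}")
print(f"95% CI for mu1 - mu2: {lo:.1f} to {hi:.1f} mpg")
```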
Key Concepts
I. Types of Estimators
1. Point estimator: a single number is calculated to estimate the population parameter.
2. Interval estimator: two numbers are calculated to form an interval that contains the parameter.
II. Properties of Good Estimators
1. Unbiased: the average value of the estimator equals the parameter to be estimated.
2. Minimum variance: of all the unbiased estimators, the best estimator has a sampling distribution with the smallest standard error.
3. The margin of error measures the maximum distance between the estimator and the true value of the parameter.
III. Large-Sample Point Estimators
To estimate one of four population parameters when the sample sizes are large, use the following point estimators with the appropriate margins of error.
Parameter          Point estimator                Margin of error
$\mu$              $\bar{x}$                      $\pm 1.96\, s/\sqrt{n}$
$p$                $\hat{p} = x/n$                $\pm 1.96 \sqrt{\hat{p}\hat{q}/n}$
$\mu_1 - \mu_2$    $\bar{x}_1 - \bar{x}_2$        $\pm 1.96 \sqrt{s_1^2/n_1 + s_2^2/n_2}$
$p_1 - p_2$        $\hat{p}_1 - \hat{p}_2$        $\pm 1.96 \sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$
IV. Large-Sample Interval Estimators
To estimate one of four population parameters when the sample sizes are large, use the following interval estimators.
Parameter          100(1-$\alpha$)% confidence interval
$\mu$              $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$
$p$                $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}\hat{q}/n}$
$\mu_1 - \mu_2$    $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
$p_1 - p_2$        $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$
1. All values in the interval are possible values for the unknown population parameter.
2. Any values outside the interval are unlikely to be the value of the unknown parameter.
3. To compare two population means or proportions, look for the value 0 in the confidence interval. If 0 is in the interval, it is possible that the two population means or proportions are equal, and you should not declare a difference. If 0 is not in the interval, it is unlikely that the two means or proportions are equal, and you can confidently declare a difference.
V. One-Sided Confidence Bounds
Use either the upper (+) or lower (−) two-sided bound, with the critical value of z changed from $z_{\alpha/2}$ to $z_{\alpha}$.