Estimation
Estimation - 1
1
Estimation
Confidence Intervals for Means
2
Inferential Statistics
1. Type of Inference:
Estimation
Hypothesis Testing
2. Purpose:
Make decisions about population characteristics
Population?
3
Estimation Process
Population $(\mu, \sigma)$: the mean, $\mu$, is unknown.
Random sample: $X_1, X_2, \ldots, X_n$, where each $X_i \sim N(\mu, \sigma)$.
Sample mean: $\bar{X} = 98$.
4
Random Sample
Random sample is a set of independent and
identically distributed (i.i.d.) random variables.
5
Theorem (Distribution of $\bar{X}$)
If $X_1, X_2, \ldots, X_n$ are observations of a random
sample of size $n$ from the normal distribution
$N(\mu, \sigma^2)$, then the distribution of the sample
mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is $N(\mu, \sigma^2/n)$.
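The theorem above can be checked empirically. The following is a minimal simulation sketch, not part of the original slides: the values mu = 98, sigma = 1, n = 25 and the number of repetitions are illustrative assumptions, and the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# Assumed illustration values: mu = 98, sigma = 1, n = 25.
means = simulate_sample_means(mu=98.0, sigma=1.0, n=25, reps=20000)

print("mean of the sample means:    ", round(statistics.fmean(means), 3))
print("variance of the sample means:", round(statistics.pvariance(means), 4))
print("theoretical sigma^2 / n:     ", 1.0 ** 2 / 25)
```

The simulated variance of the sample means should land close to sigma^2 / n = 0.04, which is what the theorem predicts.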
6
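Theorem (Distribution of $\bar{X}$)
If $X_1, X_2, \ldots, X_n$ are observations of a random
sample of size $n$ from a distribution that has a
mean $\mu$ and a finite variance $\sigma^2$, then the
distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$$
is $N(0, 1)$ as $n \to \infty$,
and the distribution of the sample mean $\bar{X}$
is $N(\mu, \sigma^2/n)$ as $n \to \infty$.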
7
Statistics Used to Estimate Population Parameters
Statistics (Estimators)            Parameters
Sample mean, $\bar{x}$             $\mu$, population mean
Sample variance, $s^2$             $\sigma^2$, population variance
Sample proportion, $\hat{p}$       $p$, population proportion
…
8
Sampling Distribution
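Theoretical probability distribution of
the sample statistic.
Example: The distribution of the sample mean $\bar{X}$ of a random sample
from $N(\mu, \sigma^2)$ is $N(\mu, \sigma^2/n)$ (exactly for a normal population,
and approximately as $n \to \infty$ for other populations with finite variance).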
9
Disadvantage of Point Estimation
1. Provides Single Value
Based on Observations from 1 Sample.
* Sample mean $\bar{X} = 98$ is a point estimate of the
unknown population mean.
2. Gives No Information about How Close Value Is
to the Unknown Population Parameter
Which of the following statistics do you prefer? a. 32% b. 32% with a margin of error 3%
10
Estimation
You’re interested in finding the average body temperature of healthy adults in Northeastern Ohio (the population). What would you do?
How can we estimate this average with a measure of reliability?
98 ± 1 °F     98 ± .5 °F     98 ± .2 °F
11
Interval Estimation
Margin of Error Gives Information about How Close Value Is to the Unknown Population Parameter.
12
Sampling Error
Sample statistic (point estimate): $\bar{x}$
Sampling Error = $|\bar{x} - \mu|$
13
Key Elements of
Interval Estimation
Confidence interval: the range from the lower confidence limit to the upper
confidence limit, centered on the sample statistic (point estimate).
Confidence level: the probability that an interval constructed this way
contains the population parameter.
$\bar{x} \pm$ Margin of Error
Example: 98 ± 1 °F
14
Confidence Interval Estimation
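With $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, the standard error of $\bar{X}$:
$$P(-z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{X} - \mu \le z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$$
$$P(\bar{X} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$$
$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Interval estimate: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution of $\bar{X}$ with area $1 - \alpha$ between $\mu - z_{\alpha/2}\,\sigma_{\bar{x}}$ and $\mu + z_{\alpha/2}\,\sigma_{\bar{x}}$, and area $\alpha/2$ in each tail.]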
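The critical value $z_{\alpha/2}$ in the derivation above is normally read from a table. As a small sketch (the helper name `z_critical` is an assumption introduced here, not something from the slides), it can also be computed with the inverse normal CDF in Python's standard library:

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2}, the value with upper-tail area alpha/2 under N(0, 1)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 95% confidence this gives about 1.960, the value used on the next slide.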
15
Confidence Interval Estimation
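For 95% confidence, $z_{.025} = 1.96$:
$$P(-1.96\,\sigma_{\bar{x}} \le \bar{X} - \mu \le 1.96\,\sigma_{\bar{x}}) = .95$$
$$P(\bar{X} - 1.96\,\sigma_{\bar{x}} \le \mu \le \bar{X} + 1.96\,\sigma_{\bar{x}}) = .95$$
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$
Interval estimate: $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution of $\bar{X}$ with area .95 between $\mu - 1.96\,\sigma_{\bar{x}}$ and $\mu + 1.96\,\sigma_{\bar{x}}$, and area .025 in each tail.]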
16
The Confidence Interval
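[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$. 95% of sample means fall between $\mu - 1.96\,\sigma_{\bar{x}}$ and $\mu + 1.96\,\sigma_{\bar{x}}$.]
Confidence level: $1 - \alpha = .95$
$\alpha/2 = .025$ in each tail, $z_{.025} = 1.96$
Confidence interval => from $\bar{x} - 1.96\,\sigma_{\bar{x}}$ to $\bar{x} + 1.96\,\sigma_{\bar{x}}$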
17
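Confidence Interval for Mean
($\sigma$ Known)
$(1-\alpha)\cdot 100\%$ confidence interval estimate for the
mean of a normal population:
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
or
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}\,\sigma/\sqrt{n}$ is the margin of error.
"$\sigma$ known" may mean that we have a very good estimate of $\sigma$.
It is not practical to assume that we know $\sigma$.
18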
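Confidence Interval of Mean
($\sigma$ Unknown and $n \ge 30$)
$(1-\alpha)\cdot 100\%$ confidence interval estimate
for the mean of a population when the sample size
is relatively large:
$$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$
or
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$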
19
The Confidence Interval
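[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$. 95% of samples give an $\bar{x}$ between $\mu - 1.96\,\sigma_{\bar{x}}$ and $\mu + 1.96\,\sigma_{\bar{x}}$.]
Confidence interval => from $\bar{x} - 1.96\,\sigma_{\bar{x}}$ to $\bar{x} + 1.96\,\sigma_{\bar{x}}$ (95% confidence interval)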
20
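The Confidence Interval
[Figure: sampling distribution of $\bar{X}$ (standard error $\sigma_{\bar{x}}$) with 95% of samples in the middle and 2.5% in each tail, and the interval from each sample drawn beneath it.]
95% of intervals contain $\mu$.
5% do not.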
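The statement that 95% of intervals contain the population mean can be illustrated by simulation. The sketch below is an addition for illustration only: it assumes a normal population with mu = 98 and a known sigma = 1 (hypothetical values), builds many z-based intervals, and counts how often they cover mu. The function name `coverage` is an assumption.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, reps, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Assumed illustration values; the result should be close to 0.95.
print(coverage(mu=98.0, sigma=1.0, n=25, confidence=0.95, reps=10000))
```

Any single interval either contains the mean or does not; the 95% describes the long-run behavior of the procedure.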
21
Factors Affecting
Interval Width
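1. Data Dispersion
Measured by $\sigma$
2. Sample Size
Affects the standard error: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
3. Level of Confidence $(1 - \alpha)$
Affects $z_{\alpha/2}$
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$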
22
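Size of Interval
[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$. 90% of samples fall within $\mu \pm 1.65\,\sigma_{\bar{x}}$, 95% within $\mu \pm 1.96\,\sigma_{\bar{x}}$, and 99% within $\mu \pm 2.58\,\sigma_{\bar{x}}$.]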
23
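Estimation Example
Mean ($\sigma$ Known)
The average weight of a random sample of n = 25
subjects is $\bar{x} = 140$. Set up a 95% confidence interval
estimate for $\mu$ if $\sigma = 10$. (Assume normal population.)
$1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$, $z_{.025} = 1.96$.
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 140 \pm 1.96\cdot\frac{10}{\sqrt{25}} = 140 \pm 3.92$$
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(140 - 1.96\cdot\frac{10}{\sqrt{25}},\ 140 + 1.96\cdot\frac{10}{\sqrt{25}}\right) = (136.08,\ 143.92)$$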
24
Interpretation
We can be 95% confident that the population
mean is in (136.08, 143.92).
We can be 95% confident that the sampling
error in using $\bar{x} = 140$ to estimate the
mean is at most 3.92.
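The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch; the helper name `z_interval` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin


# Values from the example: n = 25, sample mean 140, sigma = 10.
low, high = z_interval(xbar=140, sigma=10, n=25)
print(round(low, 2), round(high, 2))
```

With the example's numbers the interval rounds to (136.08, 143.92), matching the slide.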
25
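Confidence Interval of Mean
($\sigma$ Unknown and $n \ge 30$)
$(1-\alpha)\cdot 100\%$ confidence interval estimate
for the mean of a population when the sample size
is relatively large:
$$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$
or
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$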
26
Thinking Challenge
Example: A city uses a certain noise index to monitor the noise pollution at a certain area of the city. A random sample of 100 observations from randomly selected days around noon showed an average index value of $\bar{x} = 1.99$ and a standard deviation of s = 0.05. Find the 90% confidence interval estimate of the average noise index at noon.
27
Confidence Interval Solution*
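$1 - \alpha = .90$, $\alpha = .10$, $\alpha/2 = .05$, $z_{\alpha/2} = z_{.05} = 1.64$
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 1.99 \pm 1.64\cdot\frac{.05}{\sqrt{100}} = 1.99 \pm 0.008$$
(1.982, 1.998)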
28
Interval Estimation for Mean
In a survey of a random sample of 64
individuals who gambled in Las Vegas, the
average amount of money won on the day the
survey was done was –$25.50, with a standard
deviation of $100. Find the 95% confidence
interval estimate for the average amount of
money won by people who gambled in Las Vegas
that day.
29
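Finding Sample Sizes
for Estimating $\mu$
"I don't want to
sample too much
or too little!"
C.I.: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Margin of error: $E = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}$$
E = Margin of Error or Bound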
30
Sample Size Example
What sample size is needed to be 90%
confident of being correct within 5? A pilot
study suggested that the standard deviation is
45.
$$n = \frac{z_{.05}^2\,\sigma^2}{E^2} = \frac{1.645^2 \cdot 45^2}{5^2} = 219.2 \approx 220$$
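The sample-size formula is easy to wrap in a function. The sketch below is illustrative; the name `sample_size_for_mean` is an assumption, and the result is rounded up because a fractional subject cannot be sampled.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Values from the example: 90% confidence, margin of error 5, sigma about 45.
print(sample_size_for_mean(sigma=45, margin=5, confidence=0.90))
```

For the example's inputs this gives 220, in agreement with the hand calculation.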
31
Thinking Challenge
You plan to survey residents in
your county to find the average
health insurance premium that they
are paying. You want to be 95%
confident that the sample mean is
within ± $50.
A pilot study showed that s was
about $400. What sample size
should you use?
32
Sample Size Solution*
$$n = \frac{z_{.025}^2\,\sigma^2}{E^2} = \frac{1.96^2 \cdot 400^2}{50^2} = 245.86 \approx 246$$
33
Confidence Interval Mean
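($\sigma$ Unknown & n < 30)
1. Assumptions
Population Standard Deviation Is Unknown
Population Must Be Normally Distributed
2. Use Student's t Distribution
3. Confidence Interval Estimate
$$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$
or
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$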
34
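Student's t Distribution
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
[Figure: standard normal (Z) curve compared with t (df = 13) and t (df = 5) curves, all centered at 0. The t curves are bell-shaped and symmetric, with 'fatter' tails.]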
35
Theorem (Distribution of $\bar{X}$ and $S^2$)
If $X_1, X_2, \ldots, X_n$ are observations of a random
sample of size $n$ from the normal distribution
$N(\mu, \sigma^2)$, then the statistics, sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,$$
are independent and
$$\frac{(n-1)S^2}{\sigma^2} \text{ is } \chi^2(n-1).$$
36
Student’s t Distribution
Let Z be a random variable that is $N(0, 1)$,
and U be a random variable that is $\chi^2(r)$, and
Z and U are independent. Then, the random
variable
$$T = \frac{Z}{\sqrt{U/r}}$$
has a t-distribution with degrees of freedom r.
37
Student’s t Distribution
If $X_1, X_2, \ldots, X_n$ are observations of a random
sample of size $n$ from the normal distribution
$N(\mu, \sigma^2)$, then the statistics, sample mean $\bar{X}$ and
sample variance $S^2$, are independent and
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with d.f. n – 1.
38
t-statistic
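$$T = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Here the numerator is $Z$, the quantity $(n-1)S^2/\sigma^2$ under the square root is $U$, and $n - 1$ is the d.f. of $U$.
39
40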
Student’s t Table
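t values
or percentiles in the t-distribution
For a 90% C.I.:
n = 3
df = n - 1 = 2
$\alpha$ = .10
$\alpha/2$ = .05
$t_{\alpha/2}$ = ?
[Figure: t curve with upper-tail area .05 to the right of $t_0$.]
From the table: $t_{.05,\,2} = 2.920$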
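When a printed t table is not at hand, the critical value can be approximated numerically. The sketch below is one possible approach and not the method used in the slides: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names, step count, and search range are all assumptions chosen for illustration.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry about 0."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(upper_tail, df):
    """Approximate t with P(T > t) = upper_tail, found by bisection on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


# The case from the slide: 90% C.I., n = 3, df = 2, alpha/2 = .05.
print(round(t_critical(0.05, 2), 3))
```

For df = 2 and an upper-tail area of .05 this reproduces the table value 2.920 to three decimals.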
41
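Estimation Example
Mean ($\sigma$ Unknown)
A random sample of weights of 25 subjects has a sample
mean of 140 and a sample standard deviation of 8. Set up a
95% confidence interval estimate for $\mu$.
$1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$, df = 24, $t_{\alpha/2,\,df} = t_{.025,\,24} = 2.064$
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 140 \pm 2.064\cdot\frac{8}{\sqrt{25}} = 140 \pm 3.30$$
(136.70, 143.30)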
42
Thinking Challenge
The number of community hospital beds per 1000
population available in the different
regions of the country is normally distributed. A
random sample of 6 regions was selected, and the
recorded rates of beds per 1000 were
3.6, 4.2, 4.0, 3.5, 3.8, 3.1.
Find the 90% confidence interval estimate of the
mean bed-rate in the country.
43
Confidence Interval Solution*
$\bar{x} = 3.7$
s = 0.38987
$\dfrac{s}{\sqrt{n}} = \dfrac{.38987}{\sqrt{6}} = .1592$
(use 90% confidence level)
n = 6, df = n − 1 = 6 − 1 = 5
$t_{.05,\,5} = 2.015$
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
( 3.7 - (2.015)(0.1592), 3.7 + (2.015)(0.1592) )
( 3.379, 4.021 )
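The solution above can be reproduced from the raw data. In this sketch the critical value 2.015 is taken from the slide's t table rather than computed, and the variable names are assumptions introduced for illustration.

```python
from statistics import mean, stdev

# Bed rates per 1000 from the six sampled regions.
beds = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]
T_05_5 = 2.015  # t critical value for alpha/2 = .05, df = 5 (table value from the slide)

n = len(beds)
xbar = mean(beds)
s = stdev(beds)            # sample standard deviation (divides by n - 1)
std_err = s / n ** 0.5
margin = T_05_5 * std_err

print("x-bar =", round(xbar, 1), " s =", round(s, 5), " s/sqrt(n) =", round(std_err, 4))
print("90% C.I.:", (round(xbar - margin, 3), round(xbar + margin, 3)))
```

The printed interval rounds to (3.379, 4.021), the same as the hand calculation.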
44
Confidence interval with z-score:
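The $(1-\alpha)\cdot 100\%$ confidence interval estimate
for the population mean:
Assumption: if sampled from a normal
population with known variance, $\sigma^2$,
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Assumption: if the sample is large and the
variance is unknown, $s$ replaces $\sigma$,
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$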
45
Confidence interval with t-score:
The $(1-\alpha)\cdot 100\%$ confidence interval estimate
for the population mean:
Assumption: if sampled from a normal
population with unknown variance, $\sigma^2$,
$$\bar{x} \pm t_{\alpha/2,\,df = n-1}\frac{s}{\sqrt{n}}$$
(If the sample size is large, the normality assumption
matters little.) $t \to z$ as the sample becomes large.
46
Average Weight for Ten-Year-Old
Female Children in the US
Info. from a random sample: n = 10, $\bar{x}$ = 80 lb, s =
18.05 lb. Assuming weight is normally distributed,
find the 95% confidence interval estimate for
the average weight.
Data: 73.80 50.00 101.40 67.20 102.20 97.80 81.00 93.40 63.20 70.00
How do we know whether the normality assumption is OK?
47
Tests of Normality
                                  Kolmogorov-Smirnov(a)          Shapiro-Wilk
                                  Statistic   df   Sig.      Statistic   df   Sig.
weight (pounds) of participant    .171        10   .200*     .930        10   .452
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Both significance values are greater than 0.05, so the normality assumption is acceptable.
48
Average Weight for Ten-Year-Old
Female Children in the US
Info. from a random sample: n = 10, $\bar{x}$ = 80 lb, s =
18.05 lb. Assuming weight is normally distributed, find
the 95% confidence interval estimate for the average
weight.
$t_{\alpha/2} = t_{.05/2} = t_{0.025}$, d.f. = 10 – 1 = 9, $t_{0.025,\,9} = 2.262$
$$\bar{x} \pm t_{\alpha/2,\,df=9}\frac{s}{\sqrt{n}} = 80 \pm 2.262\cdot\frac{18.05}{\sqrt{10}} = 80 \pm 12.91$$
(67.09, 92.91)
49
Descriptives: weight (pounds) of participant, by "What is your sex?"

                                          female                     male
Statistic                          Statistic   Std. Error     Statistic   Std. Error
Mean                               80.0000     5.70840        86.8600     3.96048
95% Confidence Interval for Mean
  Lower Bound                      67.0867                    77.9008
  Upper Bound                      92.9133                    95.8192
5% Trimmed Mean                    80.4333                    86.5222
Median                             77.4000                    82.8000
Variance                           325.858                    156.854
Std. Deviation                     18.05153                   12.52413
Minimum                            50.00                      73.80
Maximum                            102.20                     106.00
Range                              52.20                      32.20
Interquartile Range                32.5000                    25.5500
Skewness                           -.148       .687           .553        .687
Kurtosis                           -1.229      1.334          -1.422      1.334

Weight for ten-year-olds: 80 ± 12.91
50
One-Sided C. I.
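Z C.I.:
Lower interval: $\left(-\infty,\ \bar{x} + z_{\alpha}\dfrac{\sigma}{\sqrt{n}}\right)$
Upper interval: $\left(\bar{x} - z_{\alpha}\dfrac{\sigma}{\sqrt{n}},\ \infty\right)$
t C.I.:
Lower interval: $\left(-\infty,\ \bar{x} + t_{\alpha}\dfrac{s}{\sqrt{n}}\right)$
Upper interval: $\left(\bar{x} - t_{\alpha}\dfrac{s}{\sqrt{n}},\ \infty\right)$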
51
Size of Interval
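[Figure: sampling distribution of $\bar{x}$ with standard error $\sigma_{\bar{x}}$. 95% of samples (area .95) fall between $-1.96\,\sigma_{\bar{x}}$ and $+1.96\,\sigma_{\bar{x}}$ of the center, with area 0.025 in each tail.]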
52
Lower Interval
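[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$. 95% of samples (area .95) fall below the cutoff at $+1.64\,\sigma_{\bar{x}}$, with area 0.05 in the upper tail; $z_{\alpha} = z_{.05}$.]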
53
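Estimation Example
Mean ($\sigma$ Known)
The mean of a random sample of weights of 25
subjects is $\bar{x} = 140$. Set up an upper 95%
confidence interval estimate for $\mu$ if $\sigma = 10$.
$1 - \alpha = .95$, $\alpha = .05$, $z_{\alpha} = z_{.05} = 1.64$.
$$\left(\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right) = \left(140 - 1.64\cdot\frac{10}{\sqrt{25}},\ \infty\right) = (136.72,\ \infty)$$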
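A one-sided interval uses $z_{\alpha}$ rather than $z_{\alpha/2}$. The sketch below is illustrative only (the helper name `one_sided_z_limits` is an assumption); it returns both one-sided limits so either form of interval can be reported.

```python
from statistics import NormalDist


def one_sided_z_limits(xbar, sigma, n, confidence=0.95):
    """Return (lower_limit, upper_limit) for one-sided z intervals, sigma known.

    The interval (lower_limit, infinity) and the interval (-infinity, upper_limit)
    each have the stated confidence level on their own.
    """
    z = NormalDist().inv_cdf(confidence)  # z_alpha, not z_{alpha/2}
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin


# Values from the example: n = 25, sample mean 140, sigma = 10.
lower_limit, upper_limit = one_sided_z_limits(xbar=140, sigma=10, n=25)
print(round(lower_limit, 2), round(upper_limit, 2))
```

The slide rounds $z_{.05}$ to 1.64, which gives a limit of 136.72; with the unrounded value (about 1.645) the same calculation gives about 136.71.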
54
Estimation
Confidence Intervals for Proportions
55
Proportion Estimation
Parameter: Population Proportion p (or $\pi$)
(e.g., the percentage of people who have no health insurance)
Statistic: Sample Proportion $\hat{p} = \dfrac{x}{n}$
x is the number of successes
n is the sample size
Remark: If data is coded as 1 or 0, the sample mean is the same as the sample proportion of 1's.
Data: 1, 0, 0, 1, 0
$$\bar{x} = \hat{p} = \frac{1 + 0 + 0 + 1 + 0}{5} = \frac{2}{5} = .4$$
56
Confidence Interval
Proportion
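1. Assumptions
Two Categorical Outcomes
Normal Approximation Can Be Used If
np and n(1 – p) are both greater than 5.
2. Confidence Interval Estimate
(for large sample)
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
or
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$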
57
Parameters of Sample Proportion
X ~ Binomial(n, p),
E[X] = np, Var[X] = np(1 − p)
$\hat{p} = \dfrac{X}{n}$ ~ ?
E[X/n] = ? , Var[X/n] = ?
Answer: E[X/n] = p, Var[X/n] = $\dfrac{p(1-p)}{n}$
58
Estimation Example
Proportion
A random sample of 400 from a large
community showed that 32 have diabetes. Set up
a 95% confidence interval estimate for p, the
percentage of people that have diabetes.
$\hat{p} = \dfrac{32}{400} = .08$, $n = 400$, $z_{\alpha/2} = z_{.025} = 1.96$
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
59
Estimation Example
Proportion
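The 95% C.I. for p, the percentage of people that
have diabetes:
$\hat{p} = \dfrac{32}{400} = .08$, $n = 400$
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .08 \pm 1.96\sqrt{\frac{.08(1-.08)}{400}} = .08 \pm .027$$
8% ± 2.7%, or (.053, .107)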
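The large-sample interval for a proportion can be computed directly from the counts. This is a minimal sketch; the function name `proportion_interval` is an assumption introduced here.

```python
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


# Values from the example: 32 of 400 sampled people have diabetes.
low, high = proportion_interval(32, 400)
print(round(low, 3), round(high, 3))
```

With 32 successes out of 400 the interval rounds to (.053, .107), as in the slide. The approximation relies on np and n(1 – p) both being greater than 5.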
60
Thinking Challenge
A member of a health department wishes to see what percentage of people in a community will support an environmental policy. Of 200 survey forms sent and received, 35 responded that they support the policy and the rest do not support it.
Find a 90% confidence interval estimate of the percentage of the population in this community that supports the policy.
61
Confidence Interval
Solution*
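$\hat{p} = \dfrac{35}{200} = .175$, $n = 200$, $z_{\alpha/2} = 1.645$
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .175 \pm 1.645\sqrt{\frac{.175(.825)}{200}} = .175 \pm .0442$$
17.5% ± 4.42%
(13.08%, 21.92%)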
62
Example:
Researchers wish to estimate the percentage of
hospital employees infected by SARS in a
certain country. Out of 500 randomly chosen
hospital employees, 14 were infected. Find the
95% confidence interval estimate for
percentage of hospital employees infected by
SARS in this country.
63
Sample Size
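C.I.: $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Margin of error: $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
$$n = \frac{z_{\alpha/2}^2}{E^2}\,\hat{p}(1-\hat{p})$$
if a pilot study is done,
or
$$n = \frac{z_{\alpha/2}^2}{E^2}\cdot 0.25$$
to get the largest sample to
achieve the goal.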
64
Sample Size (No prior information on p)
Sample Size Example: If one wishes to do a
survey to estimate the population proportion
with 95% confidence and a margin of error of
3%, how large a sample is needed?
$z_{\alpha/2}$ = 1.96; E = .03
n = (1.96² / .03²) × .25 = 1067.11
A sample of size 1068 is needed.
65
Sample Size (With prior information on p)
Sample Size Example: If one wishes to estimate
the percentage of people infected with West Nile in a
population with 95% confidence and a margin of
error of 3%, how large a sample is needed? (A pilot
study has been done, and the sample proportion was
6%.)
$z_{\alpha/2}$ = 1.96; E = .03
n = (1.96² / .03²) × .06 × (1 – .06) = 240.7
A sample of size 241 is needed.
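Both sample-size cases (with and without prior information on p) can be handled by one function. In this sketch the name `sample_size_for_proportion` and the default `p_guess = 0.5` are assumptions; 0.5 maximizes p(1 − p) at 0.25 and therefore gives the most conservative sample size.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`.

    p_guess = 0.5 is the no-prior-information case; pass a pilot-study
    proportion when one is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


print(sample_size_for_proportion(0.03))                # no prior information
print(sample_size_for_proportion(0.03, p_guess=0.06))  # pilot proportion of 6%
```

These calls give 1068 and 241, matching the two examples. A good pilot estimate far from 0.5 reduces the required sample size considerably.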
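How large a sample was used for the pilot study?
66
Wilson's method (Unbiased Estimation):
$$\frac{\hat{p} + z_{\alpha/2}^2/(2n) \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$
Asymptotic Method:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$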
For x = 16, n = 100, 95% CI is (0.121, 0.287)
For x = 16, n = 100, 95% CI is (0.101, 0.244)
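The two methods can be compared side by side in code. This is a sketch under the usual textbook forms of the two formulas; the function names `asymptotic_interval` and `wilson_interval` are assumptions introduced here.

```python
from statistics import NormalDist


def asymptotic_interval(x, n, confidence=0.95):
    """Asymptotic (normal approximation) interval: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def wilson_interval(x, n, confidence=0.95):
    """Wilson interval: recentered and rescaled using z^2 / n."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * (p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5 / denom
    return center - half, center + half


# Values from the slide: x = 16 successes in n = 100 trials.
print("Wilson:    ", tuple(round(v, 3) for v in wilson_interval(16, 100)))
print("Asymptotic:", tuple(round(v, 3) for v in asymptotic_interval(16, 100)))
```

For x = 16 and n = 100, the Wilson function returns approximately (0.101, 0.244). Unlike the asymptotic interval, the Wilson interval is not centered on $\hat{p}$; its center is pulled toward 0.5.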