Night 1
INFERENTIAL STATISTICS: USING THE SAMPLE
STATISTICS TO INFER (TO) POPULATION PARAMETERS.
Modular Course 5 Summary or Descriptive Statistics:
Numerical and graphical summaries of data.
Making Decisions based on the Empirical Rule (Standard Normal
Curve)
Empirical Rule
[Figure: normal curve marked at 1, 2 and 3 standard deviations either side of the mean, containing 68%, 95% and 99.7% of the data respectively]
Most Important for Inferential Stats on our Syllabus
95%
95% of normal data lies within 2 standard deviations of the mean
Example 1
IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. Use the Empirical Rule to show that 95% of IQ scores in the population are between 70 and 130.
Solution
95% of the IQ scores are within 2 standard deviations of the mean.
100 + 2(15) = 100 + 30 = 130
100 − 2(15) = 100 − 30 = 70
Example 2
The number of sandwiches sold by a shop from 12 noon to 2 pm each day is normally distributed. The mean of the distribution is 42.6 sandwiches and the standard deviation is 8.2. Use the Empirical Rule to identify the range of values around the mean that includes 68% of the sale numbers.
Solution
68% of the sales are within 1 standard deviation of the mean.
42.6 + 1(8.2) = 42.6 + 8.2 = 50.8
42.6 − 1(8.2) = 42.6 − 8.2 = 34.4
68% of the sales are between 34.4 and 50.8 sandwiches.
Your Turn
Question
The table below shows the prices charged per room by 40 B&B houses in Galway.
Race-Week B&B prices per room (€)
56 75 60 70 80 70 50 90 80 75
75 50 75 50 70 60 65 60 50 70
84 70 70 60 60 70 70 70 40 60
70 80 60 65 55 50 70 80 50 55
(i) Calculate, correct to one decimal place, the mean and standard deviation of the data.
(ii) Show that the empirical rule holds true for 1 standard deviation around the mean.
(iii) Show that the empirical rule holds true for 2 standard deviations around the mean.
Solution
(i) Using a calculator: Mean = 65.5, SD = 11.2
(ii) Upper Range = Mean + 1(Standard Deviation) = 65.5 + 11.2 = 76.7
Lower Range = Mean − 1(Standard Deviation) = 65.5 − 11.2 = 54.3
Of the forty houses, 27 (67.5%) charge between €54.30 and €76.70. Therefore approximately 68% of the prices lie within 1 standard deviation of the mean.
(iii) Upper Range = Mean + 2(Standard Deviation) = 65.5 + 22.4 = 87.9
Lower Range = Mean − 2(Standard Deviation) = 65.5 − 22.4 = 43.1
Of the forty houses, 38 (95%) charge between €43.10 and €87.90. Therefore approximately 95% of the prices lie within 2 standard deviations of the mean.
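The counts in this solution can be cross-checked with a few lines of Python. This is only a sketch: the helper name `share_within` is made up for illustration, and it uses the population standard deviation (`pstdev`), which is assumed to be what the calculator gave.

```python
from statistics import fmean, pstdev

# The 40 B&B prices from the question (in euro).
prices = [
    56, 75, 60, 70, 80, 70, 50, 90, 80, 75,
    75, 50, 75, 50, 70, 60, 65, 60, 50, 70,
    84, 70, 70, 60, 60, 70, 70, 70, 40, 60,
    70, 80, 60, 65, 55, 50, 70, 80, 50, 55,
]

def share_within(data, k):
    """Count and fraction of values within k standard deviations of the mean."""
    mu = fmean(data)
    sigma = pstdev(data)  # population standard deviation
    low, high = mu - k * sigma, mu + k * sigma
    inside = sum(low <= x <= high for x in data)
    return inside, inside / len(data)

print(round(fmean(prices), 1), round(pstdev(prices), 1))
print(share_within(prices, 1))
print(share_within(prices, 2))
```

The two fractions can then be compared with the 68% and 95% quoted by the Empirical Rule.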
For Leaving Cert we deal with two types of sampling:
1. Sample Proportion (Ordinary Level and Higher Level)
2. Sample Means ( Higher Level)
Sampling
Inferential Statistics:
We are usually unable to collect information about a total population.The aim of sampling is to draw reasonable conclusions about a population by obtaining information from a relatively small sample of that population.
When a sample from a population is selected we hope that the data we get represents the population as a whole.
To ensure this:
1. The sample must be random;
2. Every member of the population must have an equal chance of being included.
[Figure: six samples (Sample 1 to Sample 6) drawn from one Population]
Sampling
A sample of 25 students in a school were asked if they spent over €5 on mobile phone calls over the last week. 10 students had spent over €5. The proportion of the sample of 25 who spent over €5 was $\frac{10}{25} = 0.4 = 40\%$. Can we say that 40% of the students in the school (population) spent over €5?
The answer is no: unless the sample size was the same as the population size, we can't say for certain.
However, we could say with a certain degree of confidence that, if the sample was large enough and representative, the proportion of the sample would be approximately the same as the proportion of the population.
Population Proportions and Margin of Error
How confident we are is usually expressed as a percentage.We already saw (from the empirical rule) that approximately 95% of the area of a normal curve lies within ± 2 standard deviations of the mean.
This means that we are 95% certain that the population proportion is within ±2 standard deviations of the sample proportion. ± 2 standard deviations is our margin of error and the percentage margin of error that this represents depends on the sample size.
If n = 1000, the percentage margin of error is approximately ±3%.
95% is the confidence level we are working with, but other confidence levels also exist (e.g. 90% and 99%), for which a different margin of error applies depending on sample size.
At the 95% level of confidence, Margin of Error $= \frac{1}{\sqrt{n}}$, where n is the sample size.
Confidence interval for population proportion using Margin of Error
95% confidence interval: $\hat{p} - \frac{1}{\sqrt{n}} \le p \le \hat{p} + \frac{1}{\sqrt{n}}$
We are 95% confident that the population proportion is inside this confidence interval.
[Figure: 20 different 95% confidence intervals, each running from $\hat{p} - \frac{1}{\sqrt{n}}$ to $\hat{p} + \frac{1}{\sqrt{n}}$, drawn against the population proportion]
Question. A sample of 25 students in a school were asked if they spent over €5 on mobile phone calls over the last week. 10 students had spent over €5. Show a 95% confidence interval.
The proportion of the sample of 25 who spent over €5 was $\hat{p} = \frac{10}{25} = 0.4$.
Margin of Error $= \frac{1}{\sqrt{25}} = 0.2$
95% confidence interval: 0.4 − 0.2 to 0.4 + 0.2, i.e. 20% to 60%.
If the sampling were repeated many times, about 95% of the intervals built from the sample proportion and the margin of error would contain the true population proportion.
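The calculation above can be sketched in Python. The function names `margin_of_error` and `confidence_interval` are illustrative choices, and the sketch assumes the 1/√n rule at the 95% level of confidence.

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error, 1/sqrt(n), for a sample of size n."""
    return 1 / math.sqrt(n)

def confidence_interval(successes, n):
    """95% confidence interval for a proportion using the 1/sqrt(n) rule."""
    p_hat = successes / n        # sample proportion
    e = margin_of_error(n)
    return p_hat - e, p_hat + e

low, high = confidence_interval(10, 25)
print(f"{low:.2f} to {high:.2f}")  # prints "0.20 to 0.60"
```

With such a small sample the interval is very wide, which is why larger samples are preferred.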
Some Notes on Margin of Error
• As the sample size increases the margin of error decreases.
• A sample of about 50 has a margin of error of about 14% at the 95% level of confidence ($\frac{1}{\sqrt{50}} \approx 14.14\%$).
• A sample of about 1000 has a margin of error of about 3% at the 95% level of confidence ($\frac{1}{\sqrt{1000}} \approx 3.16\%$).
• The size of the population does not matter.
• If we double the sample size (1000 to 2000) we do not halve the margin of error.
• Margin of error estimates how accurately the results of a poll reflect the "true" feelings of the population.
Margin of Error $= \frac{1}{\sqrt{n}}$

Sample Size    Margin of Error
25             20%
64             12.5%
100            10%
256            6.25%
400            5%
625            4%
1111           3%
1600           2.5%
2500           2%
10000          1%
Example 1
A company claims that 30% of people who eat their "Rice Crispy Bun" product really liked it. The confidence level is cited as 95%. In June an independent survey was carried out on 625 randomly selected people to see if they liked the "Rice Crispy Bun" product.
(i) Calculate the margin of error.
(ii) The result of the survey in June was that 125 liked the "Rice Crispy Bun" product. According to the June survey, would you say that at a 5% level of significance the company was correct in stating that 30% of people who eat their "Rice Crispy Bun" product really liked it?
Solution
(i) Margin of Error $= \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{625}} = 0.04 = 4\%$
(ii) The company claims that 30% like the product. According to the survey, 125 out of 625 liked the product, so $\hat{p} = \frac{125}{625} = 0.2 = 20\%$. The margin of error is plus or minus 0.04, so the 95% confidence interval is 16% to 24%.
Reason: 30% is outside this interval, so the survey does not support the company's claim.
Example 2
In a survey I want a margin of error of ±5% at the 95% level of confidence. What sample size must I pick in order to achieve this?
Solution
Margin of Error = 0.05
$\frac{1}{\sqrt{n}} = 0.05$
$\frac{1}{n} = (0.05)^2 = 0.0025$
$n = \frac{1}{0.0025} = 400$
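The same rearrangement, n = 1/E², can be sketched as a small Python helper. The name `sample_size_for_margin` is hypothetical, and the rounding step is only there to absorb floating-point noise before rounding up.

```python
import math

def sample_size_for_margin(margin):
    """Smallest sample size n with 1/sqrt(n) <= margin (95% level, 1/sqrt(n) rule)."""
    n_exact = 1 / margin ** 2
    # Round away tiny floating-point errors, then round up to a whole number.
    return math.ceil(round(n_exact, 9))

print(sample_size_for_margin(0.05))  # prints 400
```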
Your Turn
Question
A sweet company claims that 10% of the M&M's it produces are green. Students found that in a large sample of 500 M&M's, 60 were green.
(i) Calculate the margin of error.
(ii) State whether 60 greens from 500 is an unusually high proportion of green M&M's if the claim by the company is assumed to be true.
Solution
(i) Margin of Error $= \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{500}} \approx 0.045 = 4.5\%$
(ii) $\hat{p} = \frac{60}{500} = 0.12 = 12\%$, so the 95% confidence interval is 12% ± 4.5%, i.e. 7.5% to 16.5%.
Reason: 10% is between 7.5% and 16.5% (inside the confidence interval), so 60 greens from 500 does not seem to be unusual.
Testing claims about a population.
Null Hypothesis: The null hypothesis, denoted by H0 is a claim or statement about a population. We assume this statement is true until proven otherwise. (the null hypothesis means that nothing is wrong with the claim or statement).
Alternative Hypothesis: The alternative hypothesis, denoted by H1
is a claim or statement which opposes the original statement about a population.
Recognising the Concept of a Hypothesis Test
Courtroom Analogy to Teach Formal Language
• At the start of a trial it is assumed the defendant is not guilty.
• Then the evidence is presented to the judge and jury.
• The null hypothesis is that the defendant is not guilty (H0)
• If the jury reject the null hypothesis (H0), this means that they find the defendant guilty.
• If the jury fail to reject the null hypothesis (H0), this means that they find the defendant not guilty.
Often we need to make a decision about a population based on a sample.
1. Is a coin which is tossed biased if we get a run of 8 heads in 10 tosses?
Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0).
Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).
2. During a 5 minute period a new machine produces fewer faulty parts than an old machine.
Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0).
Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).
3. Does a new drug for Hay-Fever work effectively?
Assuming that the new drug does not work effectively is called a NULL HYPOTHESIS (H0).
Assuming that the new drug does work effectively is called an ALTERNATIVE HYPOTHESIS (H1).
Hypothesis test on a population proportion using Margin of Error
95% confidence interval: $\hat{p} - \frac{1}{\sqrt{n}} \le p \le \hat{p} + \frac{1}{\sqrt{n}}$
• If the claimed percentage (H0) is inside the 95% confidence interval: Fail to Reject H0.
• If the claimed percentage (H0) is outside the 95% confidence interval: Reject H0.
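A minimal Python sketch of this decision rule follows. The function name `check_claim` is made up for illustration, and the two calls reuse the numbers from the "Rice Crispy Bun" and M&M questions earlier in these notes.

```python
import math

def check_claim(claimed_p, successes, n):
    """Compare a claimed proportion with the 95% interval p_hat +/- 1/sqrt(n)."""
    p_hat = successes / n
    e = 1 / math.sqrt(n)
    low, high = p_hat - e, p_hat + e
    if low <= claimed_p <= high:
        decision = "fail to reject H0"
    else:
        decision = "reject H0"
    return low, high, decision

# Claim of 30%, survey result 125 out of 625.
print(check_claim(0.30, 125, 625))
# Claim of 10%, survey result 60 out of 500.
print(check_claim(0.10, 60, 500))
```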
Example 1: Go Fast Airlines
Go Fast Airlines provides internal flights in Ireland, short haul flights to Europe and long haul flights to America and Asia. Each month the company carries out a survey among 1000 passengers. The company repeatedly advertises that 70% of their customers are satisfied with their overall service. 664 of the sample stated they were satisfied with the overall service. Would you say that the company was correct in saying that 70% of their customers were satisfied?
State the null hypothesis and state your conclusions clearly.
Null Hypothesis: The proportion of passengers who are satisfied with the service is unchanged at 70%. p = 0.7
Alternative Hypothesis: The proportion of passengers who are satisfied with the service is not 70%. p ≠ 0.7
Evidence: Sample Proportion $\hat{p} = \frac{664}{1000} = 0.664$; Margin of Error $= \frac{1}{\sqrt{1000}} \approx 0.0316$
95% confidence interval: 0.664 ± 0.0316, i.e. 63.24% to 69.56%.
Conclusion: The 70% is outside the range 63.24% to 69.56% of our confidence interval, so we reject the null hypothesis. There is sufficient evidence to reject the claim that the percentage of passengers who are happy with the service is 70% at the 5% level of significance.
Possible Actions:
• Change the advertisement from 70% to 65%.
• Meet with staff to come up with suggestions about how to improve the level of satisfaction.
• Do a further survey to find out more detail about why the level of satisfaction has changed.
Your Turn
Question 1
It is generally agreed that 40% of the voting public are in favour of a change of government. A survey was carried out on 900 randomly selected people to see if there was a change in support for the government. The result was that 42% are now in favour of a change of government.
(i) Calculate the margin of error.
(ii) State the null and alternative hypothesis.
(iii) At a 5% level of significance, would you accept or reject the null hypothesis? Give a reason for your conclusion.
Question 1: Solution
(i) Margin of Error $= \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{900}} = \frac{1}{30} \approx 0.033 \approx 3.3\%$, or about 3%
(ii) Null hypothesis, H0: "There is no change in the support for the government"
Alternative hypothesis, H1: "There is a change in the support for the government"
(iii) We Fail to Reject (Accept) H0, the null hypothesis.
Reason: The 95% confidence interval is 42% ± 3.3%, i.e. approximately 39% to 45%. 40% is inside this interval.
Question 2
RTÉ claim that 60% of all viewers watch the Late Late Show every Friday night. An independent survey was carried out on 400 randomly selected viewers to see if the claim were true. The result of the survey was that 180 were watching the Late Late Show.
I. Calculate the margin of error.
II. State the Null and Alternative Hypothesis.
III. Would you accept or reject the Null Hypothesis according to this survey? Give a reason for your conclusion.
Question 2: Solution
I. Margin of Error $= \frac{1}{\sqrt{400}} = 0.05 = 5\%$
II. Null hypothesis: 60% of viewers watch the Late Late Show.
Alternative hypothesis: The proportion of viewers who watch the Late Late Show is not 60%.
III. Sample proportion $\hat{p} = \frac{180}{400} = 0.45 = 45\%$, so the 95% confidence interval is 40% to 50%. According to the survey there is sufficient evidence to reject the Null Hypothesis.
Reason: 60% is outside the confidence interval.
Empirical Rule: 68%, 95% and 99.7% of normal data lie within 1, 2 and 3 standard deviations of the mean.
What about 1·5 std devs or 0·8 std devs?
Night 2
Different sets of data have different means and standard deviations, but all normally distributed data sets share the same bell-shaped curve.
[Figure: a Normal Distribution Curve alongside the Standard Normal Curve]
In order to avoid unnecessary calculations and graphing the scale of a Normal Distribution curve is converted to a standard scale called the z score or standard unit scale.
Normal Distribution to Standard Normal Distribution
[Figure: two Normal Distributions, one with μ = 13, σ = 3 (axis from 4 to 22) and one with μ = 278, σ = 12 (axis from 242 to 314), both mapped onto the Standard Normal Distribution with μ = 0, σ = 1 (z axis from −3 to 3)]
If μ = 0 and σ = 1 we would plot $y = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}$.
This graph gives the Standard Normal Graph with a standardised scale of z scores.
Standard Normal Distribution
The area between the Standard Normal Curve and the z axis between −∞ and ∞ is 1.
Total area under the curve: $P(-\infty < z < \infty) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,dz = 1$
Standard Units (z – scores)
z – scores define the position of a score in relation to the mean using the standard deviation as a unit of measurement.
z – scores are very useful for comparing data points in different distributions.
The z – score is the number of standard deviations by which the score departs from the mean. This standardises the distribution.
$z = \frac{x - \mu}{\sigma}$
x is a data point
μ is the population mean
σ is the standard deviation of the population
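Reading z – values From Tables
For a given z, the table gives $P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\,dt$ (tables pg. 36 and pg. 37).
Example 1
Using the tables find P(Z ≤ 1.31).
P(Z ≤ 1.31) can be read from the tables directly.
P(Z ≤ 1.31) = 0.9049 = 90.49%
[Figure: standard normal curve shaded to the left of z = 1.31]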
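A table value can be cross-checked numerically. The sketch below uses the standard identity linking the normal cumulative probability to the error function in Python's `math` module; the function name `phi` is just a label chosen here.

```python
import math

def phi(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(1.31), 4))  # prints 0.9049, matching the tables
```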
Example 2
Using the tables find P(Z ≥ 1.32).
The table only gives values to the left of z, but the fact that the total area under the curve equals 1 allows us to use P(Z ≥ z) = 1 − P(Z ≤ z).
P(Z ≥ 1.32) = 1 − P(Z ≤ 1.32)
P(Z ≥ 1.32) = 1 − 0.9066 = 0.0934 = 9.34%
[Figure: standard normal curve shaded to the right of z = 1.32; the area P(Z ≥ z) equals 1 − P(Z ≤ z)]
Example 3
Using the tables find P(Z ≤ −0.74).
The tables only work for positive values, but the curve is symmetrical about z = 0, so
P(Z ≤ −0.74) = P(Z ≥ 0.74)
P(Z ≥ 0.74) = 1 − P(Z ≤ 0.74)
P(Z ≤ −0.74) = 1 − 0.7704 = 0.2296 = 22.96%
[Figure: standard normal curve with the area to the left of −z and the area to the right of z shaded]
Both areas are the same and hence both probabilities are equal, as the curve is symmetrical about the mean, 0.
Example 4
Using the tables find P(−1.32 ≤ z ≤ 1.29).
P(−1.32 ≤ z ≤ 1.29) = Area to the left of 1.29 − Area to the left of −1.32
= P(z ≤ 1.29) − [1 − P(z ≤ 1.32)]
= 0.9015 − [1 − 0.9066] = 0.8081 = 80.81%
[Figure: standard normal curve shaded between z = −1.32 and z = 1.29]
Your Turn
Question 1
The amounts due on a mobile phone bill in Ireland are normally distributed with a mean of €53 and a standard deviation of €15. If a monthly phone bill is chosen at random, find the probability that the amount due is between €47 and €74.
Question 1: Solution
$z_1 = \frac{x_1 - \mu}{\sigma} = \frac{47 - 53}{15} = -0.4$
$z_2 = \frac{x_2 - \mu}{\sigma} = \frac{74 - 53}{15} = 1.4$
P(−0.4 ≤ Z ≤ 1.4) = P(Z ≤ 1.4) − [1 − P(Z ≤ 0.4)]
P(−0.4 ≤ Z ≤ 1.4) = 0.9192 − [1 − 0.6554]
P(−0.4 ≤ Z ≤ 1.4) = 0.5746
[Figure: normal curve with axis 8, 23, 38, 53, 68, 83, 98 (z = −3 to 3), shaded between 47 (z = −0.4) and 74 (z = 1.4)]
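As a rough check on the table work, the same probability can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import NormalDist

# Phone bills: mean 53 euro, standard deviation 15 euro.
bills = NormalDist(mu=53, sigma=15)

# P(47 <= X <= 74) is the area to the left of 74 minus the area to the left of 47.
p = bills.cdf(74) - bills.cdf(47)
print(round(p, 3))
```

The result agrees with the table answer of 0.5746 to about three decimal places; small differences come from the tables being rounded to four places.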
Question 2
The mean percentage achieved by a student in a statistics exam is 60%. The standard deviation of the exam marks is 10%.
(i) What is the probability that a randomly selected student scores above 80%?
(ii) What is the probability that a randomly selected student scores below 45%?
(iii) What is the probability that a randomly selected student scores between 50% and 75%?
(iv) Suppose you were sitting this exam and you are offered a prize for getting a mark which is greater than 90% of all the other students sitting the exam. What percentage would you need to get in the exam to win the prize?
Question 2: Solution
(i) $z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2$
P(Z > 2) = 1 − P(Z ≤ 2)
P(Z > 2) = 1 − 0.9772 = 0.0228 = 2.28%
[Figure: normal curve with axis 30 to 90 (z = −3 to 3), shaded to the right of 80]
(ii) $z = \frac{x - \mu}{\sigma} = \frac{45 - 60}{10} = -1.5$
P(Z < −1.5) = P(Z > 1.5) = 1 − P(Z ≤ 1.5)
P(Z < −1.5) = 1 − 0.9332 = 0.0668 = 6.68%
[Figure: normal curve with axis 30 to 90 (z = −3 to 3), shaded to the left of 45 (z = −1.5)]
(iii) $z_1 = \frac{50 - 60}{10} = -1$ and $z_2 = \frac{75 - 60}{10} = 1.5$
P(−1 ≤ Z ≤ 1.5) = P(Z ≤ 1.5) − [1 − P(Z ≤ 1)]
P(−1 ≤ Z ≤ 1.5) = 0.9332 − [1 − 0.8413]
P(−1 ≤ Z ≤ 1.5) = 0.7745
[Figure: normal curve with axis 30 to 90 (z = −3 to 3), shaded between 50 and 75 (z = 1.5)]
(iv) From the tables, an area of 90% (0.9) corresponds to z = 1.28.
$z = \frac{x - \mu}{\sigma}$
$1.28 = \frac{x - 60}{10}$, so x = 72.8 marks
[Figure: normal curve with axis 30 to 90 (z = −3 to 3), marked at 72.8 (z = 1.28)]
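For Higher Level Leaving Cert use z scores
[Figure: standard normal curve with 95% of the area between z = −1.96 and z = 1.96 and 2.5% in each tail (tables pg. 36 and pg. 37)]
Confidence interval for population proportion
LCHL: Confidence Limits $= \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
95% confidence interval: $\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
We are 95% confident that the population proportion is inside this confidence interval.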
Example 1
Skygo provides Wifi in the Galway area. In March the company carries out a survey among 625 of its customers. The company advertises that 60% of their customers were satisfied with their download speeds. 370 of the sample stated they were satisfied with their download speed. Create a 95% confidence interval based on your sample.
Sample Proportion $\hat{p} = \frac{370}{625} = 0.592$
Confidence Limits $= 0.592 \pm 1.96\sqrt{\frac{0.592(0.408)}{625}} \approx 0.592 \pm 0.0385$
95% confidence interval: approximately 55.35% ≤ p ≤ 63.05%
Your Turn
Question 1
The Sunday Independent reports that the government's approval rating is at 65%. The paper states that the poll is based on a random sample of 972 voters and that the margin of error is 3%. Show that the pollsters used a 95% level of confidence.
Question 1: Solution
Confidence Limits $= \hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, so the margin of error is $z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
$0.03 = z\sqrt{\frac{0.65(0.35)}{972}}$
$0.03 = z(0.0153)$
$z = \frac{0.03}{0.0153}$
$z \approx 1.96$
Therefore they are using a 95% level of confidence.
Question 2
It is known that 30% of a certain kind of apple seed will germinate. In an experiment 85 out of 300 seeds germinated. Construct a 95% confidence interval for the population proportion based on this sample.
Sample Proportion $\hat{p} = \frac{85}{300} \approx 0.283$
Confidence Limits $= 0.283 \pm 1.96\sqrt{\frac{0.283(0.717)}{300}} \approx 0.283 \pm 0.051$
95% confidence interval: 0.232 < p < 0.334
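The Higher Level confidence limits can be sketched in Python as below. The function name `proportion_ci` is an illustrative choice; the default z = 1.96 corresponds to the 95% level used throughout these notes.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    error = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - error, p_hat + error

# Apple seeds: 85 germinated out of 300.
low, high = proportion_ci(85, 300)
print(round(low, 3), round(high, 3))  # prints 0.232 0.334
```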
Sample Means
The data below are the heights, in cm, of a population of 100 fifteen-year-old students.
170 174 174 164 152 155 160 172 156 163
182 167 158 154 167 140 143 167 178 165
176 165 166 177 148 147 166 184 165 162
185 171 168 173 175 160 167 172 179 153
180 172 164 178 153 152 167 165 174 145
155 150 150 158 162 166 163 159 148 170
154 181 155 165 180 168 158 158 175 176
166 165 170 175 175 158 160 177 166 180
165 165 166 168 180 157 153 150 179 157
161 152 161 144 174 172 165 157 174 159
From the list above, the Mean of the Population: $\mu = \frac{\sum x}{n} = 164.72$
From the list above, the Standard Deviation of the Population: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} = 10.21$
It does not matter what the original distribution is: for large samples the sample means will always be approximately normally distributed. Use Java applets to demonstrate this.
A single sample of 5 data points.
The black arrows are the data points. The mean of the sample is the red dot
A single sample of 10 data points.
Naturally, if we choose a sample size of 100 (the original population size) the mean of the sample will be the same as the mean of the population.
As the sample size increases the standard error will decrease. Why? ……………
For a sample size of n ≥ 30:
1. The sample means are normally distributed.
2. $\mu_{\bar{x}} = \mu$
3. $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}}$
[Figure: Samples 1 to 6 drawn from a Population]
KEY IDEA: CLICK LINK BELOW
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Summary

                                       Population    Large Sample    Sample Means
Mean                                   μ             x̄               μ_x̄ = μ
Standard Deviation (Standard Error)    σ             s               σ_x̄ = σ/√n

In practice, from the table above, we can say that for n ≥ 30:
1. The sample means are normally distributed.
2. The mean of the sample means is the same as the population mean.
3. The standard deviation of the sample means is equal to $\frac{\sigma}{\sqrt{n}}$; this is called the standard error.
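These three statements can be explored with a small simulation, in the same spirit as the linked applet. Everything in this sketch is an assumption made for illustration: the population is a hypothetical, deliberately skewed set of 10,000 values, samples are drawn with replacement, and the seed is fixed so the run is repeatable.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (not normally distributed).
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu, sigma = fmean(population), pstdev(population)

n = 30  # sample size
# Mean of each of 2,000 random samples of size n (drawn with replacement).
sample_means = [fmean(rng.choices(population, k=n)) for _ in range(2_000)]

print("population mean:      ", round(mu, 3))
print("mean of sample means: ", round(fmean(sample_means), 3))
print("sigma / sqrt(n):      ", round(sigma / math.sqrt(n), 3))
print("SD of sample means:   ", round(pstdev(sample_means), 3))
```

The two means should be close to each other, as should the last two values. Increasing `n` shrinks the standard error.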
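In the Standard Normal Distribution we want the value of $z_1$ such that 95% of the population lies in the interval $-z_1 \le z \le z_1$.
[Figure: standard normal curve with area 0.95 between $-z_1$ and $z_1$ and 0.025 in each tail]
$P(z \le z_1) = 0.95 + 0.025 = 0.975$
$z_1 = 1.96$ and $-z_1 = -1.96$
Therefore in a Normal Distribution 95% of the population lies within 1.96 standard deviations of the mean, i.e. within 1.96σ of μ (the population mean).
For sample means: $\bar{x} - 1.96\sigma_{\bar{x}} \le \mu \le \bar{x} + 1.96\sigma_{\bar{x}}$
As $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, the confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.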
Example 1
A random sample of 250 cars was taken. The mean age of the cars was 4.5 years and the standard deviation was 2.2 years.
(i) Find the 95% confidence interval for the mean age of all cars.
(ii) What size sample is required to estimate the mean age, with 95% confidence, within 0.3 years?
(i) The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
$= 4.5 \pm 1.96\frac{2.2}{\sqrt{250}}$
$= 4.5 \pm 1.96(0.139)$
4.5 − 1.96(0.139) ≤ μ ≤ 4.5 + 1.96(0.139)
4.23 ≤ μ ≤ 4.77
This means that we can say with 95% confidence that the mean age of all cars in the population is between 4.23 years and 4.77 years.
(ii) $1.96\frac{\sigma}{\sqrt{n}} \le 0.3$
$1.96\frac{2.2}{\sqrt{n}} \le 0.3$
$\sqrt{n} \ge \frac{1.96(2.2)}{0.3}$
$\sqrt{n} \ge 14.373$
$n \ge (14.373)^2 \approx 206.6$, so n = 207 cars
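Both parts of this example can be sketched in Python. The function names `mean_ci` and `sample_size_for_mean` are hypothetical, and the sample standard deviation is assumed to stand in for σ, as in the worked solution.

```python
import math

def mean_ci(x_bar, sd, n, z=1.96):
    """Confidence interval x_bar +/- z * sd / sqrt(n) for a population mean."""
    error = z * sd / math.sqrt(n)
    return x_bar - error, x_bar + error

def sample_size_for_mean(sd, max_error, z=1.96):
    """Smallest n for which z * sd / sqrt(n) is at most max_error."""
    return math.ceil((z * sd / max_error) ** 2)

low, high = mean_ci(4.5, 2.2, 250)
print(round(low, 2), round(high, 2))   # prints 4.23 4.77
print(sample_size_for_mean(2.2, 0.3))  # prints 207
```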
Example 2
A random sample of 144 male students in a large university was taken and their heights measured. The mean height was 175 cm. The standard deviation of all the male students in the university was 9 cm.
(i) Give a 95% confidence interval for the mean height of all the male students.
(ii) Show that the width of the confidence interval would decrease if the sample size was 225 instead of 144.
(i) n = 144, $\bar{x}$ (mean of the sample) = 175, σ (standard deviation of the population) = 9, μ (population mean) is unknown.
We calculate the standard error of the mean using $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{9}{\sqrt{144}} = 0.75$
As the sample size is large, the best possible estimate of μ is $\bar{x}$, which is 175 cm. Now we have to give a range of values in which the true population mean (μ) lies. This will be with a 95% level of certainty.
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
175 − 1.96(0.75) ≤ μ ≤ 175 + 1.96(0.75)
173.53 ≤ μ ≤ 176.47
The true population mean lies within the range 173.53 cm to 176.47 cm with 95% certainty.
(ii) If a sample of 225 were taken the standard error would be $\sigma_{\bar{x}} = \frac{9}{\sqrt{225}} = 0.6$
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
175 − 1.96(0.6) ≤ μ ≤ 175 + 1.96(0.6)
173.82 ≤ μ ≤ 176.18
The true population mean lies within the range 173.82 cm to 176.18 cm with 95% certainty.
This is narrower than the previous confidence interval. As you increase the sample size you decrease the width of the confidence interval.
A study addressed the issue of whether pregnant women can correctly guess the sex of their baby. Among 104 recruited subjects, 57 correctly guessed the sex of the baby. Use these sample data to test the claim that the success rate of such guesses is no different from the 50% success rate expected with random chance guesses. Use a 5% significance level. (Based on data from "Are Women Carrying 'Basketballs' Really Having Boys? Testing Pregnancy Folklore," by Perry, DiPietro, and Constigan, Birth, Vol. 26, No. 3)
Solution: The original claim is that the success rate is no different from 50%.
H0: p = 0.5
H1: p ≠ 0.5
$\hat{p} = \frac{57}{104} = 0.548$
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.548 - 0.50}{\sqrt{\frac{(0.5)(0.5)}{104}}} = 0.98$
At the 5% level of significance the critical values are ±1.96.
As 0.98 is between −1.96 and 1.96 we fail to reject the null hypothesis.
There is not sufficient evidence to warrant rejection of the claim that women who guess the sex of their babies have a success rate equal to 50%.
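The test statistic above can be reproduced with a short Python sketch. The function name `proportion_z` is illustrative, and the critical value 1.96 assumes a two-tailed test at the 5% level.

```python
import math

def proportion_z(successes, n, p0):
    """z statistic for testing H0: p = p0 against a sample proportion."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses the hypothesised p0
    return (p_hat - p0) / standard_error

z = proportion_z(57, 104, 0.5)
reject = abs(z) > 1.96  # two-tailed test at the 5% level
print(round(z, 2), "reject H0" if reject else "fail to reject H0")
```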
Your Turn
A survey was carried out to find the weekly rental costs of holiday apartments in a certain country. A random sample of 400 apartments was taken. The mean of the sample was €320 and the standard deviation was €50. Form a 95% confidence interval for the mean weekly rental costs of holiday apartments in that country.
(2005 LC HL Q9 (c))
The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
$= 320 \pm 1.96\frac{50}{\sqrt{400}}$
320 − 1.96(2.5) ≤ μ ≤ 320 + 1.96(2.5)
315.1 ≤ μ ≤ 324.9
Between €315.10 and €324.90
Night 3
Hypothesis Testing
Often we need to make a decision about a population based on a sample.
In a trial you are presumed innocent until proven otherwise.
Assuming that an accused person is innocent (nothing has happened) is called a NULL HYPOTHESIS (H0).
Assuming that an accused person is not innocent is called an ALTERNATIVE HYPOTHESIS (H1).
1. Is a coin which is tossed biased if we get a run of 8 heads in 10 tosses?
Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0).
Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).
2. During a 5 minute period a new machine produces fewer faulty parts than an old machine.
Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0).
Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).
3. Does a new drug for Hay-Fever work effectively?
Assuming that there is no difference between the new drug and the current drug is called a NULL HYPOTHESIS (H0).
Assuming that the new drug is better than the current most popular drug is called an ALTERNATIVE HYPOTHESIS (H1).
A Two Tailed Test
The critical values for a 5% level of significance are z = −1.96 and z = 1.96.
[Figure: standard normal curve with 2.5% in each tail; Reject H0 when z < −1.96 or z > 1.96, Fail to Reject H0 in between]
Testing the Null Hypothesis using z-values
The statistical method used to determine whether H0 is true or not is called HYPOTHESIS TESTING. Statisticians speak of "not accepting or accepting H0 at a certain level". This level is called the LEVEL OF SIGNIFICANCE. (The 5% level of significance is on the syllabus.)
If the value of z lies outside the range −1.96 < z < 1.96 (i.e. in the critical region) we reject H0.
Testing hypotheses about a population mean (large samples)
Suppose we take a large sample of size n from a population with a mean of μ and a standard deviation of σ.
We have to calculate the mean of the sample, $\bar{x}$. ($\mu_{\bar{x}} = \mu$ when we are dealing with large samples.)
We can also calculate $\sigma_{\bar{x}}$ (or s) by using $\frac{\sigma}{\sqrt{n}}$.
We want to test the hypothesis that the sample comes from a population with a particular value of μ, called $\mu_0$.
Step 1. State the null and alternative hypotheses.
Null Hypothesis: $\mu = \mu_0$
Alternative Hypothesis: $\mu \ne \mu_0$
Note 1: We are not using $\mu > \mu_0$ or $\mu < \mu_0$. No direction is stipulated, therefore this is a two tailed test. (Only Two Tailed Tests are on for Leaving Cert.)
Note 2: The Null Hypothesis always has an equal sign and uses population parameters.
Step 2. Convert the observed results into z units. Calculate the test statistic.
The test statistic is a Standard Normal Z score with $Z = \frac{X - \mu}{\sigma}$.
As we are dealing with the sampling distribution of the mean, $Z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$.
(This is the difference between the value we have observed from our sample and the hypothesised value from the population, divided by the standard error.)
$Z = \frac{\text{Observed Value} - \text{Hypothesised Value}}{\text{Standard Error}}$
Step 3. Write down the critical values (a sketch also helps).
Step 4. Reject H0 if Z is in the critical regions, otherwise fail to reject H0.
Once we have the value of Z we compare it to our critical values and decide whether or not to reject the null hypothesis.
Review of the steps involved in Hypothesis Testing:
1. Write down the null hypothesis H0 and the alternative hypothesis H1.
2. Convert the observed results into z units. (Calculate the test statistic.)
3. Write down the critical values. (A sketch also helps.)
4. Reject H0 if z is in the critical regions, otherwise fail to reject H0.
Example 1
A company manufactures pens with a mean writing life of 500 hours and a standard deviation of 10 hours. A retailer examines a sample of 81 pens from a supplier who claims to only sell pens from this company and finds their mean life is 497 hours. Are these pens genuine products from the company?
Step 1. State the null and alternative hypotheses.
Null Hypothesis H0: The sample of pens are genuine products from the company. μ = 500
Alternative Hypothesis H1: The sample of pens are not genuine products from the company. μ ≠ 500
Note: If not given a Level of Significance we must write it down: 5% (the only level on for Leaving Cert).
Step 2. Convert the observed results into z units. Calculate the test statistic.
$Z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} = \frac{497 - 500}{\frac{10}{\sqrt{81}}} = -2.7$
Step 3. Write down the critical values (a sketch also helps).
The critical values are −1.96 and 1.96, with 2.5% in each tail. −2.7 is in the Reject Region.
Step 4. Reject H0 if Z is in the critical regions, otherwise fail to reject H0.
We reject the null hypothesis as −2.7 is in the reject region. This means that there is sufficient evidence to conclude that the pens are not genuine.
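The steps of this test can be sketched in Python using the pens data. The function name `mean_z_test` is made up for illustration, and the default critical value 1.96 assumes a two-tailed test at the 5% level.

```python
import math

def mean_z_test(x_bar, mu0, sigma, n, critical=1.96):
    """Return the z statistic and whether H0: mu = mu0 is rejected (two-tailed)."""
    standard_error = sigma / math.sqrt(n)
    z = (x_bar - mu0) / standard_error  # (observed - hypothesised) / standard error
    return z, abs(z) > critical

# Pens: sample mean 497 hours, claimed mean 500 hours, sigma 10 hours, n = 81.
z, reject = mean_z_test(497, 500, 10, 81)
print(round(z, 2), "reject H0" if reject else "fail to reject H0")
```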
Example 2
A tyre company claims that the mean life of the tyres it produces is 11,000 miles, with a standard deviation of 552 miles. An independent supplier of tyres wants to investigate the company's claim. A test on a random sample of 36 tyres from the company gave a mean life of 10,000 miles. Carry out a hypothesis test using a significance level of 5% to see if there is evidence to support the company's claim.
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The company produces tyres with a mean life of 11,000 miles. $\mu = 11{,}000$
Alternative Hypothesis $H_1$: The company produces tyres whose mean life is not 11,000 miles. $\mu \neq 11{,}000$
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{10{,}000 - 11{,}000}{552/\sqrt{36}} = -10.87$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
-10.87 is in the reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We reject the null hypothesis as -10.87 is in the reject region. We can conclude that there is evidence to suggest that the company's claim is not true.
Your Turn
Question 1
A neurologist is testing the effect of a drug on response time by injecting 36 rats with a unit dose of a new drug. The neurologist measures the response time of each rat to a stimulus. The neurologist knows that the mean response time for rats not injected is 0.75 seconds. The mean of the 36 injected rats' response times is 0.6 seconds, with a standard deviation of 0.2 seconds. Can you conclude that the drug has an effect on response time?
Question 1: Solution
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The drug has no effect. $\mu = 0.75$ seconds
Alternative Hypothesis $H_1$: The drug has an effect. $\mu \neq 0.75$ seconds
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.6 - 0.75}{0.2/\sqrt{36}} = -4.5$
Note: we are approximating $\sigma$ with $s$ as we don't know $\sigma$.
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
-4.5 is in the reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We reject the null hypothesis as -4.5 is in the reject region. We can conclude that there is evidence to suggest that the drug has an effect on response time.
Example 2
In an examination taken by a large number of students the mean mark was 51.5 and the standard deviation was 8.5. In a random sample of 49 students in a particular town, the mean mark was found to be 50. At the 5% level of significance, investigate if there is evidence to conclude that the students of this town did as well as students in general.
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The students in this town did as well as all other students. $\mu = 51.5$
Alternative Hypothesis $H_1$: The students in this town did not do as well as all other students. $\mu \neq 51.5$
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{50 - 51.5}{8.5/\sqrt{49}} = -1.24$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
-1.24 is in the fail to reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We fail to reject the null hypothesis as -1.24 is in the fail to reject region. There is not enough evidence to suggest that the students in this town did differently from students in general.
Example 3
The weights of newborn babies in Ireland are known to have a mean of 3.42 kg and a standard deviation of 0.9 kg. Assuming that the weights are normally distributed, a random sample of 500 babies whose mothers smoked heavily during pregnancy is taken. If the mean weight of this sample is 3.28 kg, can we conclude at the 5% level of significance that heavy smoking by mothers during pregnancy has an effect on the weight of their babies at birth?
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: Heavy smoking during pregnancy by mothers has no effect on the weight of their babies at birth. $\mu = 3.42$ kg
Alternative Hypothesis $H_1$: Heavy smoking during pregnancy by mothers has an effect on the weight of their babies at birth. $\mu \neq 3.42$ kg
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.28 - 3.42}{0.9/\sqrt{500}} = -3.48$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
-3.48 is in the reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We reject the null hypothesis as -3.48 is in the reject region. We can conclude that there is evidence to suggest that babies' weights are affected if their mothers smoke heavily during pregnancy.
p-value at the 5% Significance Level
Instead of comparing the value of our test statistic to the critical values, we can get a specific p-value for our test statistic by looking up its value in the tables.
The p-value measures the strength of the evidence in the data against the null hypothesis.
The smaller the p-value, the less likely it is that the sample results come from a situation where the null hypothesis is true.
If $p \le 0.05$: there is strong evidence against the null hypothesis, so we reject $H_0$ (if p is low, $H_0$ must go).
If $p > 0.05$: there is not enough evidence against the null hypothesis, so we fail to reject $H_0$.
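As a rough sketch of the table look-up, the two-tailed p-value can be computed from the standard normal cumulative distribution. The function name `two_tailed_p` is an assumption made for this illustration; `NormalDist().cdf` plays the role of the tables.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Two-tailed p-value for a z test statistic (hypothetical helper)."""
    # Probability in one tail beyond |z|, as read from the tables.
    tail = 1 - NormalDist().cdf(abs(z))
    # A two-tailed test counts both tails, so the tail area is doubled.
    return 2 * tail


p = two_tailed_p(-2.7)  # test statistic from the pens example
print(round(p, 4), "reject H0" if p <= 0.05 else "fail to reject H0")
```

Using `abs(z)` means the same function works whether the test statistic is negative or positive.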
Example 1
Medical consultants for large companies are concerned about the effects of stress on company executives. The mean systolic blood pressure for males aged 35 to 44 years is, according to national health statistics, 128 with a standard deviation of 15. A sample of 72 male executives in this age group was selected from companies. Their mean blood pressure was 130.
(i) Construct a 95% confidence interval for the mean systolic blood pressure for the executives. Interpret this interval.
(ii) Carry out a hypothesis test using a significance level of 5% to see if there is evidence to suggest that the mean systolic blood pressure for executives is different from the national average. Clearly state the null and alternative hypotheses and your conclusion. Give a p-value for this hypothesis test and interpret this p-value.
Solution
(i) $n = 72$, $\sigma = 15$, $\bar{x} = 130$
95% confidence interval $= \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 130 \pm 1.96\frac{15}{\sqrt{72}} = 130 \pm 3.46$
$[126.54, 133.46]$
This means that we can be 95% confident that the mean systolic blood pressure ($\mu$) for all male executives aged 35 to 44 in large companies lies in the range 126.54 to 133.46. This range includes the national average of 128.
(ii) Carry out a hypothesis test.
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The mean systolic blood pressure for male executives in the age group 35-44 is the same as the national average. $\mu = 128$
Alternative Hypothesis $H_1$: The mean systolic blood pressure for male executives in the age group 35-44 is not the same as the national average. $\mu \neq 128$
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{130 - 128}{15/\sqrt{72}} = 1.13$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
1.13 is in the fail to reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We fail to reject the null hypothesis as 1.13 is in the fail to reject region. There is not enough evidence to suggest that the mean systolic blood pressure for male executives in the age group 35-44 differs from the national average of 128.
Step 5. p-value in a Two Tailed Test.
From the tables, the probability of getting a value > 1.13 is $1 - 0.8708 = 0.1292$. The probability of getting a value < -1.13 is also 0.1292. The p-value is the sum of these two probabilities: $2(0.1292) = 0.2584$.
This p-value is greater than 0.05, so we fail to reject the null hypothesis.
Two things to note:
1. The p-value answers the question: if the null hypothesis is true, what is the probability that the observed value (130) is this far away from the value I expected to get (128) because of sheer randomness? So a p-value of 0.26 means in this case that there is a 26% chance that the sample mean blood pressure will be 2 or more units (130 - 128 = 2) away from the population mean for a sample of this size, just because of random variation in sampling. This is not enough evidence to reject the null hypothesis. The 5% level of significance means that we only reject the null hypothesis if the probability that the observed value is this far away from the expected value because of sheer randomness is less than 5%. So, at 26%, the chance that this variation was due to randomness is too high.
2. The tail probability (not the z-score) is doubled to get the p-value because we are doing a two-tailed test.
Example 2
A new diet is advertised with the claim that participants will lose an average of 4 kg during the first week on this diet. A random sample of 40 people on this diet showed a mean weight loss of 3.6 kg, with a standard deviation of 1 kg.
(i) Calculate a 95% confidence interval for the mean weight loss of all participants on this diet. Interpret this interval.
(ii) Test the claim made in the advertisement for this diet at a 5% level of significance. Clearly state your null and alternative hypotheses and your conclusion. Give a p-value for this hypothesis test and interpret this p-value.
Solution
(i) $n = 40$, $s = 1$, $\bar{x} = 3.6$
95% confidence interval $= \bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 3.6 \pm 1.96\frac{1}{\sqrt{40}} = 3.6 \pm 0.31$
$[3.29, 3.91]$
This means that we can be 95% confident that the mean weight loss ($\mu$) lies in the range 3.29 kg to 3.91 kg. This range does not include the weight loss (4 kg) as advertised.
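The confidence interval arithmetic for the diet data can be checked with a short sketch. The function name `confidence_interval_95` is an assumption made for this illustration; it uses the same multiplier 1.96 as the worked solution.

```python
from math import sqrt


def confidence_interval_95(sample_mean, sd, n):
    """95% confidence interval for a population mean (hypothetical helper)."""
    # Margin of error: 1.96 standard errors either side of the sample mean.
    margin = 1.96 * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = confidence_interval_95(sample_mean=3.6, sd=1, n=40)
print(round(low, 2), round(high, 2))
```

Rounding only at the end avoids carrying a rounded margin of error into the interval endpoints.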
(ii) Carry out a hypothesis test.
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The average weight loss during the first week of this diet is 4 kg. $\mu = 4$ kg
Alternative Hypothesis $H_1$: The average weight loss during the first week of this diet is not 4 kg. $\mu \neq 4$ kg
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{3.6 - 4}{1/\sqrt{40}} = -2.53$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
-2.53 is in the reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We reject the null hypothesis as -2.53 is in the reject region. There is evidence to suggest that the average weight loss during the first week of this diet is not 4 kg, so the advertising claim seems not to be true.
Step 5. p-value in a Two Tailed Test.
From the tables, the probability of getting a value > 2.53 is $1 - 0.9943 = 0.0057 \approx 0.006$. The probability of getting a value < -2.53 is also 0.006. The p-value is the sum of these two probabilities: $2(0.006) = 0.012$.
“The p-value is very small – there is only a 1.2% chance that the deviation from the 4 kg stated is due to sampling variability. This is very strong evidence for rejecting the company’s claim.”
Your Turn
Question 1
The mean hourly wage in an EU country is €10. A sample of 35 individuals in the capital city of the country has a mean hourly wage of €10.83 with a standard deviation of €3.35 per hour.
(i) Construct a 95% confidence interval for the mean hourly wage in the capital city. Interpret this interval.
(ii) Is there evidence to suggest that hourly wages for workers in the capital city are different from the national hourly wage? Test the hypothesis using a 5% level of significance. Clearly state the null and alternative hypotheses and your conclusion. Give a p-value for this hypothesis test and interpret this p-value.
Question 1: Solution
(i) $n = 35$, $s = 3.35$, $\bar{x} = 10.83$
95% confidence interval $= \bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 10.83 \pm 1.96\frac{3.35}{\sqrt{35}} = 10.83 \pm 1.11$
$[9.72, 11.94]$
This means that we can be 95% confident that the mean hourly wage ($\mu$) for workers in the capital city lies in the range €9.72 to €11.94. This range includes the mean hourly rate for the country (€10).
(ii) Carry out a hypothesis test.
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The average hourly wage for a worker in the capital is the same as that of a worker in the rest of the country. $\mu = €10$
Alternative Hypothesis $H_1$: The average hourly wage for a worker in the capital is not the same as that of a worker in the rest of the country. $\mu \neq €10$
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10.83 - 10}{3.35/\sqrt{35}} = 1.466$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
1.466 is in the fail to reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We fail to reject the null hypothesis as 1.466 is in the fail to reject region. There is not enough evidence to suggest that the hourly wage for workers in the capital differs from that in the rest of the country.
Step 5. p-value in a Two Tailed Test.
From the tables, the probability of getting a value > 1.466 is $1 - 0.9286 = 0.0714$. The probability of getting a value < -1.466 is also 0.0714. The p-value is the sum of these two probabilities: $2(0.0714) = 0.1428$.
This p-value is greater than 0.05, so we fail to reject the null hypothesis.
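In the worked questions, the 95% confidence interval and the two-tailed test at the 5% level reach the same verdict: the interval contains the claimed mean exactly when the test fails to reject $H_0$. The sketch below checks this for the wage data and the diet data; the helper names `ci_contains` and `fails_to_reject` are assumptions made for this illustration, and both use the same standard error and the same multiplier 1.96.

```python
from math import sqrt


def ci_contains(mu0, sample_mean, sd, n, critical=1.96):
    """True if the 95% confidence interval around sample_mean contains mu0."""
    margin = critical * sd / sqrt(n)
    return sample_mean - margin <= mu0 <= sample_mean + margin


def fails_to_reject(mu0, sample_mean, sd, n, critical=1.96):
    """True if the two-tailed z-test at the 5% level fails to reject H0."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    return abs(z) <= critical


# Wage data: claimed mean 10, sample mean 10.83, s = 3.35, n = 35.
print(ci_contains(10, 10.83, 3.35, 35), fails_to_reject(10, 10.83, 3.35, 35))
# Diet data: claimed mean 4, sample mean 3.6, s = 1, n = 40.
print(ci_contains(4, 3.6, 1, 40), fails_to_reject(4, 3.6, 1, 40))
```

The two functions are algebraically the same condition, which is why the interval in part (i) already hints at the conclusion of part (ii).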
Question 2
A machine filling bottles of natural mineral water is set to deliver 0.725 litres with a standard deviation of 0.01 litres. A sample of 50 bottles is checked and the mean quantity is found to be 0.721 litres. At the 5% level of significance, investigate if there is evidence to suggest that the mean of this sample is different from the expected mean of 0.725 litres.
Question 2: Solution
Step 1. State the null and alternative hypotheses.
Null Hypothesis $H_0$: The mean volume delivered is the same as the expected volume. $\mu = 0.725$ litres
Alternative Hypothesis $H_1$: The mean volume delivered is not the same as the expected volume. $\mu \neq 0.725$ litres
Step 2. Convert the observed results into z units. Calculate the test statistic.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{0.721 - 0.725}{0.01/\sqrt{50}} = -2.83$
Step 3. Write down the critical values (a sketch also helps).
[Sketch: standard normal curve with critical values -1.96 and 1.96. Reject $H_0$ in each 2.5% tail; fail to reject $H_0$ between them.]
-2.83 is in the reject region.
Step 4. Reject $H_0$ if z is in the critical regions, otherwise fail to reject $H_0$.
We reject the null hypothesis as -2.83 is in the reject region. We can conclude that there is evidence to suggest that the mean volume delivered is different from the expected mean of 0.725 litres.