Night 1. INFERENTIAL STATISTICS: USING SAMPLE STATISTICS TO INFER POPULATION PARAMETERS


Modular Course 5 Summary or Descriptive Statistics:

Numerical and graphical summaries of data.

Making Decisions Based on the Empirical Rule (Standard Normal Curve)

Empirical Rule

[Figure: normal curve showing that about 68% of the data lies within μ ± 1σ, about 95% within μ ± 2σ and about 99.7% within μ ± 3σ]

Most important for inferential statistics on our syllabus:

95% of normally distributed data lies within 2 standard deviations of the mean.

Example 1

IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. Use the Empirical Rule to show that 95% of IQ scores in the population are between 70 and 130.

Solution

95% of the IQ scores are within 2 standard deviations of the mean:

$100 + 2(15) = 100 + 30 = 130$

$100 - 2(15) = 100 - 30 = 70$

Example 2

The number of sandwiches sold by a shop from 12 noon to 2 pm each day is normally distributed. The mean of the distribution is 42.6 sandwiches and the standard deviation is 8.2. Use the Empirical Rule to identify the range of values around the mean that includes 68% of the daily sales figures.

Solution

68% of the sales figures are within 1 standard deviation of the mean:

$42.6 + 1(8.2) = 42.6 + 8.2 = 50.8$

$42.6 - 1(8.2) = 42.6 - 8.2 = 34.4$

68% of the daily sales figures are between 34.4 and 50.8 sandwiches.
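Both examples use the same arithmetic: mean ± k standard deviations. A minimal Python sketch of that calculation, assuming only the 68/95/99.7 rule stated above; the name `empirical_range` is our own and is not a library function.

```python
def empirical_range(mean, sd, k):
    """Return the interval mean ± k*sd used with the Empirical Rule.

    k = 1, 2, 3 covers about 68%, 95%, 99.7% of normally distributed data.
    """
    return mean - k * sd, mean + k * sd


# IQ scores: mean 100, standard deviation 15, 2 standard deviations (95%)
print(empirical_range(100, 15, 2))

# Sandwich sales: mean 42.6, standard deviation 8.2, 1 standard deviation (68%)
low, high = empirical_range(42.6, 8.2, 1)
print(round(low, 1), round(high, 1))
```

Rounding the sandwich limits to one decimal place hides tiny floating-point noise such as 34.400000000000006.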

Your Turn

Question

The table below shows the prices charged per room by 40 B&B houses in Galway.

Race Week B&B prices per room (€)

56 75 60 70 80 70 50 90 80 75

75 50 75 50 70 60 65 60 50 70

84 70 70 60 60 70 70 70 40 60

70 80 60 65 55 50 70 80 50 55

(i) Calculate, correct to one decimal place, the mean and standard deviation of the data.

(ii) Show that the empirical rule holds true for 1 standard deviation around the mean.

(iii) Show that the empirical rule holds true for 2 standard deviations around the mean.

Solution

(i) Using a calculator: mean = 65.5, standard deviation = 11.2 (the population standard deviation, σ).

(ii) Upper limit = mean + 1(standard deviation) = 65.5 + 11.2 = 76.7

Lower limit = mean − 1(standard deviation) = 65.5 − 11.2 = 54.3

Of the 40 houses, 27 (67.5%) charge between €54.30 and €76.70. Therefore approximately 68% of the prices lie within 1 standard deviation of the mean.

(iii) Upper limit = mean + 2(standard deviation) = 65.5 + 22.4 = 87.9

Lower limit = mean − 2(standard deviation) = 65.5 − 22.4 = 43.1

Of the 40 houses, 38 (95%) charge between €43.10 and €87.90. Therefore approximately 95% of the prices lie within 2 standard deviations of the mean.
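The counts in (ii) and (iii) can be checked mechanically. A Python sketch using the standard `statistics` module: `pstdev` is the population standard deviation, which is the 11.2 quoted in (i). It keeps the unrounded value (about 11.16), so its limits are about 54.34 and 76.66 (and 43.18 and 87.82), slightly different from the rounded limits above, but the counts come out the same. The helper `share_within` is an illustrative name.

```python
from statistics import mean, pstdev

# Race Week B&B prices per room (euro) for the 40 houses in the table
prices = [56, 75, 60, 70, 80, 70, 50, 90, 80, 75,
          75, 50, 75, 50, 70, 60, 65, 60, 50, 70,
          84, 70, 70, 60, 60, 70, 70, 70, 40, 60,
          70, 80, 60, 65, 55, 50, 70, 80, 50, 55]

mu = mean(prices)       # 65.5
sigma = pstdev(prices)  # population standard deviation, about 11.16


def share_within(k):
    """Return (count, proportion) of prices within k standard deviations of the mean."""
    count = sum(1 for p in prices if mu - k * sigma <= p <= mu + k * sigma)
    return count, count / len(prices)


print(round(mu, 1), round(sigma, 1))  # 65.5 11.2
print(share_within(1))                # (27, 0.675)
print(share_within(2))                # (38, 0.95)
```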

Sampling

For the Leaving Cert we deal with two types of sampling:

1. Sample proportions (Ordinary Level and Higher Level)

2. Sample means (Higher Level)

Inferential Statistics:

We are usually unable to collect information about a whole population. The aim of sampling is to draw reasonable conclusions about a population by obtaining information from a relatively small sample of that population.

When a sample is selected from a population, we hope that the data we get represent the population as a whole.

To ensure this:

1. The sample must be random.

2. Every member of the population must have an equal chance of being included.

Sampling

[Figure: several samples (Sample 1 to Sample 6) drawn from the same population]

A sample of 25 students in a school was asked if they had spent over €5 on mobile phone calls in the last week. 10 students had spent over €5. The proportion of the sample of 25 who spent over €5 was $\hat{p} = \frac{10}{25} = 0.4 = 40\%$. Can we say that 40% of the students in the school (the population) spent over €5?

The answer is no: unless the sample size were the same as the population size, we cannot say this for certain.

However, if the sample were large enough and representative, we could say with a certain degree of confidence that the proportion in the sample would be approximately the same as the proportion in the population.

Population Proportions and Margin of Error

How confident we are is usually expressed as a percentage. We already saw (from the empirical rule) that approximately 95% of the area under a normal curve lies within ±2 standard deviations of the mean.

This means that we are 95% confident that the population proportion is within ±2 standard deviations of the sample proportion. These ±2 standard deviations are our margin of error, and the percentage margin of error that this represents depends on the sample size.

If n = 1000, the percentage margin of error is about ±3%.

95% is the confidence level we are working with, but other confidence levels also exist (e.g. 90% and 99%), each with a different margin of error for a given sample size.

At the 95% level of confidence:

Margin of error $= \frac{1}{\sqrt{n}}$, where n is the sample size.

Confidence interval for population proportion using Margin of Error

95% confidence interval: $\hat{p} - \frac{1}{\sqrt{n}} < p < \hat{p} + \frac{1}{\sqrt{n}}$

We are 95% confident that the population proportion p is inside this confidence interval.

[Figure: 20 different 95% confidence intervals, each of the form $\hat{p} \pm \frac{1}{\sqrt{n}}$]

Question. A sample of 25 students in a school was asked if they had spent over €5 on mobile phone calls in the last week. 10 students had spent over €5. Show a 95% confidence interval.

The proportion of the sample of 25 who spent over €5 was $\hat{p} = \frac{10}{25} = 0.4$.

Margin of error $= \frac{1}{\sqrt{25}} = \frac{1}{5} = 0.2$

95% confidence interval: $0.4 - 0.2 < p < 0.4 + 0.2$, i.e. $0.2 < p < 0.6$

If we repeated this sampling many times, about 95% of the intervals made in this way (sample proportion ± margin of error) would contain the true population proportion.

Some Notes on Margin of Error

• As the sample size increases, the margin of error decreases.

• A sample of about 50 has a margin of error of about $\frac{1}{\sqrt{50}} \approx 14.14\%$ at the 95% level of confidence.

• A sample of about 1000 has a margin of error of about $\frac{1}{\sqrt{1000}} \approx 3.16\%$ at the 95% level of confidence.

• The size of the population does not matter.

• If we double the sample size (e.g. from 1000 to 2000), we do not halve the margin of error; it is only divided by $\sqrt{2}$.

• The margin of error estimates how accurately the results of a poll reflect the “true” feelings of the population.

Margin of error $= \frac{1}{\sqrt{n}}$ at the 95% level of confidence:

Sample size (n)    Margin of error
25                 20%
64                 12.5%
100                10%
256                6.25%
400                5%
625                4%
1111               3%
1600               2.5%
2500               2%
10000              1%
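The table is just $\frac{1}{\sqrt{n}}$ evaluated for each sample size. A short Python sketch that regenerates it; `margin_of_error` is a hypothetical helper name. (The $\frac{1}{\sqrt{n}}$ rule is a simplification of $1.96\sqrt{\frac{p(1-p)}{n}}$ with $p = 0.5$, since $1.96 \times 0.5 = 0.98 \approx 1$.)

```python
import math


def margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)


for n in [25, 64, 100, 256, 400, 625, 1111, 1600, 2500, 10000]:
    print(f"{n:>6}  {margin_of_error(n):.2%}")

# Doubling the sample size divides the margin by sqrt(2), not by 2
print(margin_of_error(1000) / margin_of_error(2000))
```

Because of the `.2%` format, n = 1111 prints as 3.00% even though $\frac{1}{\sqrt{1111}}$ is a fraction above 3%.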

Example 1

A company claims that 30% of people who eat their "Rice Crispy Bun" product really like it. The confidence level is cited as 95%. In June an independent survey was carried out on 625 randomly selected people to see if they liked the "Rice Crispy Bun" product.

(i) Calculate the margin of error.

(ii) The result of the survey in June was that 125 people liked the "Rice Crispy Bun" product. According to the June survey, would you say, at the 5% level of significance, that the company was correct in stating that 30% of people who eat their "Rice Crispy Bun" product really like it?

Solution

(i) Margin of error $= \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{625}} = \frac{1}{25} = 0.04 = 4\%$

The margin of error is plus or minus 0.04.

(ii) According to the survey, 125 out of 625 liked the product: $\hat{p} = \frac{125}{625} = 0.2 = 20\%$

95% confidence interval: $0.2 - 0.04 < p < 0.2 + 0.04$, i.e. $16\% < p < 24\%$

Reason: The company claims that 30% like the product, but 30% is outside the confidence interval (16% to 24%). So, according to the June survey, the company was not correct at the 5% level of significance.

[Figure: confidence interval from 16% to 24% on the p axis, with the claimed 30% outside it]

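Example 2

In a survey I want a margin of error of 0.05, or 5%, at the 95% level of confidence. What sample size must I pick in order to achieve this?

Solution

$\frac{1}{\sqrt{n}} = 0.05 \;\Rightarrow\; \frac{1}{n} = (0.05)^2 = 0.0025 \;\Rightarrow\; n = \frac{1}{0.0025} = 400$

A sample of 400 is needed.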
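Rearranging $\frac{1}{\sqrt{n}} = E$ gives $n = \frac{1}{E^2}$. A Python sketch under the same $\frac{1}{\sqrt{n}}$ approximation; `sample_size_for_margin` is a hypothetical helper name. It rounds up, because a sample slightly too small would give a margin slightly larger than the target.

```python
import math


def sample_size_for_margin(target):
    """Smallest n with 1/sqrt(n) <= target, i.e. n >= 1/target**2 (95% level)."""
    # round() removes floating-point noise such as 399.99999999999994 before rounding up
    return math.ceil(round(1 / target**2, 9))


print(sample_size_for_margin(0.05))  # 400
print(sample_size_for_margin(0.03))  # 1112
```

For a 3% margin this returns 1112 rather than 1111: $\frac{1}{\sqrt{1111}}$ is marginally above 0.03, so rounding up keeps the margin at or below the target.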

Your Turn

Question

A sweet company claims that 10% of the M&M's it produces are green. Students found that in a large sample of 500 M&M's, 60 were green.

(i) Calculate the margin of error.

(ii) State whether 60 greens from 500 is an unusually high proportion of green M&M's if the claim by the company is assumed to be true.


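Solution

(i) Margin of error $= \frac{1}{\sqrt{500}} \approx 0.045 = 4.5\%$

(ii) Sample proportion $\hat{p} = \frac{60}{500} = 0.12 = 12\%$

95% confidence interval: $0.12 - 0.045 < p < 0.12 + 0.045$, i.e. $7.5\% < p < 16.5\%$

Reason: The claimed 10% is between 7.5% and 16.5% (inside the confidence interval), so 60 greens from 500 does not seem to be unusually high.

[Figure: confidence interval from 7.5% to 16.5% on the p axis, with the claimed 10% inside it]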

Recognising the Concept of a Hypothesis Test

Testing claims about a population.

Null Hypothesis: The null hypothesis, denoted by H0, is a claim or statement about a population. We assume this statement is true until proven otherwise (the null hypothesis says that nothing is wrong with the claim or statement).

Alternative Hypothesis: The alternative hypothesis, denoted by H1, is a claim or statement which opposes the original statement about a population.

Courtroom Analogy to Teach Formal Language

• At the start of a trial it is assumed the defendant is not guilty.

• Then the evidence is presented to the judge and jury.

• The null hypothesis is that the defendant is not guilty (H0)

• If the jury reject the null hypothesis (H0), this means that they find the defendant guilty.

• If the jury fail to reject the null hypothesis (H0), this means that they find the defendant not guilty.

Often we need to make a decision about a population based on a sample.

1. Is a coin biased if we get a run of 8 heads in 10 tosses? Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0). Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).

2. During a 5 minute period a new machine produces fewer faulty parts than an old machine.

Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0).

Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).

3. Does a new drug for hay fever work effectively? Assuming that the new drug does not work effectively is called a NULL HYPOTHESIS (H0). Assuming that the new drug does work effectively is called an ALTERNATIVE HYPOTHESIS (H1).

Hypothesis test on a population proportion using Margin of Error

95% confidence interval: $\hat{p} - \frac{1}{\sqrt{n}} < p < \hat{p} + \frac{1}{\sqrt{n}}$

• If the claimed percentage (H0) is inside the confidence interval, we fail to reject H0.

• If the claimed percentage (H0) is outside the confidence interval, we reject H0.
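The decision rule above fits in a few lines of Python. This is a sketch assuming the $\hat{p} \pm \frac{1}{\sqrt{n}}$ interval at the 95% level used in this section; `margin_test` is an illustrative name. The usage lines are the Rice Crispy Bun and M&M's questions from earlier.

```python
import math


def margin_test(claim, successes, n):
    """Compare a claimed proportion with the 95% interval p_hat ± 1/sqrt(n).

    Returns (lower, upper, decision).
    """
    p_hat = successes / n
    margin = 1 / math.sqrt(n)
    lower, upper = p_hat - margin, p_hat + margin
    decision = "fail to reject H0" if lower <= claim <= upper else "reject H0"
    return lower, upper, decision


print(margin_test(0.30, 125, 625))  # Rice Crispy Bun: is 30% inside 16% to 24%?
print(margin_test(0.10, 60, 500))   # M&M's: is 10% inside about 7.5% to 16.5%?
```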

Example 1: Go Fast Airlines

Go Fast Airlines provides internal flights in Ireland, short haul flights to Europe and long haul flights to America and Asia. Each month the company carries out a survey among 1000 passengers. The company repeatedly advertises that 70% of their customers are satisfied with their overall service. 664 of the sample stated they were satisfied with the overall service. Would you say that the company was correct in saying that 70% of their customers were satisfied?

State the null and alternative hypotheses and state your conclusions clearly.

Null Hypothesis: The proportion of passengers who are satisfied with the service is unchanged at 70%: p = 0.7

Alternative Hypothesis: The proportion of passengers who are satisfied with the service is not 70%: p ≠ 0.7

Evidence: Sample proportion $\hat{p} = \frac{664}{1000} = 0.664$. Margin of error $= \frac{1}{\sqrt{1000}} \approx 0.0316$.

95% confidence interval: $0.664 - 0.0316 < p < 0.664 + 0.0316$, i.e. $63.24\% < p < 69.56\%$

Conclusion: The claimed 70% is outside our confidence interval of 63.24% to 69.56%. There is sufficient evidence to reject the claim that the percentage of passengers who are satisfied with the service is 70%, at the 5% level of significance.

Possible Actions:

• Change the advertisement from 70% to 65%.

• Meet with staff to come up with suggestions about how to improve the level of satisfaction.

• Do a further survey to find out in more detail why the level of satisfaction has changed.

[Figure: confidence interval from 63.24% to 69.56% on the p axis, with the claimed 70% outside it: reject H0]

Your Turn

Question 1

It is generally agreed that 40% of the voting public are in favour of a change of government. A survey was carried out on 900 randomly selected people to see if there was a change in support for the government. The result was that 42% are now in favour of a change of government.

(i) Calculate the margin of error.

(ii) State the null and alternative hypotheses.

(iii) At a 5% level of significance, would you accept or reject the null hypothesis? Give a reason for your conclusion.

Solution

(i) Margin of error $= \frac{1}{\sqrt{900}} = \frac{1}{30} \approx 0.033 = 3.3\%$

(ii) Null hypothesis, H0: "There is no change in the support for the government" (p = 0.4).

Alternative hypothesis, H1: "There is a change in the support for the government" (p ≠ 0.4).

(iii) We fail to reject (accept) H0, the null hypothesis.

Reason: The 95% confidence interval is $0.42 - 0.033 < p < 0.42 + 0.033$, i.e. about 38.7% to 45.3%, and 40% is inside this interval.


[Figure: confidence interval from about 39% to 45% on the p axis, with the claimed 40% inside it: fail to reject H0]

Question 2

RTÉ claim that 60% of all viewers watch the Late Late Show every Friday night. An independent survey was carried out on 400 randomly selected viewers to see if the claim was true. The result of the survey was that 180 were watching the Late Late Show.

I. Calculate the margin of error.

II. State the null and alternative hypotheses.

III. Would you accept or reject the null hypothesis according to this survey? Give a reason for your conclusion.

Solution

I. Margin of error $= \frac{1}{\sqrt{400}} = \frac{1}{20} = 0.05 = 5\%$

II. Null hypothesis: 60% of viewers watch the Late Late Show (p = 0.6).

Alternative hypothesis: The proportion of viewers who watch the Late Late Show is not 60% (p ≠ 0.6).

III. Sample proportion $\hat{p} = \frac{180}{400} = 0.45 = 45\%$, so the 95% confidence interval is $40\% < p < 50\%$.

According to the survey there is sufficient evidence to reject the null hypothesis. Reason: 60% is outside the confidence interval.

[Figure: confidence interval from 40% to 50% on the p axis, with the claimed 60% outside it: reject H0]

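Night 2

[Figure: Empirical Rule, with about 68%, 95% and 99.7% of the data within 1, 2 and 3 standard deviations of the mean]

What about 1.5 standard deviations, or 0.8 standard deviations?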

Different sets of data have different means and standard deviations, but any that are normally distributed have the same bell-shaped normal curve.

Normal Distribution to Standard Normal Distribution

In order to avoid unnecessary calculations and graphing, the scale of a normal distribution curve is converted to a standard scale called the z-score or standard unit scale.

[Figure: normal distributions with μ = 13, σ = 3 (axis 4 to 22) and μ = 278, σ = 12 (axis 242 to 314), both mapped onto the standard normal distribution with μ = 0, σ = 1 (z-axis −3 to 3)]

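If μ = 0 and σ = 1, we would plot

$y = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}$

This graph gives the standard normal curve with a standardised scale (z-scores).

[Figure: standard normal distribution with z-axis from −3 to 3]

The area between the standard normal curve and the z-axis, from $-\infty$ to $\infty$, is 1:

Total area under the curve $= P(-\infty < Z < \infty) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,dz = 1$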
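The claim that the total area is 1 can be checked numerically. A Python sketch using the trapezium rule (our choice of method, not one used on the course) on $y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}$; the interval from −10 to 10 stands in for $-\infty$ to $\infty$ because the area beyond it is negligible. The same function also recovers the 68% and 95% figures of the Empirical Rule.

```python
import math


def std_normal_pdf(z):
    """Height of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def area(a, b, steps=100_000):
    """Trapezium-rule approximation of the area under the curve from a to b."""
    h = (b - a) / steps
    total = 0.5 * (std_normal_pdf(a) + std_normal_pdf(b))
    total += sum(std_normal_pdf(a + i * h) for i in range(1, steps))
    return total * h


print(round(area(-10, 10), 6))  # total area
print(round(area(-1, 1), 4))    # within 1 standard deviation
print(round(area(-2, 2), 4))    # within 2 standard deviations
```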

Standard Units (z-scores)

z-scores define the position of a score in relation to the mean, using the standard deviation as the unit of measurement.

z-scores are very useful for comparing data points in different distributions.

The z-score is the number of standard deviations by which the score departs from the mean. This standardises the distribution.

$z = \frac{x - \mu}{\sigma}$

where x is a data point, μ is the population mean and σ is the standard deviation of the population.
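The formula translates directly into code. A minimal Python sketch; the two distributions in the usage lines (mean 13, standard deviation 3 and mean 278, standard deviation 12) are illustrative values chosen to show how z-scores put points from different distributions on one scale.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations by which x departs from the mean mu."""
    return (x - mu) / sigma


# A score of 19 where mu = 13, sigma = 3, and a score of 266 where mu = 278, sigma = 12
print(z_score(19, 13, 3))     # 2.0: two standard deviations above its mean
print(z_score(266, 278, 12))  # -1.0: one standard deviation below its mean
```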

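Reading z-values from Tables

For a given z, the tables give

$P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\,dt$

(See pages 36 and 37 of the tables.)

Example 1

Using the tables, find $P(Z \le 1.31)$.

$P(Z \le 1.31)$ can be read from the tables directly:

$P(Z \le 1.31) = 0.9049 = 90.49\%$

[Figure: standard normal curve shaded to the left of z = 1.31]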

Example 2

Using the tables, find $P(Z > 1.32)$.

The tables only give the area to the left of z, but the fact that the total area under the curve equals 1 allows us to use

$P(Z > z) = 1 - P(Z \le z)$

$P(Z > 1.32) = 1 - P(Z \le 1.32) = 1 - 0.9066 = 0.0934 = 9.34\%$

[Figure: standard normal curve shaded to the right of z = 1.32]

Example 3

Using the tables, find $P(Z \le -0.74)$.

The tables only work for positive values, but as the curve is symmetrical about z = 0:

$P(Z \le -0.74) = P(Z \ge 0.74) = 1 - P(Z \le 0.74) = 1 - 0.7704 = 0.2296 = 22.96\%$

[Figure: standard normal curve shaded to the left of z = −0.74 and to the right of z = 0.74]

Both areas are the same, and hence both probabilities are equal, as the curve is symmetrical about the mean, 0: $P(Z \le -z) = P(Z \ge z)$.

Example 4

Using the tables, find $P(-1.32 \le Z \le 1.29)$.

$P(-1.32 \le Z \le 1.29)$ = area to the left of 1.29 − area to the left of −1.32

$= P(Z \le 1.29) - [1 - P(Z \le 1.32)]$

$= 0.9015 - [1 - 0.9066] = 0.8081 = 80.81\%$

[Figure: standard normal curve shaded between z = −1.32 and z = 1.29]
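The table lookups in Examples 1 to 4 can be reproduced in Python. This sketch computes $P(Z \le z)$ with `math.erf` from the standard library, using the identity $P(Z \le z) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right)$; `phi` is our own name for it. The results agree with the four-figure table values to within rounding in the last decimal place.

```python
import math


def phi(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


example_1 = phi(1.31)               # P(Z <= 1.31), read directly
example_2 = 1 - phi(1.32)           # P(Z > 1.32) = 1 - P(Z <= 1.32)
example_3 = phi(-0.74)              # P(Z <= -0.74), equal to 1 - P(Z <= 0.74) by symmetry
example_4 = phi(1.29) - phi(-1.32)  # P(-1.32 <= Z <= 1.29)

for value in (example_1, example_2, example_3, example_4):
    print(f"{value:.4f}")
```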

Your Turn

Question 1

The amounts due on a mobile phone bill in Ireland are normally distributed with a mean of €53 and a standard deviation of €15. If a monthly phone bill is chosen at random, find the probability that the amount due is between €47 and €74.

Solution

$z_1 = \frac{x_1 - \mu}{\sigma} = \frac{47 - 53}{15} = -0.4 \qquad z_2 = \frac{x_2 - \mu}{\sigma} = \frac{74 - 53}{15} = 1.4$

$P(-0.4 < Z < 1.4) = P(Z < 1.4) - [1 - P(Z < 0.4)]$

$= 0.9192 - [1 - 0.6554]$

$= 0.5746$


[Figure: normal curve of bill amounts with x-axis 8, 23, 38, 53, 68, 83, 98 (z = −3 to 3), shaded between €47 (z = −0.4) and €74 (z = 1.4)]
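Python's standard library has a normal distribution class, `statistics.NormalDist` (Python 3.8+), whose `cdf` method gives $P(X \le x)$ directly, so no standardising is needed. A sketch for this question:

```python
from statistics import NormalDist

bills = NormalDist(mu=53, sigma=15)  # monthly bill amounts in euro

# z-scores, as in the worked solution
z1 = (47 - 53) / 15
z2 = (74 - 53) / 15

# P(47 < X < 74) = P(X < 74) - P(X < 47)
p = bills.cdf(74) - bills.cdf(47)
print(round(z1, 1), round(z2, 1), f"{p:.4f}")
```

The result is about 0.5747; the tables give 0.5746 because each table entry is itself rounded to four decimal places.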


Question 2

The marks in a statistics exam are normally distributed. The mean mark achieved by the students is 60% and the standard deviation of the exam marks is 10%.

(i) What is the probability that a randomly selected student scores above 80%?

(ii) What is the probability that a randomly selected student scores below 45%?

(iii) What is the probability that a randomly selected student scores between 50% and 75%?

(iv) Suppose you were sitting this exam and you are offered a prize for getting a mark which is greater than that of 90% of all the other students sitting the exam. What percentage would you need to get in the exam to win the prize?

Solution

(i) $z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2$

$P(Z > 2) = 1 - P(Z \le 2) = 1 - 0.9772 = 0.0228 = 2.28\%$

(ii) $z = \frac{45 - 60}{10} = -1.5$

$P(Z < -1.5) = P(Z > 1.5) = 1 - P(Z \le 1.5) = 1 - 0.9332 = 0.0668 = 6.68\%$

[Figure: normal curves of exam marks with x-axis 30 to 90 (z = −3 to 3), shaded above 80 (z = 2) and below 45 (z = −1.5)]



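(iii) $z_1 = \frac{50 - 60}{10} = -1 \qquad z_2 = \frac{75 - 60}{10} = 1.5$

$P(-1 < Z < 1.5) = P(Z < 1.5) - [1 - P(Z < 1)]$

$= 0.9332 - [1 - 0.8413] = 0.7745$

(iv) From the tables, the z-value for an area of 90% (0.9) is z = 1.28.

$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 1.28 = \frac{x - 60}{10} \;\Rightarrow\; x = 72.8$ marks

[Figure: normal curves of exam marks with x-axis 30 to 90, shaded between 50 (z = −1) and 75 (z = 1.5), and to the left of 72.8 (z = 1.28)]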
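All four parts can be checked with `statistics.NormalDist` from the Python standard library: `cdf` gives areas to the left, and `inv_cdf` works backwards from an area to a value, which is what part (iv) does with the tables.

```python
from statistics import NormalDist

marks = NormalDist(mu=60, sigma=10)

p_above_80 = 1 - marks.cdf(80)             # (i)
p_below_45 = marks.cdf(45)                 # (ii)
p_between = marks.cdf(75) - marks.cdf(50)  # (iii)
prize_mark = marks.inv_cdf(0.90)           # (iv) mark with 90% of students below it

print(f"{p_above_80:.4f} {p_below_45:.4f} {p_between:.4f} {prize_mark:.1f}")
```

This prints 0.0228 0.0668 0.7745 72.8. `inv_cdf(0.90)` uses the exact z-value (about 1.2816) rather than the table value 1.28, but the prize mark still rounds to 72.8.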


[Figure: standard normal curve with 95% of the area between z = −1.96 and z = 1.96 and 2.5% in each tail]

For Higher Level Leaving Cert use z-scores.

LCHL: Confidence limits $= \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (95% confidence interval)

Confidence interval for population proportion

$\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

We are 95% confident that the population proportion is inside this confidence interval.

Example 1

Skygo provides Wifi in the Galway area. In March the company carried out a survey among 625 of its customers. The company advertises that 60% of their customers are satisfied with their download speeds. 370 of the sample stated they were satisfied with their download speed. Create a 95% confidence interval based on your sample.

Sample proportion $\hat{p} = \frac{370}{625} = 0.592$

Confidence limits $= 0.592 \pm 1.96\sqrt{\frac{0.592(1 - 0.592)}{625}} \approx 0.592 \pm 0.0385$

95% confidence interval: $0.5535 < p < 0.6305$, i.e. about 55.35% to 63.05%
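The Higher Level interval can be sketched in Python as follows; `proportion_ci` is an illustrative helper, and `z = 1.96` is the 95% value used throughout this section. The apple-seed question below uses the same calculation.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Confidence interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    error = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - error, p_hat + error


low, high = proportion_ci(370, 625)   # Skygo survey
print(f"{low:.4f} < p < {high:.4f}")  # 0.5535 < p < 0.6305
```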

Your Turn

Question 1

The Sunday Independent reports that the government's approval rating is at 65%. The paper states that the poll is based on a random sample of 972 voters and that the margin of error is 3%. Show that the pollsters used a 95% level of confidence.

Solution

Confidence limits $= \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

$0.03 = z\sqrt{\frac{0.65(0.35)}{972}}$

$0.03 = z(0.0153)$

$z = \frac{0.03}{0.0153} \approx 1.96$

Therefore they are using a 95% level of confidence.
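Working backwards from a stated margin of error to z, and then to the confidence level, can be sketched in Python. `NormalDist().cdf` is the standard normal $P(Z \le z)$, and the confidence level is the central area $2P(Z \le z) - 1$.

```python
import math
from statistics import NormalDist

p_hat, n, margin = 0.65, 972, 0.03

# Rearrange  margin = z * sqrt(p_hat * (1 - p_hat) / n)  for z
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
z = margin / standard_error

# Central area between -z and z under the standard normal curve
level = 2 * NormalDist().cdf(z) - 1
print(round(z, 2), f"{level:.1%}")
```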


Question 2

It is known that 30% of a certain kind of apple seed will germinate. In an experiment 85 out of 300 seeds germinated. Construct a 95% confidence interval for the population proportion.

Sample proportion $\hat{p} = \frac{85}{300} \approx 0.283$

Confidence limits $= 0.283 \pm 1.96\sqrt{\frac{0.283(1 - 0.283)}{300}} \approx 0.283 \pm 0.051$

95% confidence interval: $0.232 < p < 0.334$

Sample Means

The data below are the heights, in cm, of a population of 100 15-year-old students.

170 174 174 164 152 155 160 172 156 163
182 167 158 154 167 140 143 167 178 165
176 165 166 177 148 147 166 184 165 162
185 171 168 173 175 160 167 172 179 153
180 172 164 178 153 152 167 165 174 145
155 150 150 158 162 166 163 159 148 170
154 181 155 165 180 168 158 158 175 176
166 165 170 175 175 158 160 177 166 180
165 165 166 168 180 157 153 150 179 157
161 152 161 144 174 172 165 157 174 159

From the list above, the mean of the population is $\mu = \frac{\sum x}{n} = 164.72$ cm.

From the list above, the standard deviation of the population is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} = 10.21$ cm.
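The two population values can be checked in Python with the standard `statistics` module; `pstdev` divides by n, matching the population formula above.

```python
from statistics import mean, pstdev

# Heights (cm) of the population of 100 15-year-old students, row by row
heights = [
    170, 174, 174, 164, 152, 155, 160, 172, 156, 163,
    182, 167, 158, 154, 167, 140, 143, 167, 178, 165,
    176, 165, 166, 177, 148, 147, 166, 184, 165, 162,
    185, 171, 168, 173, 175, 160, 167, 172, 179, 153,
    180, 172, 164, 178, 153, 152, 167, 165, 174, 145,
    155, 150, 150, 158, 162, 166, 163, 159, 148, 170,
    154, 181, 155, 165, 180, 168, 158, 158, 175, 176,
    166, 165, 170, 175, 175, 158, 160, 177, 166, 180,
    165, 165, 166, 168, 180, 157, 153, 150, 179, 157,
    161, 152, 161, 144, 174, 172, 165, 157, 174, 159,
]

mu = mean(heights)       # population mean
sigma = pstdev(heights)  # population standard deviation (divides by n)
print(f"{mu:.2f} {sigma:.2f}")  # 164.72 10.21
```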


It does not matter what the shape of the original distribution is: for large samples, the sample means will always be approximately normally distributed. (Use Java applets to demonstrate this.)

[Figure: a single sample of 5 data points; the black arrows are the data points and the red dot is the mean of the sample]

[Figure: a single sample of 10 data points]

Naturally, if we choose a sample size of 100 (the original population size), the mean of the sample will be the same as the mean of the population.

As the sample size increases, the standard error will decrease. Why? ……………

Summary

[Figure: Samples 1 to 6 drawn from a population; for a sample size of 30 or more, the sample means are normally distributed]

Population: mean μ, standard deviation σ

Large sample: mean $\bar{x}$, standard deviation s

Sample means: mean $\mu_{\bar{x}} = \mu$, standard deviation (standard error) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (or $\frac{s}{\sqrt{n}}$ when σ is unknown)

KEY IDEA: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

In practice, from the table above, we can say that for n ≥ 30:

1. The sample means are (approximately) normally distributed.

2. The mean of the sample means is the same as the population mean: $\mu_{\bar{x}} = \mu$.

3. The standard deviation of the sample means is equal to $\frac{\sigma}{\sqrt{n}}$; this is called the standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
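The key idea can also be seen without an applet. This Python sketch is our own simulation (not the onlinestatbook applet): it draws many samples of size 30 from a strongly skewed population, an exponential distribution with mean 1 and standard deviation 1, and looks at the sample means. With a fixed seed the run is repeatable; the exact numbers depend on the seed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 30           # size of each sample
trials = 20_000  # number of samples drawn

# Population: exponential distribution with mean 1 and standard deviation 1 (strongly skewed)
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

print(statistics.fmean(sample_means))   # mean of the sample means
print(statistics.pstdev(sample_means))  # standard deviation of the sample means
```

The mean of the sample means should be close to 1 and their standard deviation close to $\frac{1}{\sqrt{30}} \approx 0.183$, as points 2 and 3 state, even though the population itself is far from normal.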

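In the standard normal distribution we want the value of $z_1$ such that 95% of the population lies in the interval $-z_1 \le z \le z_1$, leaving 0.025 in each tail.

[Figure: standard normal curve with 0.95 of the area between $-z_1$ and $z_1$ and 0.025 in each tail]

$P(z \le z_1) = 0.95 + 0.025 = 0.975 \;\Rightarrow\; z_1 = 1.96$

Therefore in a normal distribution, 95% of the population lies within 1.96 standard deviations of the mean: 95% of the population lies within $1.96\sigma$ of μ (the population mean).

For sample means, 95% of the values of $\bar{x}$ satisfy $\mu - 1.96\sigma_{\bar{x}} \le \bar{x} \le \mu + 1.96\sigma_{\bar{x}}$, which rearranges to $\bar{x} - 1.96\sigma_{\bar{x}} \le \mu \le \bar{x} + 1.96\sigma_{\bar{x}}$.

As $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, the confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.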

Example 1

A random sample of 250 cars was taken; the mean age of the cars was 4.5 years and the standard deviation was 2.2 years.

(i) Find the 95% confidence interval for the mean age of all cars.

(ii) What sample size is required to estimate the mean age, with 95% confidence, to within 0.3 years?

Solution

(i) The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ (using the sample standard deviation, 2.2, as an estimate of σ).

$4.5 \pm 1.96\frac{2.2}{\sqrt{250}} = 4.5 \pm 1.96(0.139)$

$4.23 < \mu < 4.77$

This means that we can say with 95% confidence that the mean age of all cars in the population is between 4.23 years and 4.77 years.

(ii) $1.96\frac{2.2}{\sqrt{n}} = 0.3 \;\Rightarrow\; \sqrt{n} = \frac{1.96(2.2)}{0.3} = 14.373 \;\Rightarrow\; n = 14.373^2 \approx 206.6$

So a sample of 207 cars is required.
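Both parts of this example (and the holiday-apartment question later in this section) follow two formulas: the limits $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$, and $n = \left(\frac{1.96\sigma}{E}\right)^2$, rounded up, for a required error E. A Python sketch with hypothetical helper names:

```python
import math


def mean_ci(x_bar, sigma, n, z=1.96):
    """Confidence limits x_bar ± z * sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return x_bar - z * standard_error, x_bar + z * standard_error


def sample_size_for_mean(sigma, error, z=1.96):
    """Smallest whole n with z * sigma / sqrt(n) <= error."""
    return math.ceil((z * sigma / error) ** 2)


low, high = mean_ci(4.5, 2.2, 250)
print(f"{low:.2f} to {high:.2f} years")  # 4.23 to 4.77 years
print(sample_size_for_mean(2.2, 0.3))    # 207
```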

Example 2

A random sample of 144 male students in a large university was taken and their heights measured. The mean height was 175 cm. The standard deviation of all the male students in the university was 9 cm.

(i) Give a 95% confidence interval for the mean height of all the male students.

(ii) Show that the confidence interval would be narrower if the sample size was 225 instead of 144.

Solution

(i) n = 144, $\bar{x}$ (mean of the sample) = 175, σ (standard deviation of the population) = 9, μ (population mean) is unknown.

We calculate the standard error of the mean using $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{9}{\sqrt{144}} = 0.75$

As the sample size is large, the best possible estimate of μ is $\bar{x}$, which is 175 cm. Now we have to give a range of values in which the true population mean (μ) lies, with a 95% level of confidence.

$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$

$175 - 1.96(0.75) \le \mu \le 175 + 1.96(0.75)$

$173.53 \le \mu \le 176.47$

The true population mean lies within the range 173.53 cm to 176.47 cm with 95% confidence.

(ii) If a sample of 225 were taken, the standard error would be $\sigma_{\bar{x}} = \frac{9}{\sqrt{225}} = 0.6$

$175 - 1.96(0.6) \le \mu \le 175 + 1.96(0.6)$

$173.82 \le \mu \le 176.18$

The true population mean lies within the range 173.82 cm to 176.18 cm with 95% confidence. This is narrower than the previous confidence interval: as you increase the sample size, you decrease the width of the confidence interval.
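Part (ii) generalises: the width of the interval is $2 \times 1.96\frac{\sigma}{\sqrt{n}}$, so it shrinks in proportion to $\frac{1}{\sqrt{n}}$. A Python sketch comparing the two sample sizes; `ci_width` is a hypothetical name.

```python
import math


def ci_width(sigma, n, z=1.96):
    """Width of the interval x_bar ± z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)


width_144 = ci_width(9, 144)  # 176.47 - 173.53
width_225 = ci_width(9, 225)  # 176.18 - 173.82
print(round(width_144, 2), round(width_225, 2))

# The ratio of the widths equals sqrt(144 / 225) = 12 / 15 = 0.8
print(round(width_225 / width_144, 2))
```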

A study addressed the issue of whether pregnant women can correctly guess the sex of their baby. Among 104 recruited subjects, 57 correctly guessed the sex of the baby. Use these sample data to test the claim that the success rate of such guesses is no different from the 50% success rate expected with random chance guesses. Use a 5% significance level. (Based on data from “Are Women Carrying ‘Basketballs’ Really Having Boys? Testing Pregnancy Folklore,” by Perry, DiPietro, and Constigan, Birth, Vol. 26, No. 3.)

Solution: The original claim is that the success rate is no different from 50%.

$H_0: p = 0.5 \qquad H_1: p \ne 0.5$

$\hat{p} = \frac{57}{104} = 0.548$

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.548 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{104}}} = 0.98$

At the 5% level of significance the critical values are z = −1.96 and z = 1.96.

As 0.98 is between −1.96 and 1.96, we fail to reject the null hypothesis.

There is not sufficient evidence to warrant rejection of the claim that women who guess the sex of their babies have a success rate equal to 50%.
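The test statistic and decision can be sketched in Python; `proportion_z_test` is an illustrative name, and the critical value 1.96 corresponds to the 5% level of significance used on the course.

```python
import math


def proportion_z_test(successes, n, p0, critical=1.96):
    """Two-tailed z-test of H0: p = p0 at the 5% level of significance."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    decision = "reject H0" if abs(z) > critical else "fail to reject H0"
    return z, decision


z, decision = proportion_z_test(57, 104, 0.5)
print(round(z, 2), decision)  # 0.98 fail to reject H0
```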

Your Turn

A survey was carried out to find the weekly rental costs of holiday apartments in a certain country. A random sample of 400 apartments was taken. The mean of the sample was €320 and the standard deviation was €50. Form a 95% confidence interval for the mean weekly rental cost of holiday apartments in that country. (2005 LC HL, Q9(c))

The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$:

$320 \pm 1.96\frac{50}{\sqrt{400}} = 320 \pm 1.96(2.5)$

$315.10 < \mu < 324.90$

The mean weekly rental cost is between €315.10 and €324.90, with 95% confidence.

Night 3

Hypothesis Testing

Often we need to make a decision about a population based on a sample.

In a trial you are presumed innocent until proven guilty. Assuming that an accused person is innocent (nothing has happened) is called a NULL HYPOTHESIS (H0). Assuming that an accused person is not innocent is called an ALTERNATIVE HYPOTHESIS (H1).

1. Is a coin biased if we get a run of 8 heads in 10 tosses? Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0). Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).

2. During a 5 minute period a new machine produces fewer faulty parts than an old machine.

Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0).

Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).

3. Does a new drug for hay fever work effectively? Assuming that there is no difference between the new drug and the current drug is called a NULL HYPOTHESIS (H0).

Assuming that the new drug is better than the current most popular drug is called an ALTERNATIVE HYPOTHESIS (H1).

Testing the Null Hypothesis using z-values

A Two-Tailed Test

The critical values for a 5% level of significance are z = −1.96 and z = 1.96.

[Figure: standard normal curve; reject H0 in each tail (2.5% in each) beyond z = −1.96 and z = 1.96; fail to reject H0 in between]

The statistical method used to determine whether H0 is true or not is called HYPOTHESIS TESTING. Statisticians speak of “not accepting or accepting H0 at a certain level”. This level is called the LEVEL OF SIGNIFICANCE (the 5% level of significance is on the syllabus).

If the value of z lies outside the range −1.96 < z < 1.96 (the critical region), we reject H0.

Testing hypotheses about a population mean (large samples)

Suppose we take a large sample of size n from a population with a mean of μ and a standard deviation of σ.

We have to calculate the mean of the sample, $\bar{x}$.

We can also calculate the standard error, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (using the sample standard deviation s in place of σ when σ is unknown, as we may for large samples).

We want to test the hypothesis that the sample comes from a population with a particular value of μ, called $\mu_0$.

Step 1. State the null and alternative hypotheses.

Null hypothesis: $H_0: \mu = \mu_0$

Alternative hypothesis: $H_1: \mu \ne \mu_0$

Note 1: We are not using $\mu > \mu_0$ or $\mu < \mu_0$, so no direction is stipulated. Therefore this is a two-tailed test. (Only two-tailed tests are on the Leaving Cert course.)

Note 2: The null hypothesis always has an equals sign and uses population parameters.

Step 2. Convert the observed results into z units. Calculate the test statistic.

As we are dealing with the sampling distribution of the mean, the test statistic is a standard normal z-score:

$Z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$

(This is the difference between the value we have observed from our sample and the hypothesised value for the population, divided by the standard error.)

$Z = \frac{\text{Observed value} - \text{Hypothesised value}}{\text{Standard error}}$

Step 3. Write down the critical values. A sketch also helps.

Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0. Once we have the value of Z, we compare it to our critical values and decide whether or not to reject the null hypothesis.

Review of the steps involved in Hypothesis Testing:

1. Write down the null hypothesis H0 and the alternative hypothesis H1.

2. Convert the observed results into z units. (Calculate the test statistic.)

3. Write down the critical values. (A sketch also helps.)

4. Reject H0 if z is in the critical regions; otherwise fail to reject H0.
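Steps 2 to 4 for a population mean can be sketched as one Python function. This sketch assumes a two-tailed test at the 5% level (critical values ±1.96) and a known or large-sample σ; `mean_z_test` is an illustrative name. The usage lines are the pen and tyre examples that follow.

```python
import math


def mean_z_test(x_bar, mu0, sigma, n, critical=1.96):
    """Two-tailed z-test of H0: mu = mu0 at the 5% level of significance.

    Computes Z, compares it with the critical values and returns (Z, decision).
    """
    standard_error = sigma / math.sqrt(n)
    z = (x_bar - mu0) / standard_error
    decision = "reject H0" if abs(z) > critical else "fail to reject H0"
    return z, decision


print(mean_z_test(497, 500, 10, 81))         # pens example
print(mean_z_test(10_000, 11_000, 552, 36))  # tyres example
```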

A company manafactures pens with a mean writing life of 500 hours and a standard deviation of 10 hours. A retailer examines a sample of 81 pens froma supplier who claims to only sell pens from this company and finds their meanlife is 497 hours. Are these pens genuine products from the company?

0

1

Null Hypotheses H : The sample of pens are genuine products from the compny. 500 Alternati

Step 1. State the null an

ve Hypothesis H : The s

d a

amp

lternative hyp

le of pens are

otheses.

not ge nuine products from the compny. 500Note: If not given a Level of Significance we must write it down. 5% (only level on for Leaving Cert.)

0

x 497 500

Step 2. Convert the observed results

into z units. Calculate the test statistic .

Z = = = 2.7 10

n 81

Example 1

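[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]

−2.7 is in the reject region.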


Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We reject the null hypothesis as −2.7 is in the reject region. This means that there is sufficient evidence to conclude that the pens are not genuine.

A tyre company claims that the mean life of the tyres it produces is 11,000 miles, with a standard deviation of 552 miles. An independent supplier of tyres wants to investigate the company's claim. A test on a random sample of 36 tyres from the company gave a mean life of 10,000 miles. Carry out a hypothesis test using a 5% level of significance to see if there is evidence to support the company's claim.

Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The company produces tyres with a mean life of 11,000 miles. $\mu = 11{,}000$

Alternative hypothesis H1: The company produces tyres whose mean life is not 11,000 miles. $\mu \neq 11{,}000$

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{10{,}000 - 11{,}000}{552/\sqrt{36}} = -10.87$$

Example 2

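[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]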


Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We reject the null hypothesis as −10.87 is in the reject region. We can conclude that there is evidence to suggest that the company's claim is not true.

Your Turn

A neurologist is testing the effect of a drug on response time by injecting 36 rats with a unit dose of a new drug. The neurologist measures the response time of each rat to a stimulus. The neurologist knows that the mean response time for rats not injected is 0.75 seconds. The mean of the 36 injected rats' response times is 0.6 seconds, with a standard deviation of 0.2 seconds. Can you conclude that the drug has an effect on response time?

Question 1


Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The drug has no effect. $\mu = 0.75$ seconds

Alternative hypothesis H1: The drug has an effect. $\mu \neq 0.75$ seconds

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.6 - 0.75}{0.2/\sqrt{36}} = -4.5$$

Note: we are approximating $\sigma$ with $s$ as we don't know $\sigma$.


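[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]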

Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We reject the null hypothesis as −4.5 is in the reject region. We can conclude that there is evidence to suggest that the drug has an effect on response time.

–4.5 is in the Reject Region


Example 2

In an examination taken by a large number of students, the mean mark was 51.5 and the standard deviation was 8.5. In a random sample of 49 students in a particular town, the mean mark was 50. At the 5% level of significance, investigate if there is evidence to conclude that the students of this town did as well as students in general.

Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The students in this town did as well as all other students. $\mu = 51.5$

Alternative hypothesis H1: The students in this town did not perform the same as all other students. $\mu \neq 51.5$

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{50 - 51.5}{8.5/\sqrt{49}} = -1.24$$


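[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]

−1.24 is in the fail-to-reject region.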

Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We fail to reject the null hypothesis as −1.24 is in the fail-to-reject region. There is not enough evidence to conclude that the students in this town performed differently from students in general; their results are consistent with doing equally as well.


The weights of newborn babies in Ireland are known to have a mean of 3.42 kg and a standard deviation of 0.9 kg. Assuming that the weights are normally distributed, a random sample of 500 babies whose mothers smoked heavily during pregnancy is taken.

If the mean weight of this sample is 3.28 kg, can we conclude at the 5% level of significance that heavy smoking by mothers during pregnancy has an effect on the weight of their babies at birth?

Step 1. State the null and alternative hypotheses.

Null hypothesis H0: Heavy smoking during pregnancy by mothers has no effect on the weight of their babies at birth. $\mu = 3.42$ kg

Alternative hypothesis H1: Heavy smoking during pregnancy by mothers has an effect on the weight of their babies at birth. $\mu \neq 3.42$ kg

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{3.28 - 3.42}{0.9/\sqrt{500}} = -3.48$$

Example 3


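[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]

−3.48 is in the reject region.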

Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We reject the null hypothesis as −3.48 is in the reject region. We can conclude that there is evidence to suggest that babies' weights are affected if their mothers smoke heavily during pregnancy.


p-value at the 5% Significance Level

Instead of comparing the value of our test statistic to the critical values, we can find a specific p-value for our test statistic by looking up its value in the tables. The p-value measures the strength of the evidence in the data against the null hypothesis. The smaller the p-value, the less likely it is that sample results as extreme as ours would occur if the null hypothesis were true.

If p < 0.05: there is sufficient evidence to reject the null hypothesis H0 ("if p is low, H0 must go").

If p ≥ 0.05: there is not enough evidence to reject H0, so we fail to reject the null hypothesis H0.
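The table look-up for a two-tailed p-value can be reproduced with Python's standard-library `statistics.NormalDist`. This is a sketch that assumes the test statistic is standard normal under H0; the helper names `two_tailed_p_value` and `decide` are our own. Results may differ slightly from hand calculations because the tables round the cumulative probabilities to four decimal places.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """P(|Z| >= |z|) for standard normal Z: twice the area in one tail."""
    one_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * one_tail


def decide(p: float, alpha: float = 0.05) -> str:
    # "If p is low, H0 must go"
    return "reject H0" if p < alpha else "fail to reject H0"


# z values from the blood pressure and diet examples in this section
for z in (1.13, -2.53):
    p = two_tailed_p_value(z)
    print(f"z = {z:+.2f}: p = {p:.4f} -> {decide(p)}")
```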

p-value

Medical consultants for large companies are concerned about the effects of stress on company executives. The mean systolic blood pressure for males aged 35 to 44 years of age is, according to national health statistics, 128 with a standard deviation of 15. A sample of 72 male executives in this age group was selected from companies. Their mean blood pressure was 130.

(i) Construct a 95% confidence interval for the mean systolic blood pressure for the executives. Interpret this interval.

(ii) Carry out a hypothesis test using a significance level of 5% to see if there is evidence to suggest that the mean systolic blood pressure for executives is different from the national average. Clearly state the null and alternative hypotheses and your conclusion. Give a p-value for this hypothesis test and interpret this p-value.

Example 1

(ii) Carry out a hypothesis test.

Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The mean systolic blood pressure for male executives in the age group 35-44 is the same as the national average. $\mu = 128$

Alternative hypothesis H1: The mean systolic blood pressure for male executives in the age group 35-44 is not the same as the national average. $\mu \neq 128$

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{130 - 128}{15/\sqrt{72}} = 1.13$$

(i) $n = 72$, $\sigma = 15$, $\bar{x} = 130$

$$\text{95\% confidence interval} = \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 130 \pm 1.96\frac{15}{\sqrt{72}} = 130 \pm 3.46$$

$$[126.54,\ 133.46]$$

This means that we are 95% confident that the mean systolic blood pressure ($\mu$) for all male executives aged 35 to 44 in large companies lies in the range 126.54 to 133.46. This range includes the national average of 128.
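The 95% confidence interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ is straightforward to check numerically. This is a minimal sketch under the same large-sample assumption as the notes; the function name `confidence_interval_95` is hypothetical, and `sd` is $\sigma$ when known or $s$ otherwise.

```python
import math


def confidence_interval_95(x_bar: float, sd: float, n: int) -> tuple[float, float]:
    """95% confidence interval for a population mean: x_bar +/- 1.96 * sd / sqrt(n)."""
    margin = 1.96 * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Blood pressure example: n = 72, sigma = 15, sample mean 130
low, high = confidence_interval_95(130, 15, 72)
print(f"[{low:.2f}, {high:.2f}]")  # the national average 128 lies inside
```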


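[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]

1.13 is in the fail-to-reject region.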

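Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We fail to reject the null hypothesis as 1.13 is in the fail-to-reject region. There is not enough evidence to suggest that the mean systolic blood pressure for male executives in the age group 35-44 differs from the national average of 128.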

Step 5. p-value in a two-tailed test.

The probability of getting a value > 1.13, from the tables, is 1 − 0.8708 = 0.1292. The probability of getting a value < −1.13 is also 0.1292. The p-value is the sum of these two probabilities: 2(0.1292) = 0.2584. This p-value is well above 0.05, so we fail to reject the null hypothesis.

Two things to note:

1. The p-value answers the question: if H0 is true, what is the probability that the observed value (130) would be this far away from the value I expected to get (128) because of sheer randomness? So a p-value of 0.26 means, in this case, that there is a 26% chance that the mean blood pressure of a sample of this size will be 2 or more units (130 − 128 = 2) away from the population mean just because of random variation in sampling. This is not enough evidence to reject the null hypothesis: the 5% level of significance means that we only reject the null hypothesis if the probability that the observed value is this far away from the expected value because of sheer randomness is less than 5%. At 26%, the chance that this variation was due to randomness is too high.

2. The one-tail probability is doubled to get the p-value because we are doing a two-tailed test.
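For a two-tailed test at the 5% level with a known (or large-sample) standard deviation, the test and the 95% confidence interval always agree: H0: $\mu = \mu_0$ is rejected exactly when $\mu_0$ lies outside $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$, because both conditions reduce to $|\bar{x} - \mu_0| > 1.96\,\sigma/\sqrt{n}$. The sketch below checks this on the three confidence-interval examples in this section; the list `examples` simply restates their data.

```python
import math

# (label, sample mean, hypothesised mean, standard deviation, n) from the worked examples
examples = [
    ("blood pressure", 130, 128, 15, 72),
    ("diet", 3.6, 4, 1, 40),
    ("hourly wage", 10.83, 10, 3.35, 35),
]

for label, x_bar, mu0, sd, n in examples:
    se = sd / math.sqrt(n)
    z = (x_bar - mu0) / se
    reject_by_test = abs(z) > 1.96
    low, high = x_bar - 1.96 * se, x_bar + 1.96 * se
    mu0_outside_ci = not (low <= mu0 <= high)
    # Both criteria are the same inequality |x_bar - mu0| > 1.96 * se, so they agree
    assert reject_by_test == mu0_outside_ci
    print(f"{label}: z = {z:.2f}, CI = [{low:.2f}, {high:.2f}], reject H0: {reject_by_test}")
```

Only the diet example rejects H0, matching the fact that its interval is the only one that excludes the hypothesised mean.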

A new diet is advertised with the claim that participants will lose an average of 4 kg during the first week on this diet. A random sample of 40 people on this diet showed a mean weight loss of 3.6 kg, with a standard deviation of 1 kg.

(i) Calculate a 95% confidence interval for the mean weight loss of all participants on this diet. Interpret this interval.

(ii) Test the claim made in the advertisement for this diet at a 5% level of significance. Clearly state your null and alternative hypotheses and your conclusion. Give a p-value for this hypothesis test and interpret this p-value.

(i) $n = 40$, $s = 1$, $\bar{x} = 3.6$

$$\text{95\% confidence interval} = \bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 3.6 \pm 1.96\frac{1}{\sqrt{40}} = 3.6 \pm 0.31$$

$$[3.29,\ 3.91]$$

This means that we are 95% confident that the mean weight loss ($\mu$) lies in the range 3.29 kg to 3.91 kg. This range does not include the advertised weight loss of 4 kg.

Example 2

Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The average weight loss during the first week of this diet is 4 kg. $\mu = 4$ kg

Alternative hypothesis H1: The average weight loss during the first week of this diet is not 4 kg. $\mu \neq 4$ kg

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.6 - 4}{1/\sqrt{40}} = -2.53$$



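[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]

−2.53 is in the reject region.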

Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We reject the null hypothesis as −2.53 is in the reject region. There is evidence to suggest that the average weight loss during the first week of this diet is not 4 kg, so the advertising claim appears not to be true.


Step 5. p-value in a two-tailed test.

The probability of getting a value > 2.53, from the tables, is 1 − 0.9943 = 0.0057. The probability of getting a value < −2.53 is also 0.0057. The p-value is the sum of these two probabilities: 2(0.0057) = 0.0114.

"The p-value is very small: there is only about a 1.1% chance that a deviation this large from the stated 4 kg would be due to sampling variability alone. This is very strong evidence for rejecting the company's claim."


Your Turn

The mean hourly wage in an EU country is €10. A sample of 35 individuals in the capital city of the country has a mean hourly wage of €10.83, with a standard deviation of €3.35 per hour.

(i) Construct a 95% confidence interval for the mean hourly wage in the capital city. Interpret this interval.

(ii) Is there evidence to suggest that hourly wages for workers in the capital city are different from the national hourly wage? Test the hypothesis using a 5% level of significance. Clearly state the null and alternative hypotheses and your conclusion. Give a p-value for this hypothesis test and interpret this p-value.

Question 1


(i) $n = 35$, $s = 3.35$, $\bar{x} = 10.83$

$$\text{95\% confidence interval} = \bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 10.83 \pm 1.96\frac{3.35}{\sqrt{35}} = 10.83 \pm 1.11$$

$$[9.72,\ 11.94]$$

This means that we are 95% confident that the mean hourly wage ($\mu$) for workers in the capital city lies in the range €9.72 to €11.94. This range includes the mean hourly wage for the country (€10).


Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The average hourly wage for a worker in the capital is the same as that of a worker in the rest of the country. $\mu =$ €10

Alternative hypothesis H1: The average hourly wage for a worker in the capital is not the same as that of a worker in the rest of the country. $\mu \neq$ €10

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{10.83 - 10}{3.35/\sqrt{35}} = 1.466$$



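[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]

1.466 is in the fail-to-reject region.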

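Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We fail to reject the null hypothesis as 1.466 is in the fail-to-reject region. There is not enough evidence to suggest that the hourly wage for workers in the capital differs from that in the rest of the country.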


Step 5. p-value in a two-tailed test.

The probability of getting a value > 1.466, from the tables, is 1 − 0.9286 = 0.0714. The probability of getting a value < −1.466 is also 0.0714. The p-value is the sum of these two probabilities: 2(0.0714) = 0.1428. This p-value is greater than 0.05, so we fail to reject the null hypothesis.


A machine filling bottles of natural mineral water is set to deliver 0.725 litres, with a standard deviation of 0.01 litres. A sample of 50 bottles is checked and the mean quantity is found to be 0.721 litres. At the 5% level of significance, investigate if there is evidence to suggest that the mean of this sample is different from the expected mean of 0.725 litres.

Question 2


Step 1. State the null and alternative hypotheses.

Null hypothesis H0: The mean volume delivered is the same as the expected volume. $\mu = 0.725$ litres

Alternative hypothesis H1: The mean volume delivered is not the same as the expected volume. $\mu \neq 0.725$ litres

Step 2. Convert the observed results into z units. Calculate the test statistic.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{0.721 - 0.725}{0.01/\sqrt{50}} = -2.83$$



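[Figure: standard normal curve with critical values −1.96 and 1.96. The 2.5% in each tail is the "Reject H0" region; the central 95% is the "Fail to reject" region.]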


Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.

We reject the null hypothesis as −2.83 is in the reject region. We can conclude that there is evidence to suggest that the mean volume delivered is not the same as the expected volume.

We can conclude that there is evidence to suggest that the mean is different from the expected mean.


Date post:	22-Dec-2015
Upload:	todd-ross