Statistics - haesemathematics.com

7 Statistics

Contents:

A  Key statistical concepts
B  Describing data
C  Normal distributions
D  The standard normal distribution
E  Finding quantiles (k-values)
F  Investigating properties of normal distributions
G  Distribution of sample means
H  Hypothesis testing for a mean
I  Confidence intervals for means
J  Review


A KEY STATISTICAL CONCEPTS

INTRODUCTION

The word statistics was introduced into the English language by the Scottish politician Sir John Sinclair (1754–1835). He borrowed it from Germany where, as he put it, it meant “an inquiry for the purpose of ascertaining the political strength of a country”.

The meaning he wished to give to the word was an “inquiry into the state of a country, for the purpose of ascertaining the quantum of happiness enjoyed by its inhabitants, and the means of future improvement.”

You can still recognise the word “state” in statistics.

Words that are commonly used in Statistics:

• Population: A collection of individuals about which we want to draw conclusions.

• Census: The collection of information from the whole population.

• Sample: A selection of information from a subset of the population.

• Data (singular datum): Information about individuals in a population.

• Parameter: A numerical quantity measuring some aspect of a population.

• Statistic: A quantity calculated from data gathered from a sample. It is usually used to estimate a population parameter.

• Distribution: The pattern of variation of data.

A population generally consists of a large number of individuals. Because of expense and time factors it is often only practical to select a sample rather than use the whole population.

RANDOM SAMPLES

A random sample is a sample where every individual has the same chance of being selected.

A sampling technique is biased if it tends to systematically select members of the population with certain properties and not select those that do not have these properties. In other words it favours some individuals above others.

DISCUSSION: SAMPLING

In the following scenarios, can you suggest a likely population? Can you think of any reasons the sampling techniques might be biased?

• People in the local shopping centre on Saturday morning were asked how many computers they have in their household.

• After a program likely to be watched by older people, a television station asked viewers to vote on the use of hand-held phones in cars.

• A local paper advertised for volunteers to test the usefulness of fish oil in a diet.

220 STATISTICS (Chapter 7)




Many sampling techniques have been developed to avoid bias. In this book it will be assumed that any sample is a random, unbiased sample.

DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics are concerned with collecting, summarising and describing the characteristics of data.

With descriptive statistics we are only concerned with the data collected and make no effort to generalise it to any other data, such as for the population.

In inferential statistics we select a random sample and we use the information from it to make generalisations about the population from which the sample was taken.

EXAMPLES OF PARAMETERS AND STATISTICS

Recall that:

a parameter is a numerical characteristic of a population and

a statistic is a numerical characteristic of a sample.

For example, when examining the mean age of people in retirement villages throughout Australia, the mean age found would be a parameter. If we took a random sample of 300 people from the population of all retirement village persons, then the mean age would be a statistic.

Note: Parameter goes with Population; Statistic goes with Sample.

Example 1

A business is considering purchasing 50 000 blank CDs to make CDs of their new textbooks. It will make the purchase if no more than 1.5% of the CDs are defective. Because of the expense and time factors in testing all 50 000 CDs the business decides to test a random sample of 600 for defects. They will then use the results of this sample to estimate the percentage of defectives for the population to be purchased.

a What is the population size?

b What is the sample size?

c What population parameter is of interest to the business?

d What statistic is being used to estimate the parameter?

a The population is the number of blank CDs to be purchased and its size is 50 000.

b The sample size is 600.

c The population parameter being considered is the percentage of CDs which are defective.

d The statistic being used is the percentage of CDs which are defective in the sample. As 1.5% of 600 = 9, the business would make the purchase if 9 or fewer CDs in the sample were found to be defective.




THE PROCEDURE USED IN AN INFERENTIAL PROBLEM

In this course the key application is to examine a random sample in order to make appropriate statements or inferences about the population.

Generally speaking there are five steps to address in any inferential problem. They are:

Step 1: State the population we are interested in examining.

Step 2: Collect data from a random sample of sufficient size from the population.

Note: What is meant by sufficient size is covered in a later chapter.

Step 3: Examine the relevant information from the sample.

Step 4: Use the results of the sample analysis to make an inference about the

population.

Step 5: Give a measure of the reliability of the inference made.

Example 2

For the CD purchase in Example 1, list the procedural steps for the inferential problem.

Step 1: The population consists of all 50 000 CDs.

Step 2: To avoid unnecessary costs and wasting time we must first decide on the sample size. 600 has been decided upon, so we collect 600 data values at random. We record only whether the CD is defective or not.

Step 3: Find the percentage of defective CDs in the sample.

Step 4: The inference will be to provide an estimate of the percentage of defective CDs for the whole population. For example, if 12 CDs are defective in the sample our inference would be that approximately 12/600 = 2% would be defective in the population.

Step 5: The estimate from the sample is not likely to be equal to the exact value for the population. Some indication of the possible error for the estimate should therefore be given.

An example of such a statement as in Step 5 is:

If we had many shipments of 50 000 CDs and in each we found that 12 in a sample of 600 were defective, then in 95% of these shipments there would be between 440 and 1560 defective CDs.

This type of statement is usually condensed to:

We are 95% confident that about 440 to 1560 CDs are defective.

The main thrusts of this course are to:

• determine confidence intervals in which a certain population parameter should lie at a particular level of confidence (commonly 90%, 95%, 99%)

• devise and use particular tests of hypotheses about population means

• determine what sample sizes should return a particular level of confidence in given situations.
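The text does not say how the interval of 440 to 1560 defective CDs in the Step 5 statement was obtained. The sketch below assumes the usual normal approximation for a sample proportion with the multiplier 1.96 for 95% confidence; the function name and the choice of method are illustrative assumptions, not taken from the text. Under that assumption the calculation gives values very close to the quoted 440 and 1560.

```python
import math

def defective_interval(defective, sample_size, population_size, z=1.96):
    """Approximate interval for the number of defective items in the population.

    Assumes the normal approximation for a sample proportion; z = 1.96
    corresponds to roughly 95% confidence.
    """
    p_hat = defective / sample_size                      # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)    # standard error of p_hat
    low = (p_hat - z * se) * population_size
    high = (p_hat + z * se) * population_size
    return low, high

# 12 defective CDs in a sample of 600 from a shipment of 50 000
low, high = defective_interval(12, 600, 50_000)
print(round(low), round(high))
```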





EXERCISE 7A

1 A new drug called Cobrasyl, a derivative of cobra venom, is to be approved for the treatment of high blood pressure in humans. A research team treats 127 high blood pressure patients with the drug and in 119 cases it reduces their blood pressure to an acceptable level.

a What is the sample of interest?

b What is the population of interest?

2 In 2006, 800 computer workers throughout Australia were surveyed and asked a question. The question was: “Is your main interest in developing software or in using already developed software?” 83% said that developing software was their main interest.

a What is the population of interest?

b What is the parameter of interest?

c What statistic is used to estimate the parameter?

3 A South Australian processor of seafood needs to estimate the average weight of a prawn in a catch. A sample of 352 prawns was selected and found to have an average weight of 53.8 grams.

a What is the population the processor is interested in?

b What is the parameter of interest?

c What statistic does the processor use to estimate the parameter?

4 Last December Tina visited four supermarkets A, B, C and D on the same day. She recorded the price per kilogram of various fruits in the table below:

    Store   Oranges   Apples   Bananas
    A       $2.35     $2.15    $1.70
    B       $2.45     $2.55    $2.00
    C       $2.50     $2.60    $2.10
    D       $2.25     $2.05    $1.90

Determine whether the following statements are descriptive or inferential:

a In this city, bananas are cheaper than oranges.

b If you buy a kilogram of each of the three different types of fruit from the one store, you pay the same total amounts at stores A and D.

c Of the four stores, the store with the most expensive apples also had the most expensive oranges and bananas.

d In general, store C has the most expensive fruit.

e Of the four stores, store C has the most expensive fruit. (Careful! What is the population and what is the sample?)





B DESCRIBING DATA

This section will review the main concepts from Year 11 so that students will reacquaint themselves with the terminology used in statistics.

A variable is a quantity that can have different values for different individuals in the population.

Since variables are sometimes used to describe random processes, they are often called random variables.

Variables are usually denoted by capital letters such as X. Individual values, called observations or outcomes, are denoted by lower case letters such as x.

We shall deal with two types of variables: categorical and quantitative.

A categorical or nominal variable can be described by a quality or characteristic that is essentially non-numeric. Individuals are described by different categories.

Examples of categorical data are:

    Variable                                      Possible values
    • X is the gender of a person                 x = male or female
    • C is the type of motor car                  c = Holden, Ford, Toyota
    • M is the membership of political party      m = ALP, LIB, DEM

A quantitative or numerical variable takes numerical values.

There are essentially two different types of numerical variable.

A numerical discrete variable takes discrete number values only. It is often a result of counting.

Examples of discrete variables are:

    • X is the number of people in a household    x = 1, 2, 3, 4, ......
    • T is the mark out of 10 for a test           t = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

A numerical continuous variable can take any numerical value in an interval. A continuous variable is often a result of measuring.

Examples of continuous variables are:

    • W is the weight of newborn babies           w is likely to be in the interval from 0.5 kg to 5 kg.
    • X is the amount of water in a 500 litre     x is any volume between 0 and 500 litres.
      rain water tank





Since continuous variables take on values in intervals, they are also called interval variables.

The essential difference between a categorical and a quantitative variable is that we can do arithmetic with quantitative variables, but not with categorical variables.

THE MEAN AND STANDARD DEVIATION (REVIEW)

In this book we are mainly concerned with the mean and the standard deviation.

The mean of a sample of n numbers \(x_1, x_2, \ldots, x_n\) is:

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

The Greek letter \(\Sigma\) (sigma) is used to denote the summation of numbers, so \(\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n\) (read “the sum of all \(x_i\) for i = 1 to n”).

The endpoints of the summation, i = 1 to n, are sometimes omitted, so the mean can be written as \(\frac{1}{n}\sum x_i\) or even \(\frac{1}{n}\sum x\).

The mean of a population is usually denoted by the Greek letter μ (mu), so \(\mu = \frac{1}{n}\sum x\).

We can get a much clearer picture of a data set if, in addition to having a measure for the centre, we also have an indication of how the data is spread.

For example, the mean weight of oranges from a particular orchard and the mean weight of salt bagged by a machine may both be 500 grams, but the variation in the weights of oranges is likely to be much greater than that of bags of salt. The data for oranges will therefore have a greater spread.

The most commonly used measure of spread about the mean is the standard deviation.

The standard deviation of a sample is a little different from the standard deviation of a population.

In a sample of size n, the sample standard deviation, usually denoted by s, is:

\[ s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]

In a population of size n, the population standard deviation, usually denoted by the Greek letter σ (sigma), is:

\[ \sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}} = \sqrt{\frac{\sum (x_i - \mu)^2}{n}} \]

The reason for this difference is rather technical and, at this stage, we do not attempt to explain it.

Statisticians know that dividing by n − 1 makes s², with s calculated by the above formula, an unbiased estimate of the population variance σ².

Notice that for large n, the values of s and σ are virtually the same.
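The difference between the two standard deviation formulas can be seen on a small data set. The sketch below uses Python's standard `statistics` module, where `stdev` divides by n − 1 and `pstdev` divides by n; the eight data values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math
import statistics

def compare_standard_deviations(data):
    """Return (mean, sample sd using n - 1, population sd using n)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)        # divides the sum of squares by n - 1
    sigma = statistics.pstdev(data)   # divides the sum of squares by n
    return mean, s, sigma

# Hypothetical data: the squared deviations from the mean 5 add up to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, s, sigma = compare_standard_deviations(data)
print(mean, round(s, 3), sigma)

# The two values differ by the factor sqrt(n / (n - 1)), which tends to 1 as n grows
print(round(s / sigma, 4), round(math.sqrt(len(data) / (len(data) - 1)), 4))
```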





The mean and standard deviation can also be calculated from frequency tables.

The frequency \(f_i\) of a quantity \(x_i\) is the number of times it occurs.

For a population of size n, the formulae for the mean and standard deviation become:

\[ \mu = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{n} \]

and

\[ \sigma = \sqrt{\frac{(x_1 - \mu)^2 f_1 + (x_2 - \mu)^2 f_2 + \cdots + (x_k - \mu)^2 f_k}{n}} \]

Notice that

\[ \mu = \left(\frac{f_1}{n}\right) x_1 + \left(\frac{f_2}{n}\right) x_2 + \left(\frac{f_3}{n}\right) x_3 + \cdots + \left(\frac{f_k}{n}\right) x_k . \]

\(\frac{f_i}{n}\) is the proportion of \(x_i\) in the population. For large values of n, the experimental probability \(p_i\) of randomly selecting \(x_i\) from the population is taken to be \(p_i = \frac{f_i}{n}\).

So, using \(p_i = \frac{f_i}{n}\),

\[ \mu = p_1 x_1 + p_2 x_2 + p_3 x_3 + \cdots + p_k x_k = \sum p_i x_i . \]

Similarly for the population standard deviation:

\[ \sigma = \sqrt{\left(\frac{f_1}{n}\right)(x_1 - \mu)^2 + \left(\frac{f_2}{n}\right)(x_2 - \mu)^2 + \cdots + \left(\frac{f_k}{n}\right)(x_k - \mu)^2} \]

which leads to

\[ \sigma = \sqrt{\sum p_i (x_i - \mu)^2} . \]

Example 3

A magazine store claims 23% of its customers purchase one magazine, 38% purchase two, 21% purchase three, 13% purchase four, and 5% purchase five. Find the mean and the standard deviation of X, the number of magazines sold to a customer.

The probability table is:

    x_i    0      1      2      3      4      5
    p_i    0.00   0.23   0.38   0.21   0.13   0.05

Now

\[ \mu = \sum p_i x_i = 0.23 \times 1 + 0.38 \times 2 + 0.21 \times 3 + 0.13 \times 4 + 0.05 \times 5 = 2.39 \]

i.e., in the long run, the average number purchased per customer is 2.39.

Also,

\[ \sigma = \sqrt{\sum p_i (x_i - \mu)^2} = \sqrt{0.23 \times (1 - 2.39)^2 + 0.38 \times (2 - 2.39)^2 + \cdots + 0.05 \times (5 - 2.39)^2} \approx 1.12 \]
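The calculation in Example 3 can be reproduced with a few lines of code. This is a minimal sketch of the two formulas μ = Σ pᵢxᵢ and σ = √(Σ pᵢ(xᵢ − μ)²) applied to the magazine store's probability table; the function name is an illustrative choice.

```python
import math

def mean_and_sd(values, probabilities):
    """Mean and standard deviation of a discrete probability distribution."""
    mu = sum(p * x for x, p in zip(values, probabilities))
    variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probabilities))
    return mu, math.sqrt(variance)

# Probability table from Example 3 (number of magazines per customer)
x = [0, 1, 2, 3, 4, 5]
p = [0.00, 0.23, 0.38, 0.21, 0.13, 0.05]

mu, sigma = mean_and_sd(x, p)
print(round(mu, 2), round(sigma, 2))  # agrees with 2.39 and 1.12 in the example
```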





Example 4

‘Cheap Car Insurance’ insures used cars valued at $6000 under these conditions:

A $6000 will be paid to the owner for total loss

B for damage between $3000 and $5999, $3500 will be paid

C for damage between $1500 and $2999, $1000 will be paid

D for damage less than $1500, nothing will be paid.

From statistical information the insurance company knows that in any year the probabilities of A, B, C and D are 0.03, 0.12, 0.35 and 0.50 respectively.

If the company wishes to receive $80 more than its expected payout on each policy, what should it charge for the policy?

Let X be the random variable of payouts, so the probability table is:

    x_i    0      1000   3500   6000
    p_i    0.50   0.35   0.12   0.03

The expected payout is the mean, μ, and

\[ \mu = \sum p_i x_i = (0.50) \times 0 + (0.35) \times 1000 + (0.12) \times 3500 + (0.03) \times 6000 = 950 \]

The company expects to pay out $950 on average in the long run, so it should charge $950 + $80 = $1030.

EXERCISE 7B

1 Australian crayfish is exported to Asian markets. The buyers are prepared to pay high prices when the crayfish arrive still alive. If X is the number of deaths per dozen crayfish, the probability function for X is given by:

    x_i      0      1      2      3      4      5      > 5
    P(x_i)   0.54   0.26   0.15   0.03   0.01   0.01   0.00

a What is the mean number of deaths per dozen crayfish?

b Find σ, the standard deviation for the probability distribution.

2 A random variable X has probability function given by \(P(x) = k(0.4)^x (0.6)^{3-x}\) for x = 0, 1, 2, 3.

a Find P(x) for x = 0, 1, 2 and 3 and hence find k.

b Find the mean and standard deviation for the distribution.

3 An insurance policy covers a $20 000 sapphire ring against theft and loss. If it is stolen the insurance company will pay the policy owner in full. If it is lost they will pay the owner $8000. From past experience the insurance company knows that the probability of theft is 0.0025 and of being lost is 0.03. How much should the company charge to cover the ring if they want a $100 expected return?





4 Use technology to find the mean and standard deviation of the two samples, A and B, of weights given in grams.

    A  498.8  500.2  500.4  499.9  500.4  500.6  498.9  498.2  500.1  501.9
       500.8  498.6  499.7  498.6  499.0  498.8  499.1  500.7  500.7  501.3
       501.1  501.5  499.0  499.7  498.4  501.1  500.1  499.9  500.9  499.2

    B  545.5  543.4  399.8  511.3  616.3  496.7  337.8  650.2  426.3  522.2
       664.0  415.1  416.0  425.4  419.9  503.7  427.8  474.2  459.9  390.5
       428.5  451.9  590.1  613.5  402.3  318.3  478.1  502.2  626.4  435.7

Which of the samples is the weights of bags of salt, and which is the weights of oranges?

5 Test marks out of 10 are recorded in the following frequency table:

    Mark        0   1   2   3   4   5   6    7    8   9   10
    Frequency   2   1   0   4   5   8   12   15   7   3   5

a Find the mean and standard deviation of these scores.

b Calculate the percentage difference between using the formulae for population standard deviation and sample standard deviation.

6 Using \(\sigma^2 = \sum p_i (x_i - \mu)^2\) show that \(\sigma^2 = \sum p_i x_i^2 - \mu^2\).

(Hint: \(\sigma^2 = \sum p_i (x_i - \mu)^2 = p_1 (x_1 - \mu)^2 + p_2 (x_2 - \mu)^2 + \cdots + p_n (x_n - \mu)^2\). Expand \(\sigma^2\) and regroup the terms.)

C NORMAL DISTRIBUTIONS

Many quantities reflect the combined effect of a large number of random factors.

For example:

• The yield of a wheat plant is the combined result of many unpredictable factors such as genes, rainfall, sunshine, and its position in the field where it was seeded.

• The weight of a packet of sultanas is the sum of the weights of each individual sultana, and it is unlikely a packet labelled as 1 kg will weigh exactly 1 kg.

DISCUSSION: THE EFFECT OF RANDOM FACTORS

• Consider at least three factors that affect each of the following:

a the weight of a newly born piglet

b the time to complete an assignment

c the mark achieved in an examination

d the number of goals scored in a netball match.

• For each of the above random variables, suggest why the distribution might be

a symmetric b bell shaped.

The next investigation explores the distribution of a quantity that is the combined result of different factors.





INVESTIGATION 1: SOME PROPERTIES OF A NORMAL DISTRIBUTION

Consider the time it takes Les to walk home from school. We have broken this into the following stages with the time it takes to complete each stage:

    Stage   What is happening                        Time
    1       Cross the road in front of the school    up to 1 minute
    2       Walk to the shopping centre              5 ± 2 minutes
    3       Walk through the shopping centre         3 ± 2 minutes
    4       Cross a road                             up to 1 minute
    5       Buy a loaf of bread                      up to 2 minutes
    6       Talk with a friend                       up to 2 minutes
    7       Walk the remaining distance home         2 ± 1 minutes

Question: According to the table, what is the longest time it may take Les to walk home? What is the shortest time?

If Les wanted to study the distribution of the time it takes to walk home, he could keep a daily record, but the amount of data collected would be very small.

Les could also use the information given in the table and use a spreadsheet or a calculator to simulate the time it takes to walk home.

The following instructions are set up for a spreadsheet, but the procedure will also work on a calculator.

What to do:

1 Open the spreadsheet “Normal distribution”. A spreadsheet with the following headings will appear.

2 In each of the cells A2 to G2, under the headings ‘Stage 1’ to ‘Stage 7’, type in the formulae shown in the table. Do not forget to start each formula with an = sign.

Note: rand() calculates a random number between 0 and 1.

Question: What does 5 + (4*rand() − 2) calculate?

3 In cell N2, below the heading ‘Total time’, type in the formula =sum(A2:M2)

Question: What does this formula calculate?

4 Drag the formulae in cells A2 to N2 down to fill all cells A251 to N251. Pressing the F9 function key will produce another random sample.

The numbers in cell P2 under the heading ‘Mean’, and in cell Q2 under the heading ‘Standard Deviation’, are the mean and standard deviation of the numbers in cells N2 to N251.

The number in cell R2 under the heading ‘No. within 1 st. dev.’ gives the number of





values within 1 standard deviation of the mean. For example, if the mean x̄ = 12.96 and the standard deviation s = 1.82, then this cell gives the number of values that lie between x̄ − s = 11.14 and x̄ + s = 14.78. Similarly, the numbers in cells S2 and T2 give the number of values within 2 and 3 standard deviations of the mean respectively.

If you are having difficulty setting up this spreadsheet, click on the tag ‘Normal 2’ to open a finished version.

The graph that appears is the histogram of the data in cells N2 to N251.

5 Calculate the proportion of data values within each interval. For example, if there are 169 values within 1 standard deviation of the mean, the proportion of values in the interval = 169/250 = 0.676.

6 Copy and fill in the following table for 5 different samples. The entries of the first line may not agree with your values.

    Sample   Mean    Stdev   x̄ − s to x̄ + s     x̄ − 2s to x̄ + 2s    x̄ − 3s to x̄ + 3s
    no.      x̄       s       Count   Propn.     Count   Propn.      Count   Propn.
    1        12.96   1.82    169     0.676
    2
    3
    4
    5

What do you notice about the proportions of data in each of the intervals?

In the following we change the value of the factors and then add more factors.

7 Change the formulae in cells A2 to G2 as shown in the table.

8 Repeat steps 4 to 6.

9 Add the following formulae in cells H2 to M2:

10 Repeat steps 4 to 6.
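The first part of Investigation 1 can also be sketched outside a spreadsheet. The code below simulates 250 walks home and reports the proportion of total times within 1, 2 and 3 standard deviations of the mean. The stage formulas are assumptions read off the table of times: the spreadsheet's own formula table is not reproduced in the text, and only the form `5 + (4*rand() − 2)` for stage 2 is quoted. “Up to t minutes” is taken to mean a uniform random time between 0 and t.

```python
import random
import statistics

def simulate_walks(n_walks=250, seed=None):
    """Simulate Les's total walking time in minutes for n_walks days."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_walks):
        stages = [
            rng.random(),                 # stage 1: up to 1 minute
            5 + (4 * rng.random() - 2),   # stage 2: 5 ± 2 minutes
            3 + (4 * rng.random() - 2),   # stage 3: 3 ± 2 minutes
            rng.random(),                 # stage 4: up to 1 minute
            2 * rng.random(),             # stage 5: up to 2 minutes
            2 * rng.random(),             # stage 6: up to 2 minutes
            2 + (2 * rng.random() - 1),   # stage 7: 2 ± 1 minutes
        ]
        totals.append(sum(stages))
    return totals

def proportions_within(totals):
    """Proportion of values within 1, 2 and 3 sample standard deviations of the mean."""
    mean = statistics.mean(totals)
    s = statistics.stdev(totals)
    return [
        sum(1 for t in totals if abs(t - mean) <= k * s) / len(totals)
        for k in (1, 2, 3)
    ]

totals = simulate_walks(250, seed=1)
print(round(statistics.mean(totals), 2), round(statistics.stdev(totals), 2))
print(proportions_within(totals))
```

Calling `simulate_walks` again with a different seed plays the role of pressing F9 in the spreadsheet: the mean and standard deviation shift slightly, while the three proportions stay close to the same values.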





From Investigation 1 you should have discovered that changing the number and values of factors may change the mean and standard deviation, but leaves the following unchanged:

• The shape of the histogram is symmetric about the mean.

• Approximately 68% of the data lies between 1 standard deviation below the mean and 1 standard deviation above the mean.

• Approximately 95% of the data lies between 2 standard deviations below and 2 standard deviations above the mean.

• Approximately 99.7% of the data lies between 3 standard deviations below and 3 standard deviations above the mean.

[Figure: normal curve centred at μ. On each side of the mean, 34% of the area lies within 1σ, 13.5% between 1σ and 2σ, 2.35% between 2σ and 3σ, and 0.15% beyond 3σ. The curve is concave between the points of inflection at μ − σ and μ + σ, and convex outside them.]

Note: A smooth curve drawn through the midpoints of each column of the histogram would ideally look like the graph displayed. Note the points of inflection at μ − σ and μ + σ.

It is a rare event for an outcome to be outside the standard deviation range between −3σ and 3σ. In a sample of 1000, you would only expect about 3 cases.

The above information is typical of a family of normal distributions. Curves with this shape are known as normal curves. Because of their characteristic shape, they are also called bell-shaped curves.

Variables which are the combined result of many random factors are often approximately normal.

The normal variable X with mean μ and standard deviation σ is denoted by X ∼ N(μ, σ²).

CONTINUOUS PROBABILITY DENSITY FUNCTIONS

For any distribution of data, whether it is a normal distribution or not, the function whose smooth curve approximates the histogram of the data is called a probability density function or pdf.

If the variable X is normally distributed, N(μ, σ²), the probability density function is

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} . \]
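The normal density formula and the 68%, 95% and 99.7% figures can be checked numerically. The sketch below writes the density out directly from the formula for N(μ, σ²) and uses `NormalDist` from Python's standard `statistics` module for the areas; the function names are illustrative choices.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def proportion_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))

# The hand-written density agrees with the library's pdf
print(normal_pdf(1.0, 0.0, 1.0), NormalDist().pdf(1.0))
```

The exact areas are about 0.6827, 0.9545 and 0.9973, which is why the text describes them as approximately 68%, 95% and 99.7%.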





Probability density functions f have the following properties:

• f(x) ≥ 0 for all values of x.

• The area between the graph of f and the horizontal axis is 1, since the total of all probabilities is 1.

• The proportion of outcomes of the variable X between the values a and b is the area between the graph of f and the horizontal axis for a ≤ x ≤ b.

Notice that:

\[ \Pr(a \le X \le b) = \int_a^b f(x)\, dx \]

For a continuous variable X, the probability that X is exactly equal to a point a is zero.

For example, the probability an egg will weigh exactly 72.9 g is zero.

If you were to weigh an egg on scales that weigh to the nearest 0.1 g, a weight of 72.9 g means the weight lies somewhere between 72.85 and 72.95 grams.

Presumably an egg has to weigh something, and it could be 72.9 grams, but you will never know. No matter how accurate your scales are, you can only ever know the weight of an egg within a range.

So, for a continuous variable we can only talk about the probability an event lies in an interval.

Notice that:

if X is continuous, Pr(a ≤ X ≤ b), Pr(a < X ≤ b), Pr(a ≤ X < b) and Pr(a < X < b) all have the same value. Why?

This would not be correct if X was discrete.

Example 5

The chest measurements of 18 year old male footballers are normally distributed with a mean of 95 cm and a standard deviation of 8 cm.

a Find the percentage of randomly chosen footballers with chest measurements between: i 87 cm and 103 cm ii 103 cm and 111 cm

b Find the probability of randomly choosing a footballer with a chest measurement between 87 cm and 111 cm.

For the distribution of chest measurements, the mean μ = 95 cm and the standard deviation σ = 8 cm.

[Figure: normal curve with 87, 95, 103 and 111 cm marked on the horizontal axis; 34% of the area lies between 87 and 95, 34% between 95 and 103, and 13.5% between 103 and 111.]

a i We need the percentage between μ − σ and μ + σ. This is 68%.

ii We need the percentage between μ + σ and μ + 2σ. This is 13.5%.

b The percentage between μ − σ and μ + 2σ is 68% + 13.5% = 81.5%. So the probability is 0.815.
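The answers in Example 5 rely on the rounded percentages 34% and 13.5%. As a cross-check, the sketch below computes the same areas from the normal distribution with mean 95 and standard deviation 8 using `NormalDist` from Python's standard library; the helper name `prob_between` is an illustrative choice.

```python
from statistics import NormalDist

# Chest measurements: mean 95 cm, standard deviation 8 cm (Example 5)
chest = NormalDist(mu=95, sigma=8)

def prob_between(dist, a, b):
    """Pr(a <= X <= b) as a difference of cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)

print(round(prob_between(chest, 87, 103), 4))   # between mu - sigma and mu + sigma
print(round(prob_between(chest, 103, 111), 4))  # between mu + sigma and mu + 2 sigma
print(round(prob_between(chest, 87, 111), 4))   # between mu - sigma and mu + 2 sigma
```

The computed values are close to, but not exactly, 0.68, 0.135 and 0.815, because the percentages on the normal curve diagram are rounded.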





EXERCISE 7C

1 What is the probability that a normally distributed value lies between:

a 1σ below the mean and 1σ above the mean

b the mean and the value 1σ above the mean

c the mean and the value 2σ below the mean

d the mean and the value 3σ above the mean?

2 Suppose the heights of 16 year old male students are normally distributed with a mean of 170 cm and a standard deviation of 8 cm. Find the percentage of male students whose height is:

a between 162 cm and 170 cm b between 170 cm and 186 cm.

Find the probability that a student from this group has a height:

c between 178 cm and 186 cm d less than 162 cm

e less than 154 cm f greater than 162 cm.

3 The time T minutes it takes Charlotte to go to work is normally distributed with mean 50 minutes and standard deviation of 5 minutes. Every morning Charlotte leaves for work at 8 am.

a If work starts at 9 am, what is the probability Charlotte will be late for work?

b If Charlotte works 250 days a year, how many times can she expect to be late?

4 Explain why each of the following variables might be normally distributed:

a the chest size of 18 year old Australian males

b the length of adult female sharks

c the protein content of each kilogram of corn grown in the same field.

5 A farmer has a flock of 237 crossbred lambs. The mean weight of the flock is 35 kg with a standard deviation of 2 kg.

a Explain why the weights of the lambs might be normally distributed.

b If lambs between the weights of 33 to 39 kg are suitable for export, how many lambs in this flock could the farmer expect to be able to export?

6 The weights of hens’ eggs are normally distributed with mean 65 grams and standard deviation 6 grams.

a Determine the probability that a randomly selected egg has weight

i greater than 53 g ii less than 71 g iii between 59 g and 77 g.

b In one week the hens lay 1286 eggs. How many of these eggs are expected to be

i greater than 53 g ii less than 71 g iii between 59 g and 77 g?

7 The marks for a geography examination are normally distributed with mean 65 and standard deviation 11.

a A geography student is chosen at random. Determine the probability that the student scored i less than 76 marks ii between 43 and 76 marks.

b If the top 16% of students receive an A grade, what was the minimum mark for an A?

c If 2582 students sit for the examination, how many of them would be expected to score less than 32 marks?





8 The weights of Jason’s oranges are normally distributed. 84% of the crop weigh more than 152 grams and 16% weigh more than 200 grams.

a Find μ and σ for the crop.

b What proportion of the oranges weigh between 152 grams and 224 grams?

9 The heights of 13 year old boys are normally distributed. 97.5% of them are above 131 cm and 2.5% are above 179 cm.

a Find μ and σ for the height distribution.

b A 13-year old boy is randomly chosen. What is the probability that his height lies between 143 cm and 191 cm?

10 Using the same set of axes, quickly sketch the graphs of the density functions for each of the following distributions:

a N(0, 3²) b N(0, (0.5)²) c N(−5, 1²) d N(3, 0.25).

11 Each of the following is a graph of a normal distribution with different vertical scales:

[Figure: graphs A, B and C]

a Write down the mean μ for each of these distributions.

b Which of the distributions has standard deviation

i σ = 0.1 ii σ = 1 iii σ = 10?

c Which of the distributions has the largest spread?

D THE STANDARD NORMAL DISTRIBUTION

For each value of μ and σ there is a different normal distribution N(μ, σ²).

As illustrated by Investigation 1, all normal distributions have one important property in common: the probability of an event occurring depends only on the number of standard deviations the event is from the mean.

If x is an observation from a normal distribution with mean μ and standard deviation σ, the z-score of x is the number of standard deviations x is from the mean, that is, \( z = \frac{x - \mu}{\sigma} \).

The diagram shows how the z-score is related to a normal curve.

[Figure: normal distribution curve with two horizontal scales, the actual score x and the corresponding z-score. On each side of the mean the areas are 34%, 13.5%, 2.35% and 0.15%.]





z-scores are particularly useful when comparing two measurements made using different μ and σ. But be careful! These comparisons will only be reasonable if both measurements are approximately normal.

Example 6

The local school has kept records of all its athletics competitions. It was found that the time, in minutes, to run the men’s 800 metres was normally distributed as N(3.4, (0.2)²). The women’s long jump, in metres, was normally distributed as N(4.3, (0.4)²). In 1980 John won the 800 metre race with a time of 3.2 minutes. In 2006 his daughter Anne came second in the long jump with a distance of 5.1 m.

a i Sketch the graphs of the two distributions using the same scale for the z-scores from −3 to +3.

ii Put the actual times/distances below each of the z-scores on the graphs.

iii Calculate the z-scores for John and Anne, and mark these on the graphs.

iv Shade the area under the respective graphs to represent performances that were better than those of John and Anne.

b Of all the students who participated in these two events, what proportion would have performed better than i John ii Anne?

c If 1000 students had participated in each of these two events, how many would have performed better than i John ii Anne?

d Of the father and daughter, who had the better result?

a i/ii/iv

[Figure: two normal curves with z-score scales from −3 to +3 and areas 34%, 13.5%, 2.35% and 0.15% on each side of the mean. The first has the actual time (min) below the z-scores, with John’s time marked and the region “better than John” shaded. The second has the actual distance (m) below the z-scores, with Anne’s distance marked and the region “better than Anne” shaded.]

iii John’s time was 3.2 − 3.4 = −0.2 minutes from the mean. Since the standard deviation is 0.2 minutes, John ran the 800 metres in a time of 1 standard deviation less than the mean. The z-score of John’s performance is −1.

The distance Anne jumped was 5.1 − 4.3 = 0.8 m above the mean. Since the standard deviation is 0.4 metres, Anne jumped a distance of 2 standard deviations above the mean. The z-score of Anne’s performance is +2.





b i The proportion less than μ − σ is 0.16, so 16% of all participants performed better than John.

ii The proportion greater than μ + 2σ is 0.025, so only 2.5% of all participants performed better than Anne.

c i Of 1000 participants, 16% of 1000 = 160 were better than John.

ii 2.5% of 1000 = 25 were better than Anne; one of these happened to be competing on the same day as Anne.

d Anne’s long jump was more outstanding than her father’s 800 metre race.
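The comparison in Example 6 can be checked with a short Python sketch. It assumes Python's standard statistics.NormalDist class in place of a calculator, and the variable names are illustrative only. The sketch uses exact normal areas, so it gives about 15.9% and 2.3% where the worked solution uses the rounded figures 16% and 2.5%.

```python
from statistics import NormalDist

# Distributions taken from Example 6.
run_800m = NormalDist(mu=3.4, sigma=0.2)    # time in minutes (lower is better)
long_jump = NormalDist(mu=4.3, sigma=0.4)   # distance in metres (higher is better)

# z-score = (value - mean) / standard deviation
z_john = (3.2 - run_800m.mean) / run_800m.stdev
z_anne = (5.1 - long_jump.mean) / long_jump.stdev

# Proportion of performances that were better than each result.
better_than_john = run_800m.cdf(3.2)         # faster times lie to the left
better_than_anne = 1 - long_jump.cdf(5.1)    # longer jumps lie to the right

print(f"John: z = {z_john:.2f}, proportion better = {better_than_john:.3f}")
print(f"Anne: z = {z_anne:.2f}, proportion better = {better_than_anne:.3f}")
```

The smaller proportion belongs to Anne, which agrees with the conclusion in part d.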

EXERCISE 7D.1

1 In a year 12 class, the marks for a Geography test marked out of 50 were normally distributed with a mean of 34 and a standard deviation of 6. The marks for an English essay out of 20 were normally distributed with a mean of 12 and a standard deviation of 1.5. Val received a mark of 40 for her Geography and 15 for her English essay.

a Sketch the graphs of the two distributions below one another using the same scale for the z-scores from −3 to +3. Put the actual marks below each z-score on the graph.

b For which of the two subjects did Val receive the higher percentage mark?

c Calculate the z-score for each of Val’s results.

i Mark these z-scores on the two graphs.

ii Shade the region on the two graphs of scores which were better than Val’s.

d What proportion of the students performed better than Val in Geography, and what proportion performed better than Val in English?

e If there were 32 students in the class, how many performed better than Val in Geography and how many in English?

f In which of these two assessments did Val perform better?

2 Suppose that the weight W of bags of sugar filled by a machine is normally distributed with mean μ = 504 grams and standard deviation σ = 2 grams. A quality controller rejects any bags of sugar with weight less than 500 grams.

Across town, the weight A of bags of apples filled by an assistant in a greengrocer’s shop is normally distributed with mean weight 5 kilograms and standard deviation 500 grams. Bags weighing less than 4½ kg are rejected by a quality controller.

a Sketch the graphs of the two distributions below one another using the same scale for the z-scores from −3 to +3. Put the actual weights below each z-score on the graph.

b Calculate the z-score for each of the two quality controls, and shade in the regions corresponding to the weights of bags that are rejected.

c Which of the two quality controllers is the more stringent, i.e., rejects the larger proportion of bags?





If x is an observation from a normal distribution with mean μ and standard deviation σ, the z-score of x can be calculated from the formula $z = \frac{x - \mu}{\sigma}$.

If the variable X is normally distributed with mean μ and standard deviation σ, then the variable $Z = \frac{X - \mu}{\sigma}$ has the standard normal distribution.

The variable Z is the number of standard deviations X is from the mean.

Notice that, if x = μ then z = 0 and if x = μ + σ then z = 1.

Hence, the mean of Z is 0 and the standard deviation of Z is 1.

Example 7

Suppose examination scores are normally distributed with mean mark μ = 63 and standard deviation σ = 12 marks.

a What is the z-score for a mark of 80?

b If Hua’s z-score is −1.5, what is Hua’s actual score?

a A mark of 80 is 80 − 63 = 17 marks above the mean.

Since the standard deviation is 12, this is $\frac{17}{12} \approx 1.42$ standard deviations above the mean.

So, the z-score is 1.42.

[Figure: normal curve with the actual mark and the z-score on the horizontal axis; a score of 80 is marked.]

b Hua’s mark is −1.5 standard deviations from the mean.

Since the standard deviation is 12, this is 12 × (−1.5) = −18 marks from the mean.

Since the mean is 63, Hua’s mark is 63 + (−18) = 45.

[Figure: normal curve with the actual mark and the z-score on the horizontal axis; Hua’s mark is marked.]

3 Suppose the distribution of the diameter (in cm) of oranges from a tree is N(10, 2²).

a Sketch a graph of the distribution that displays both the actual diameters and the z-scores along the horizontal axis.

b Find the z-score for each of the following diameters:

i 12 cm ii 9 cm iii 13 cm

c Oranges are to be dumped if their diameters have a z-score of less than −2. What is the diameter of oranges that are to be dumped?

d If there are 120 oranges on the tree, how many will be dumped?

4 The volume of milk cartons filled by a machine is normally distributed with mean 504 mL and standard deviation 1.5 mL.

a What is the z-score of a carton containing 506 mL of milk?

b What is the volume of milk in a carton with a z-score of −1.5?
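The z-score formula and its rearrangement x = μ + zσ, as used in Example 7, can be written as two small Python functions. This is a minimal sketch; the function names z_score and from_z_score are made up for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def from_z_score(z, mu, sigma):
    """Actual value that corresponds to a given z-score."""
    return mu + z * sigma


# Example 7: examination scores with mean 63 and standard deviation 12.
mark_z = z_score(80, mu=63, sigma=12)
hua_mark = from_z_score(-1.5, mu=63, sigma=12)

print(round(mark_z, 2))   # 1.42
print(hua_mark)           # 45.0
```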





Example 8

Find the probability that the standard normal distribution Z lies between −2 and 1.

The graph of the Z-distribution is shown:

[Figure: standard normal curve with the region between z = −2 and z = 1 shaded; the shaded areas are 13.5%, 34% and 34%.]

The probability Z lies between −2 and 1 is the proportion of observations that lie between 2 standard deviations to the left of the mean and 1 standard deviation to the right of the mean. This is about 0.815.

EXERCISE 7D.2

1 The table shows Emma’s midyear exam results. The exam results for each subject are normally distributed with mean μ and standard deviation σ shown in the table.

Subject       Emma’s score   μ     σ
English       12             10    1.1
Chinese       27             20    3.0
Geography     84             55    18
Biology       34             25    10
Mathematics   84             50    15

a Find the z-score for each of Emma’s subjects.

b Arrange Emma’s subjects from ‘best’ to ‘worst’ in terms of the z-scores.

2 Calculate the following probabilities. In each case sketch the graph of the Z-distribution, shading in the region of interest.

a Pr(−1 < Z < 1) b Pr(−1 < Z < 3) c Pr(−1 < Z < 0)

d Pr(Z < 2) e Pr(−1 < Z) f Pr(Z > 1)

So far we have only used integer z-scores to calculate probabilities. By refining the methods used in Investigation 1 we can calculate probabilities for other z-scores. To see how to use your calculator to do this, click on the icon.

USING TECHNOLOGY TO FIND PROBABILITIES

When working with normal distributions, you are advised to sketch a graph of the normal distribution and shade in the areas of interest.

Example 9

Use technology to illustrate and calculate:

a Pr(−0.41 ≤ Z ≤ 0.67) b Pr(Z ≤ 1.5) c Pr(Z > 0.84)

a For a TI, Pr(a ≤ Z ≤ b) can be calculated using normalcdf(a, b, 0, 1).

Pr(−0.41 ≤ Z ≤ 0.67)
= normalcdf(−0.41, 0.67, 0, 1)
≈ 0.408

[Figure: standard normal curve with the region between −0.41 and 0.67 shaded.]





b Pr(Z ≤ 1.5)
= normalcdf(−E99, 1.5, 0, 1)
≈ 0.933

Note: −E99 (that is, −1 × 10⁹⁹) is a very large negative number which is entered on the calculator in place of −∞.

[Figure: standard normal curve with the region to the left of 1.5 shaded.]

c Pr(Z > 0.84)
= normalcdf(0.84, E99, 0, 1)
≈ 0.200

Note: E99 (that is, 1 × 10⁹⁹) is a very large positive number which is entered on the calculator in place of +∞.

[Figure: standard normal curve with the region to the right of 0.84 shaded.]

Note: When μ = 0 and σ = 1 we can simply use normalcdf(a, b).

EXERCISE 7D.3

1 If Z has the standard normal distribution, find the following probabilities. In each case sketch the regions.

a Pr(−0.86 ≤ Z ≤ 0.32) b Pr(−2.3 ≤ Z ≤ 1.5) c Pr(Z ≤ 1.2)

d Pr(Z ≤ −0.53) e Pr(Z > 1.3) f Pr(Z > −1.4)

g Pr(Z > 4)

With modern technology we can calculate probabilities for normal distributions which have not been standardised. Click on the icon to see how this is done.

Example 10

If X is N(10, 2.3²), find these probabilities:

a Pr(8 ≤ X ≤ 11) b Pr(X ≤ 12) c Pr(X > 9). Illustrate.

a Pr(8 ≤ X ≤ 11)
= normalcdf(8, 11, 10, 2.3)
≈ 0.476

[Figure: normal curve with mean 10 and the region between 8 and 11 shaded.]

b Pr(X ≤ 12)
= normalcdf(−E99, 12, 10, 2.3)
≈ 0.808

[Figure: normal curve with mean 10 and the region to the left of 12 shaded.]

c Pr(X > 9)
= normalcdf(9, E99, 10, 2.3)
≈ 0.668

[Figure: normal curve with mean 10 and the region to the right of 9 shaded.]
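For readers without a TI calculator, the normalcdf calculations in Examples 9 and 10 can be reproduced in Python. This is a sketch only: it assumes the standard statistics.NormalDist class, and the helper name normal_cdf_between is made up for illustration. One-sided probabilities are found directly from the cumulative distribution function, so no stand-in for ±∞ is needed.

```python
from statistics import NormalDist


def normal_cdf_between(lower, upper, mu=0.0, sigma=1.0):
    """Pr(lower <= X <= upper) for X distributed as N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


Z = NormalDist()            # standard normal: mean 0, standard deviation 1
X = NormalDist(10, 2.3)     # the distribution used in Example 10

# Example 9
p9a = normal_cdf_between(-0.41, 0.67)
p9b = Z.cdf(1.5)            # Pr(Z <= 1.5)
p9c = 1 - Z.cdf(0.84)       # Pr(Z > 0.84)

# Example 10
p10a = normal_cdf_between(8, 11, mu=10, sigma=2.3)
p10b = X.cdf(12)            # Pr(X <= 12)
p10c = 1 - X.cdf(9)         # Pr(X > 9)

for label, p in [("9a", p9a), ("9b", p9b), ("9c", p9c),
                 ("10a", p10a), ("10b", p10b), ("10c", p10c)]:
    print(label, round(p, 3))
```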





2 If the random variable X is N(70, 3²), find these probabilities:

a Pr(60.6 < X ≤ 68.4) b Pr(X > 74) c Pr(X ≤ 68)

3 Suppose the variable X is normally distributed with mean μ = 58.3 and standard deviation σ = 8.96.

a Let the z-score of x = 50.6 be z1 and the z-score of x = 68.9 be z2.

i Calculate z1 and z2. ii Find Pr(z1 ≤ Z ≤ z2).

b Find Pr(50.6 ≤ X ≤ 68.9) directly from your calculator.

c Compare the answers to a and b.

4 Suppose X is N(50, 5²). Calculate Pr(a < X ≤ 51) for each of the following values of a. Give your answers to 5 decimal places.

a a = 45 b a = 35 c a = 25 d a = 15 e a = 0

Compare the answers of a to e with Pr(X ≤ 51).

5 The height of 18 year old men is normally distributed with mean 182.3 cm and standard deviation 9.6 cm. Find the probability that a randomly selected 18 year old man is:

a at least 180 cm tall b at most 190 cm tall c between 175 and 185 cm.

6 The weight of hens’ eggs is normally distributed with mean 42.3 g and standard deviation 5.9 g. Find the probability that a randomly selected egg is:

a at most 50 g b at least 45 g c between 35 g and 45 g.

7 The speed of cars passing the supermarket is normally distributed with mean 56.3 km/h and standard deviation 7.4 km/h. Find the probability that a randomly selected car is travelling at:

a between 60 and 75 km/h b at most 70 km/h c at least 60 km/h.

8 The lengths of metal bolts produced by a machine are found to be normally distributed with a mean of 19.8 cm and a standard deviation of 0.3 cm. Find the probability that a bolt selected at random from the machine will have a length between 19.7 and 20 cm.

9 The IQs of secondary school students from a particular area are believed to be normally distributed with a mean of 103 and a standard deviation of 15.1. Find the probability that a student will have an IQ:

a of at least 115 b that is less than 75 c between 95 and 105.

Example 11

In 1972 the heights of SANFL players were found to be normally distributed with mean 179 cm and standard deviation 7 cm. Find the probability that in 1972 a player was: a at least 175 cm tall b between 170 cm and 190 cm.

If X is the height of a player then X is normally distributed with mean μ = 179 and standard deviation σ = 7.

a We need to find
Pr(X > 175)
= normalcdf(175, E99, 179, 7)
≈ 0.716

b We need to find
Pr(170 ≤ X ≤ 190)
= normalcdf(170, 190, 179, 7)
≈ 0.843





10 The average weekly earnings of the students at a local high school are found to be approximately normally distributed with a mean of $40 and a standard deviation of $6. What proportion of students would you expect to earn:

a between $30 and $50 per week b at least $50 per week?

11 The lengths of Murray Cod caught in the River Murray are found to be normally distributed with a mean of 41 cm and a standard deviation of 3.317 cm.

a Find the probability that a cod is at least 50 cm.

b What proportion of cod measure between 40 cm and 50 cm?

c In a sample of 200 cod, how many of them would you expect to be at least 45 cm?

E  FINDING QUANTILES (k-VALUES)

Let X be the random variable of the length in mm of a snail shell. Suppose that X is normally distributed with mean μ = 23.6 mm and standard deviation σ = 3.1 mm. A snail farmer wants to harvest some of his snails, but only those whose shell lengths are amongst the longest 5%. The problem is to find k such that Pr(X < k) = 95%.

[Figure: normal curve for X with 95% of the area shaded to the left of k.]

The number k is known as a quantile, and in this case the 95% quantile.

When finding quantiles we are given a probability and are asked to calculate the corresponding measurement. This is the inverse of finding probabilities, and we use the inverse normal function.

Click on the icon to obtain instructions for using your calculator.

For the above example, the TI instruction is k = invNorm(0.95, 23.6, 3.1) ≈ 28.7.

The instruction k = invNorm(0.95) will assume that the mean μ = 0 and the standard deviation σ = 1.

Example 12

If Z has a standard normal distribution, find k if Pr(Z < k) = 0.73.

Using a TI,
k = invNorm(0.73, 0, 1) ≈ 0.613

This means 73% of the values are expected to be less than 0.613.

[Figure: standard normal curve for Z with 73% of the area shaded to the left of k.]
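The inverse normal calculations for the snail shells and for Example 12 can be sketched in Python. The sketch assumes that the inv_cdf method of the standard statistics.NormalDist class plays the role of invNorm; the variable names are illustrative.

```python
from statistics import NormalDist

# Snail shells: the longest 5% lie above k, so 95% lie below k.
snail_lengths = NormalDist(mu=23.6, sigma=3.1)
k_snail = snail_lengths.inv_cdf(0.95)

# Example 12: standard normal distribution with Pr(Z < k) = 0.73.
k_z = NormalDist().inv_cdf(0.73)

print(round(k_snail, 1))   # 28.7
print(round(k_z, 3))       # 0.613
```

Whenever a problem gives the area to the right of k, convert it to the area to the left of k before calling inv_cdf, exactly as the "longest 5%" is converted to 0.95 above.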





Example 13

A university professor determines that 80% of this year’s History candidates should pass the final examination. The examination results are expected to be normally distributed with mean 62 and standard deviation 13. Find the lowest score necessary to pass the examination.

Let X denote the final examination result, so X ~ N(62, 13²).

We need to find k such that
Pr(X > k) = 0.8
∴ Pr(X ≤ k) = 0.2
∴ k = invNorm(0.2, 62, 13)
∴ k ≈ 51.059

So, the minimum pass mark is 51.

[Figure: normal curve for X with 20% of the area shaded to the left of k.]

EXERCISE 7E

1 Z has a standard normal distribution. Illustrate with a sketch and find k if:

a Pr(Z ≤ k) = 0.81 b Pr(Z ≤ k) = 0.58 c Pr(Z ≤ k) = 0.17

2 X ~ N(20, 3²). Illustrate with a sketch and find k if:

a Pr(X ≤ k) = 0.348 b Pr(X ≤ k) = 0.878 c Pr(X ≤ k) = 0.5

3 a Show that Pr(−k ≤ Z ≤ k) = 2 Pr(Z ≤ k) − 1.

b If Z is standard normally distributed, find k if:

i Pr(−k ≤ Z ≤ k) = 0.238 ii Pr(−k ≤ Z ≤ k) = 0.7004

4 The length of a fish species is normally distributed with mean 35 cm and standard deviation 8 cm. The fisheries department has decided that the smallest 10% of the fish are not to be harvested. What is the size of the smallest fish that can be harvested?

5 The length of screws produced by a machine is normally distributed with mean 75 mm and standard deviation 0.1 mm. If a screw is too long it is automatically rejected. If 1% of screws are rejected, what is the length of the smallest screw to be rejected?

6 The average score for a Physics test was 46 and the standard deviation of the scores was 15. Assuming that the scores were normally distributed, the teacher decided to award an A to the top 7% of the students in the class. What is the lowest score that a student needed in order to achieve an A?

7 The volume of cool drink in a bottle filled by a machine is normally distributed with mean 503 mL and standard deviation 0.5 mL. 1% of the bottles are rejected because they are underfilled, and 2% are rejected because they are overfilled; otherwise they are kept for retail. What range of volumes is in the bottles that are kept?





Note: Z-scores are essential for finding unknown values of μ and/or σ.

Example 14

An adult scallop population is known to have a standard deviation of 5.9 g. If 15% of scallops weigh less than 58.2 g, find the mean weight of the population.

Let the mean weight of the population be μ g.

If X g denotes the weight of an adult scallop, then X ~ N(μ, 5.9²).

As we do not know μ we cannot use invNorm directly, but we can find the z-value.

[Figure: normal curve with 15% of the area shaded to the left of 58.2.]

Now Pr(X ≤ 58.2) = 0.15

∴ $\Pr\left(Z \le \frac{58.2 - \mu}{5.9}\right) = 0.15$

∴ $\frac{58.2 - \mu}{5.9} = \text{invNorm}(0.15) = -1.0364$

∴ 58.2 − μ ≈ −6.1

∴ μ ≈ 64.3

So, the mean weight is 64.3 g.

8 The arrival times of buses at a depot are normally distributed with a standard deviation of 5 minutes. If 10% of the buses arrive before 3:45 pm, what is the mean arrival time of buses at the depot?

9 The IQ of a population has a standard deviation of 15. In a school 20% of students have an IQ larger than 125. What is the mean IQ of students in this school?

10 The distance an athlete can jump is normally distributed with mean 5.2 m. If 20% of the jumps by this athlete are less than 5 m, what is the standard deviation?

11 The weekly income of a greengrocer is normally distributed with a mean of $6100. If 85% of the time the weekly income exceeds $6000, what is the standard deviation?

Example 15

Find the mean and standard deviation of a normally distributed random variable X if Pr(X ≤ 20) = 0.1 and Pr(X > 29) = 0.15.

X ~ N(μ, σ²) where we have to find μ and σ.

We start by finding z1 and z2 which correspond to x1 = 20 and x2 = 29.

[Figure: normal curve with area 0.1 shaded to the left of x1 = 20 (z1) and area 0.15 shaded to the right of x2 = 29 (z2).]

Now $z_1 = \frac{20 - \mu}{\sigma} = \text{invNorm}(0.1) = -1.282$ ∴ 20 − μ = −1.282σ .... (1)

and $z_2 = \frac{29 - \mu}{\sigma} = \text{invNorm}(0.85) = 1.036$ ∴ 29 − μ = 1.036σ .... (2)

Solving these two equations gives μ ≈ 25.0 and σ ≈ 3.88.
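The method used in the scallop example and in the example with two unknowns can be sketched in Python. The sketch assumes the standard statistics.NormalDist class in place of invNorm; the function name normal_from_two_quantiles is made up for illustration. Each probability must first be written as an area to the left, so Pr(X > 29) = 0.15 becomes Pr(X ≤ 29) = 0.85.

```python
from statistics import NormalDist

STANDARD = NormalDist()   # mean 0, standard deviation 1


def normal_from_two_quantiles(x1, p1, x2, p2):
    """Return (mu, sigma) such that Pr(X <= x1) = p1 and Pr(X <= x2) = p2."""
    z1 = STANDARD.inv_cdf(p1)
    z2 = STANDARD.inv_cdf(p2)
    # x1 - mu = z1 * sigma and x2 - mu = z2 * sigma; subtracting removes mu.
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma


# Two unknowns: Pr(X <= 20) = 0.1 and Pr(X <= 29) = 0.85.
mu, sigma = normal_from_two_quantiles(20, 0.10, 29, 0.85)
print(round(mu, 1), round(sigma, 2))

# One unknown (scallops): sigma = 5.9 and Pr(X <= 58.2) = 0.15.
scallop_mean = 58.2 - STANDARD.inv_cdf(0.15) * 5.9
print(round(scallop_mean, 1))
```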





12 a Find the mean and the standard deviation of a normally distributed random variable X, if Pr(X > 80) = 0.1 and Pr(X ≤ 30) = 0.15.

b In a Mathematics examination it was found that 10% of the students scored at least 80, and no more than 15% scored under 30. Assuming the scores are normally distributed, what proportion of students scored more than 50?

13 The diameters of pistons manufactured by a company are normally distributed. Only those pistons whose diameters lie between 3.994 and 4.006 cm are acceptable.

a Find the mean and the standard deviation of the distribution if 4% of the pistons are rejected as being too small, and 5% are rejected as being too large.

b Determine the probability that the diameter of a randomly chosen piston lies between 3.997 cm and 4.003 cm.

F  INVESTIGATING PROPERTIES OF NORMAL DISTRIBUTIONS

In the previous section a number of assertions were made about the standard deviation. In this section some of these assertions will be justified.

INVESTIGATION 2  THE GEOMETRIC SIGNIFICANCE OF μ AND σ

What to do:

1 The normal probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$.

Use technology to graph this function for a μ = 6, σ = 1 b μ = 6, σ = 2.

2 Show that the derivative of f(x) is $f'(x) = -\frac{x - \mu}{\sigma^2} f(x)$.

3 Use the result in 2 to show that f(x) has a maximum value at x = μ.

4 Show that $f''(x) = -\frac{1}{\sigma^4}\left(\sigma^2 - (x - \mu)^2\right) f(x)$.

5 Use the result of 4 to find the points of inflection of f(x).

From Investigation 2 you should have discovered that the points of inflection occur at x = μ + σ and x = μ − σ.

[Figure: normal curve with the points of inflection marked at x = μ − σ and x = μ + σ.]

Consequently:

For a given normal curve the standard deviation is uniquely determined as the horizontal distance from the vertical line x = μ to a point of inflection.
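The location of the points of inflection can also be checked numerically. The following Python sketch is only an illustration and does not replace the algebra of Investigation 2. It assumes the standard statistics.NormalDist class for the density function and estimates the second derivative with a central difference; the helper name second_derivative is made up.

```python
from statistics import NormalDist


def second_derivative(f, x, h=1e-3):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


mu, sigma = 6, 2                     # the values used in step 1 b
f = NormalDist(mu, sigma).pdf        # the normal probability density function

# f'' should be (very nearly) zero at mu - sigma and mu + sigma ...
at_inflection = [second_derivative(f, x) for x in (mu - sigma, mu + sigma)]

# ... negative between those points and positive outside them.
inside = second_derivative(f, mu)
outside = second_derivative(f, mu + 2 * sigma)

print(at_inflection, inside, outside)
```

The change of sign of the second derivative on either side of μ + σ is what makes that point a point of inflection rather than just a zero of f''.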



INVESTIGATION 3  CALCULATING PROBABILITIES FROM NORMAL DISTRIBUTIONS

To find probabilities from a normal distribution you need to be able to find areas between the graph of $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ and the x-axis.

A simple way to estimate these probabilities is to approximate them with areas of rectangles that fit snugly around the curve.

The area beneath the smooth curve is approximately equal to the sum of the areas of the rectangles.

What to do:

Use a spreadsheet to:

• calculate the area of each rectangle using area = base × height
• add the areas of rectangles to find an approximate area below the curve.

Details of how to set up a spreadsheet can be found by clicking on the icon.

G  DISTRIBUTION OF SAMPLE MEANS

Suppose a dietician wants to know the mean weight of thirteen year old Australian boys. It is impractical to weigh each thirteen year old boy in Australia, but the dietician could find the mean weight of a randomly selected sample of, say, 10 boys.

The mean weight of the sample of 10 boys is a statistic that is then used to estimate the population parameter.

Clearly the mean weight depends on the sample. If another health worker had selected a different sample of 10 boys, it would be unlikely that the two sample means would be the same.

The statistic "sample mean weight" is a new variable. Repeated sampling can be used to discover how this variable is distributed. In particular we want to know how the mean of the sample means and the standard deviation of the sample means are related to the parent population of 13 year old boys.

The following investigation explores the relation between the statistic “sample mean” and the parameter “population mean”.
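The rectangle method of Investigation 3 can also be carried out in Python instead of a spreadsheet. This is a sketch under stated assumptions: the height of each rectangle is taken as the height of the curve at the midpoint of its base (the investigation does not say which height to use), and the function names are made up for illustration. The result is compared with the area given by the standard statistics.NormalDist class.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """The normal probability density function f(x)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def rectangle_area(a, b, mu, sigma, n=1000):
    """Approximate Pr(a <= X <= b) by adding the areas of n rectangles."""
    base = (b - a) / n
    total = 0.0
    for i in range(n):
        midpoint = a + (i + 0.5) * base               # centre of the i-th base
        total += base * normal_pdf(midpoint, mu, sigma)   # area = base x height
    return total


# Area under the standard normal curve between z = -2 and z = 1.
approx = rectangle_area(-2, 1, mu=0, sigma=1)
exact = NormalDist().cdf(1) - NormalDist().cdf(-2)
print(round(approx, 4), round(exact, 4))
```

Using more, narrower rectangles (a larger n) brings the sum closer to the true area.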





INVESTIGATION 4  A SIMPLE RANDOM SAMPLER

Suppose a school has 216 thirteen year old boys.

Let the variable X be the weight in kg of the boys.

The table shows all the possible values of X in random order.

Block 1                           Block 2
31.2 35.7 36.4 33.2 37.3 35.0    34.0 33.6 34.4 32.0 32.7 36.7
30.8 33.8 32.9 35.4 31.9 36.7    32.0 29.2 33.6 31.0 32.5 36.4
33.3 36.7 27.9 32.0 36.4 34.5    35.3 31.6 32.5 35.3 34.6 31.1
34.9 30.9 33.2 33.8 33.6 30.5    37.7 30.9 35.0 33.2 36.2 35.2
31.8 35.9 32.8 30.8 29.0 32.1    34.6 32.7 35.4 30.4 33.3 30.2
33.3 35.5 32.0 34.8 30.2 36.3    35.7 38.9 32.0 28.0 32.7 33.6

Block 3                           Block 4
35.4 31.2 32.5 29.6 35.1 32.9    37.3 33.6 36.7 30.7 32.8 32.5
29.4 33.5 32.5 30.1 34.9 32.3    34.9 31.4 33.0 32.4 29.7 33.6
30.6 30.5 30.5 36.3 34.3 32.1    36.6 31.3 30.8 29.8 30.8 29.2
33.1 35.0 32.5 34.1 33.2 32.9    30.2 33.4 33.2 31.1 32.3 30.6
32.0 31.4 32.4 37.1 32.5 35.9    29.4 30.3 34.9 32.1 34.6 35.7
31.4 27.5 31.7 37.1 29.9 31.6    35.4 32.5 33.4 35.2 34.2 29.5

Block 5                           Block 6
34.3 31.9 33.2 34.5 32.4 30.8    32.4 32.0 27.1 36.4 34.0 32.4
31.9 32.6 29.4 32.6 35.5 33.0    35.5 31.4 40.6 37.1 31.4 30.0
31.5 31.6 34.2 29.1 35.4 29.9    32.0 33.7 29.0 32.0 29.9 34.6
35.0 27.0 31.8 36.1 32.7 31.0    30.4 35.9 38.4 31.6 34.4 31.6
32.3 33.4 35.3 38.7 37.5 32.1    29.7 33.9 34.0 34.2 29.2 37.6
29.3 34.0 30.6 37.1 30.4 33.2    33.7 28.5 36.2 35.7 36.4 33.2

What to do:

1 Select a sample of 10 boys from this population by:

a rolling a die to select one of the 6 blocks
b rolling the die again to select a row in the block
c rolling the die again to select a boy in the row
d counting off 10 boys from left to right, starting from the boy you selected.

If the 3 rolls of the die produced {3, 2, 4}, the boy selected has weight 30.1 kg.

The sample selected is presented in the first column of the table.

2 Copy and enter your data in the following table.

Number     Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
1          30.1
2          34.9
3          32.3
4          34.9
5          31.4
6          33.0
7          32.4
8          29.7
9          33.6
10         30.6
mean, x̄    32.3





3 The last row in this table consists of 5 sample means.

The variable of sample means can be denoted by X̄₁₀. The bar on the top indicates it is a variable of means; the subscript 10 indicates that the means are of samples of size ten.

The last row of your table is a sample of size 5 from the distribution of X̄₁₀.

4 Combine your results with those of the other students of your class. Draw a histogram of the sample means.

5 Calculate the mean and the standard deviation of the sample means.

6 Compare the mean and the standard deviation you found in 5 with the mean weight 33.1 kg and standard deviation 2.54 kg of the 216 boys.

From Investigation 4 you should have discovered that the sample means are close to the population mean. The mean of the sample means should be particularly close to the population mean.

You should also have noted that the standard deviation of the sample means is smaller than the standard deviation of the population.

The following important investigation uses a computer to speed up sampling and obtain a more accurate picture of how the standard deviation of the sample means is related to the standard deviation of the population.

INVESTIGATION 5  A COMPUTER BASED RANDOM SAMPLER

In this investigation we examine the variation in sample means. We examine samples taken from symmetric distributions as well as one that is skewed.

In this investigation it is important to distinguish between:

• The original population, sometimes referred to as the “parent population”, with a random variable X which has mean μ and standard deviation σ.
In Investigation 4 the parent population consists of 216 thirteen year old boys. The mean μ = 33.1 kg and standard deviation σ = 2.54 kg.

and

• The new population with variable X̄ₙ, consisting of all statistics of sample means.
The subscript n indicating the sample size is sometimes omitted and the variable just written X̄.
A typical outcome of X̄ is a sample mean $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$.
In Investigation 4 a typical outcome is the mean weight of 10 boys.

The investigation explores the shape of the distribution of the random variable X̄, its mean $\mu_{\overline{X}}$ or μ(X̄), and its standard deviation $\sigma_{\overline{X}}$ or σ(X̄).

We start by sampling from a population which has a normal distribution. The heights of 18 year old Australian males may be approximately normal.





What to do:

1 Click on the icon given alongside. This opens a worksheet named Samples with a number of buttons. Click on each of these buttons in turn.

2 Sample size: from which you can select the numbers n = 10, 20, 40, 80, 160. Start with n = 10.

3 Find sample means: finds the means of each of two hundred different samples.

4 Analyse: lists the two hundred sample means. It finds the standard deviation $s_{\overline{X}}$ and draws a histogram of these sample means. It also superimposes a normal probability density function.

This output is shown on the worksheet named Analysis.

Note that the first graph on this worksheet is the graph of the probability density function of the population, and that the axes differ from those of the other graphs.

5 Make a copy of the table alongside. Enter the value of $(s_{\overline{X}})^2$ in the first column next to n = 10.

        Trial 1                 Trial 2                 Trial 3                 Trial 4
n       $(s_{\overline{X}})^2$  $(s_{\overline{X}})^2$  $(s_{\overline{X}})^2$  $(s_{\overline{X}})^2$
10
20
40
80
160

6 Go back to the worksheet named Samples and change the sample size to 20. Repeat steps 3, 4, and 5. Enter the value of $(s_{\overline{X}})^2$ next to n = 20 in the table.

7 Repeat for samples of size 40, 80 and 160.

8 We wish to see how $(s_{\overline{X}})^2$ is related to the standard deviation of the population. However, $(s_{\overline{X}})^2$ can vary quite a lot, so to spot the pattern more clearly you should repeat the experiment another 3 times.

9 From your experiment, determine a relationship between the square of the standard deviation of the sample means, $(s_{\overline{X}})^2$, and the square of the population standard deviation.

10 Now click on the icon to sample data from a population with a uniform distribution. These distributions are very commonly used in computer games where, for example, cards have to be selected at random. Complete an analysis of this data by repeating the above procedure and recording all results.

11 Now click on the icon to sample data from a population with an exponential distribution. These distributions are notoriously skewed. They are commonly used in modelling lifetimes, such as the lifetime of light globes. Complete an analysis of this data by repeating the above procedure and recording all results.




From the investigation you should have discovered the following:

If X is a random variable with mean μ and standard deviation σ then the random variable X̄ₙ of sample means of size n has:

• mean $\mu_{\overline{X}} = \mu$, the same as the mean of the random variable X
• standard deviation $\sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}}$.

Furthermore, for large values of n, X̄ₙ is approximately normal.

You should notice:

• The histogram of the sample means becomes symmetric and starts to take on a bell-like shape. For large values of n it becomes approximately normal.
• The mean of the sample means approximates the population mean. Individual points selected from any distribution are likely to come from either side of the mean, and differences are likely to average out.
• As the sample size increases, there is less variability.
• This diagram shows what happens if the sample size n increases. The spread decreases since $\sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}}$ and $\mu_{\overline{X}} = \mu$.

[Figure: Sample 1, Sample 2 and Sample 3, each consisting of values x1, x2, x3, ..., xn, give the sample means x̄1, x̄2 and x̄3. The distribution of X̄ is centred at $\mu_{\overline{X}}$ and its spread $\sigma_{\overline{X}}$ becomes smaller as n increases.]

In the Appendix the behaviour of the mean and the standard deviation is explored algebraically. It is beyond the level of this course to show why the distribution of the sample means is approximately normal.
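The relationship between the standard deviation of the sample means and σ/√n can be explored with a short Python simulation in the spirit of Investigation 5. This is a sketch under stated assumptions: the parent population is taken to be normal with the values μ = 33.1 and σ = 2.54 quoted in Investigation 4, the number of samples (500) and the random seed are arbitrary choices, and all names are illustrative.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)            # fixed seed so that the run can be repeated
MU, SIGMA = 33.1, 2.54    # assumed parent population mean and standard deviation
TRIALS = 500              # number of samples drawn for each sample size


def sample_mean(n):
    """Mean of one random sample of size n from the assumed normal population."""
    return sum(random.gauss(MU, SIGMA) for _ in range(n)) / n


results = {}
for n in (10, 20, 40, 80, 160):
    means = [sample_mean(n) for _ in range(TRIALS)]
    # (mean of sample means, standard deviation of sample means, sigma / sqrt(n))
    results[n] = (mean(means), pstdev(means), SIGMA / sqrt(n))
    print(n, [round(value, 3) for value in results[n]])
```

In each row the second and third numbers should be close to one another, and both should shrink as n grows.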




Example 16

The life expectancy X of a certain brand of AAA battery is known to have a mean μ = 27 hours and standard deviation σ = 3.25 hours. The batteries are sold in packets of 6. Let the random variable X̄₆ be the mean life expectancy of batteries in a packet.

a The batteries in a packet were tested and the number of hours they lasted were: 25.3, 21.6, 27.75, 22.25, 35.5, 28.5. What is the corresponding outcome of the random variable X̄₆?

b If the numbers of hours lasted by batteries in a packet of six were x1, x2, x3, x4, x5, x6, what is the corresponding outcome of X̄₆?

c What is the mean and standard deviation of X̄₆?

a The outcomes of X̄₆ are the means of the life expectancies of 6 batteries in a packet. In this case the outcome of X̄₆ is the statistic

$\bar{x} = \frac{25.3 + 21.6 + 27.75 + 22.25 + 35.5 + 28.5}{6} \approx 26.8$

b If the batteries in the packet lasted for x1, x2, x3, x4, x5, x6 hours, the corresponding outcome of X̄₆ is the statistic $\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6}$.

c The mean of X̄₆ is the same as the mean of X, so $\mu_{\overline{X}_6} = 27$ hours.

Since the standard deviation of X is 3.25, the standard deviation of X̄₆ is

$\sigma_{\overline{X}_6} = \frac{\sigma}{\sqrt{6}} = \frac{3.25}{\sqrt{6}} \approx 1.327$

EXERCISE 7G.1

1 A machine produces sheets of cardboard with mean thickness 3 mm and standard deviation 0.12 mm. A quality controller checks the thickness of each sheet in 10 different places. Let the random variable X be the thickness of the cardboard at any point, and let the random variable X̄₁₀ be the mean thickness of the 10 points.

a The quality controller records the following thicknesses in mm from a sample of 10 points: 3.02, 2.77, 3.08, 2.89, 3.21, 2.79, 2.97, 3.07, 2.94, 3.01. What is the corresponding outcome of the random variable X̄₁₀?

b If the quality controller records 10 outcomes of X as x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, what is the corresponding statistic of X̄₁₀?

c What is the mean and standard deviation of X̄₁₀?

2 Records show that a machine has been producing screws with mean length 75 mm and standard deviation 0.5 mm. Screws are packaged in lots of 50. Let the random variable X̄₅₀ be the mean length of a screw in a packet.

Find the mean and standard deviation of X̄₅₀.





3 The time it takes a train from Adelaide to Belair to complete its journey is known to have a mean of 40 minutes and standard deviation of 3 minutes. An inspector times 8 such trips. Let $\overline{X}_8$ be the mean travel time of a sample of 8 trips. Find the mean and standard deviation of $\overline{X}_8$.

4 Suppose the probability a coin falls heads is $p$ and the probability it falls tails is $q = 1 - p$. Let the random variable $X = 1$ if it falls heads and $X = 0$ if it falls tails.

a Show that the mean of $X$ is $p$.

b Show that the standard deviation of $X$ is $\sqrt{pq} = \sqrt{p(1 - p)}$.

c Let $\overline{X}_n$ be the sample mean of $n$ tosses of the coin.

i Find the mean and standard deviation of $\overline{X}_n$.

ii Describe in words how $\overline{X}_n$ is related to the tosses of a coin.

In general, knowing the mean and standard deviation of a random variable $X$ is insufficient information to calculate probabilities. However, we are able to calculate probabilities in the special case where $X$ is normally distributed. Not only that, but if $X$ is normally distributed, the random variable $\overline{X}_n$ of sample means of size $n$ is also normally distributed.

Example 17

Suppose the random variable $X$ is normally distributed with mean 40 and standard deviation 10. Let $\overline{X}_{20}$ be the sample means of size 20. Find:

a $\Pr(35 < X < 45)$ b $\Pr(35 < \overline{X}_{20} < 45)$

a $\Pr(35 < X < 45) = \text{normalcdf}(35,\ 45,\ 40,\ 10) \approx 0.383$

b The mean of $\overline{X}_{20}$ = mean of $X$ = 40.

The standard deviation of $\overline{X}_{20} = \dfrac{10}{\sqrt{20}}$

$\Pr(35 < \overline{X}_{20} < 45) = \text{normalcdf}\left(35,\ 45,\ 40,\ \dfrac{10}{\sqrt{20}}\right) \approx 0.975$

Notice that about 38% of the individual outcomes are in the interval $35 < X < 45$, but about 97.5% of the sample means lie in this interval.

Example 18

The time $T$ it takes to serve a customer at a railway station ticket booth is normally distributed with mean 45 seconds and standard deviation 20 seconds. You only have 10 minutes to buy your ticket or you will miss your train. If there is a line of 11 people in front of you waiting to be served, what is the probability you will catch the train?

Including yourself there are 12 persons in the line to be served.

To complete buying your ticket in less than 10 minutes the mean serving time per person has to be less than $\dfrac{10 \times 60}{12} = 50$ seconds.





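Let the random variable $\overline{T}_{12}$ be the mean time to serve 12 persons.

Since $T$ is normally distributed with mean 45 and standard deviation 20, $\overline{T}_{12}$ is normally distributed with mean 45 and standard deviation $\dfrac{20}{\sqrt{12}}$.

$\Pr(\overline{T}_{12} < 50) = \text{normalcdf}\left(-\text{E}99,\ 50,\ 45,\ \dfrac{20}{\sqrt{12}}\right) \approx 0.807$

So, the probability of catching the train is about 0.807.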
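The calculator function normalcdf used in Examples 17 and 18 can be imitated in Python. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper name `normal_prob` is hypothetical. For Example 17 b it uses the standard deviation $10/\sqrt{20}$, matching the stated sample size of 20.

```python
import math
from statistics import NormalDist

def normal_prob(lower, upper, mean, sd):
    """Pr(lower < X < upper) for X ~ N(mean, sd^2), like a calculator's normalcdf."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Example 17: X ~ N(40, 10^2); individual outcomes versus sample means of size 20.
p_single = normal_prob(35, 45, 40, 10)
p_mean20 = normal_prob(35, 45, 40, 10 / math.sqrt(20))

# Example 18: T ~ N(45, 20^2); mean serving time of 12 people must be below 50 s.
p_train = NormalDist(45, 20 / math.sqrt(12)).cdf(50)

print(round(p_single, 3))  # 0.383
print(round(p_mean20, 3))  # 0.975
print(round(p_train, 3))   # 0.807
```

The only change between the two parts of Example 17 is the standard deviation passed in: $\sigma$ for a single outcome, $\sigma/\sqrt{n}$ for a sample mean.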

5 Suppose the random variable $X$ is normally distributed with mean 80 and standard deviation 20. Let $\overline{X}_{10}$ be the sample means of size 10. Find:

a $\Pr(75 < X < 85)$ b $\Pr(75 < \overline{X}_{10} < 85)$

6 Let the random variable X be the IQ of 17 year old girls. Suppose X is normally

distributed with mean 105 and standard deviation 15.

a Find the probability that an individual 17 year old girl has an IQ of more than 110.

b Find the probability that the mean IQ of a class of twenty 17 year old girls is greater

than 110.

7 A manufacturer of chocolates produces chocolates of mean weight 20 g and standard

deviation 5 g. A box of 13 such chocolates is sold with the claim that the nett weight in

the box is 250 g. Assuming the weights are normally distributed:

a For what proportion of boxes is this claim correct?

b If the manufacturer decides to increase the number of chocolates to 15 per box, for

what proportion of boxes is the claim now true?

THE CENTRAL LIMIT THEOREM

In the previous investigation, we also observed that the distribution of the sample means $\overline{X}$ is approximately normal.

The Central Limit Theorem

Suppose $X$ is a random variable which is not necessarily normally distributed, but has mean $\mu$ and standard deviation $\sigma$. For sufficiently large $n$, the distribution $\overline{X}_n$ of the sample means of size $n$ is approximately normal with mean $\mu_{\overline{X}} = \mu$ and standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Note:

• There is no simple answer as to how large $n$ should be before the central limit theorem can be applied. It depends on many factors including how much accuracy is required. If the population is very skew it may require a large sample size $n$, whereas if the population is symmetric a small sample size $n$ may be sufficient. As a rule of thumb, $n > 30$ is often used, but each case must be considered on its merits.

• In the special case where the population is normally distributed, the distribution $\overline{X}$ of the sample means is always normal.
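The Central Limit Theorem can be explored by simulation. The sketch below is illustrative only: it assumes an exponential population with mean 10 (which is strongly skewed and has standard deviation 10), a sample size of 36, and 5000 repetitions. None of these choices come from the text.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed population: exponential with mean 10, so sigma is also 10.
mu, sigma = 10.0, 10.0
n = 36           # size of each sample
trials = 5000    # number of sample means simulated

# Each entry is the mean of one random sample of size n.
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(trials)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 2))
print("sigma / sqrt(n):     ", round(sigma / n ** 0.5, 2))
```

With these settings the mean of the simulated sample means should sit close to $\mu$ and their standard deviation close to $\sigma/\sqrt{n}$, even though the population itself is far from normal. A histogram of `sample_means` would show the roughly bell-shaped pattern the theorem describes.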





THE SAMPLING ERROR

We are trying to estimate the population mean using a sample mean. By only looking at a small portion of the population, the sample mean is likely to be different from the population mean.

The standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}}$ of the sample means $\overline{X}$ is a measure of the variability of sample means, and is called the sampling error or the standard error.

Note:

• Unless the population is small, the population size is almost irrelevant.

• The larger the value of $n$, the smaller the sampling error. A sufficiently large sample should give an accurate estimate of the mean. However, making the sample size too big may be expensive and may not improve the reliability of the estimate by much.

For example, a sample size of 1000 gives a sampling error of $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{1000}} \approx \dfrac{\sigma}{32}$, whereas a sample of 4000, four times the size, only halves the sampling error.

Example 19

Two histograms of samples, each of size 400, are shown below. One is from a uniform distribution $X$ with mean 10 and standard deviation 5.77. The other is from the distribution $\overline{X}_{36}$ of the sample means of size 36 selected from the distribution $X$. Note that the scales are not the same in the two diagrams.

[Figure: Histogram A and Histogram B, each plotting frequency against interval.]

a Which of the two histograms is from $\overline{X}_{36}$? Give reasons for your answer.

b From the diagram estimate $\Pr(\overline{X}_{36} < 9)$.

c Find the approximate mean and standard deviation of $\overline{X}_{36}$.

d Use the histogram to estimate the probability $\overline{X}_{36}$ is one standard deviation from the mean.

a The data in Histogram A is less spread out than that in Histogram B, and appears clustered around 10. Histogram A is the histogram for the distribution $\overline{X}_{36}$.

b To find $\Pr(\overline{X}_{36} < 9)$ we count the numbers in all the bins before the bin $[9, 9.25)$, and use the fact that there are 400 in the sample. We get:





$\Pr(\overline{X}_{36} < 9) = \dfrac{15 + 15 + 12 + 3 + 2 + 3 + 2 + 1}{400} = \dfrac{53}{400} \approx 0.13$

Your answer may vary a little depending on how well you can read the numbers on the graph.

c The mean of $\overline{X}_{36}$ = mean of $X$ = 10.

The standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{36}} = \dfrac{5.77}{6} \approx 0.962$

d $\Pr(10 - 0.96 < \overline{X}_{36} < 10 + 0.96) = \Pr(9.04 < \overline{X}_{36} < 10.96)$

$\approx \Pr(9 < \overline{X}_{36} < 11)$

$= \dfrac{30 + 27 + 39 + 44 + 45 + 42 + 31 + 30}{400} = 0.72$

This crude estimate compares with 0.68 when using the normal approximation.
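The sampling error used in Example 19, and the effect of quadrupling the sample size described under THE SAMPLING ERROR, can be sketched in Python as follows. The function name `standard_error` is illustrative, and `statistics.NormalDist` assumes Python 3.8 or later.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 5.77  # population standard deviation from Example 19

se_36 = standard_error(sigma, 36)
se_144 = standard_error(sigma, 4 * 36)  # four times the sample size

# Normal-approximation probability of lying within one standard error of the mean.
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(round(se_36, 3))           # 0.962
print(round(se_36 / se_144, 6))  # 2.0
print(round(within_one_sd, 2))   # 0.68
```

Because $n$ appears under a square root, halving the sampling error always costs four times as many observations, which is why very large samples give diminishing returns.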

EXERCISE 7G.2

1 The IQ measurements of a population have mean 100 and standard deviation 15. Many hundreds of random samples of size 36 are taken from the population and a relative frequency histogram of the sample means is formed.

a What would we expect the mean of the samples to be?

b What would we expect the standard deviation of the samples to be?

c What would we expect the shape of the histogram to look like?

2 Two histograms of sample size 300 each are shown below. One is from a life expectancy distribution $X$ with mean 10 and standard deviation 10. The other is from the distribution $\overline{X}_{64}$ of the sample means of size 64 selected from the distribution $X$. Note that the scales are not the same in the two diagrams.

[Figure: Histogram A and Histogram B, each plotting frequency against interval.]

a Which of the two histograms is from $\overline{X}_{64}$? Give reasons for your answer.

b From the diagram estimate $\Pr(\overline{X}_{64} < 9)$.

c Find the approximate mean and standard deviation of $\overline{X}_{64}$.

d Use the histogram to estimate the probability that $\overline{X}_{64}$ is one standard deviation from the mean. How does this answer compare with using the normal approximation?





3 During a one week period in Sydney the mean price of an orange was 42.8 cents with standard deviation 8.7 cents. Find the probability that the mean price per orange from a case of 60 oranges was less than 45 cents.

4 The mean energy content of a fruit bar is 1067 kJ with standard deviation 61.7 kJ. Find the probability that the mean energy content of a sample of 30 fruit bars is more than 1050 kJ/bar.

5 The mean sodium content of a box of cheese rings is 1183 mg with standard deviation 88.6 mg. Find the probability that the mean sodium content per box for a sample of 50 boxes lies between 1150 mg and 1200 mg.

6 Customers at a clothing store are in the shop for a mean time of 18 minutes with standard deviation 5.3 minutes. What is the probability that in a sample of 37 customers the mean stay in the shop is between 17 and 20 minutes?

7 The mean contents of a can of cola is 382 mL, even though it says 375 mL on a can. The statistician at the factory says that the standard deviation is steady at 16.2 mL. Find the probability that a slab of three dozen cans has mean contents less than 375 mL per can.

Example 20

The age of men in Australia is distributed with mean 43 and standard deviation 8. If a sample of 67 men is selected from the population of Australian men, what is the probability the sample mean is:

a less than 42 b greater than 45 c between 40 and 45?

Let the random variable $\overline{X}$ be the mean age of samples of 67 Australian males.

Assuming $n = 67$ is sufficiently large for the Central Limit Theorem to apply, $\overline{X}$ is approximately normal with mean 43 and standard deviation $\sigma_{\overline{X}} = \dfrac{8}{\sqrt{67}}$.

a $\Pr(\overline{X} < 42) = \text{normalcdf}\left(-\text{E}99,\ 42,\ 43,\ \dfrac{8}{\sqrt{67}}\right) \approx 0.153$

b $\Pr(\overline{X} > 45) = \text{normalcdf}\left(45,\ \text{E}99,\ 43,\ \dfrac{8}{\sqrt{67}}\right) \approx 0.0204$

c $\Pr(40 < \overline{X} < 45) = \text{normalcdf}\left(40,\ 45,\ 43,\ \dfrac{8}{\sqrt{67}}\right) \approx 0.979$
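The three probabilities in Example 20 can be reproduced with a few lines of Python. This is a sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later); like the worked solution, it relies on the Central Limit Theorem approximation.

```python
import math
from statistics import NormalDist

# Approximate distribution of the sample mean for n = 67, mu = 43, sigma = 8.
mean_dist = NormalDist(43, 8 / math.sqrt(67))

p_less_42 = mean_dist.cdf(42)                      # lower tail
p_more_45 = 1 - mean_dist.cdf(45)                  # upper tail
p_between = mean_dist.cdf(45) - mean_dist.cdf(40)  # interval

print(round(p_less_42, 3))  # 0.153
print(round(p_more_45, 4))  # 0.0204
print(round(p_between, 3))  # 0.979
```

The calculator bounds $-\text{E}99$ and $\text{E}99$ stand in for $-\infty$ and $\infty$; working with the cumulative distribution function directly avoids the need for them.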





Example 21

A population is known to have a standard deviation of 8 but has an unknown mean $\mu$. In order to estimate $\mu$, the mean of a random sample of 60 is found. Find the probability that this estimate is out by less than 2.

Let the random variable $\overline{X}$ be the mean of samples of 60. As the sample size is larger than 30, we assume that $\overline{X}$ is normally distributed with mean $\mu$ and standard deviation $\dfrac{8}{\sqrt{60}}$.

We need to find $\Pr(-2 < \overline{X} - \mu < 2)$.

Now $\Pr(-2 < \overline{X} - \mu < 2) = \Pr\left(\dfrac{-2}{8/\sqrt{60}} < \dfrac{\overline{X} - \mu}{8/\sqrt{60}} < \dfrac{2}{8/\sqrt{60}}\right)$

$= \Pr\left(-\dfrac{\sqrt{60}}{4} < Z < \dfrac{\sqrt{60}}{4}\right)$

$= \text{normalcdf}\left(-\dfrac{\sqrt{60}}{4},\ \dfrac{\sqrt{60}}{4},\ 0,\ 1\right)$

$\approx 0.947$

8 A sample of 375 people will be used to estimate the mean number of hours that will be lost due to sickness this year. Last year the standard deviation for the number of hours lost was 67 and we will use this as the standard deviation this year. What is the probability that the estimate is in error by less than ten hours?

9 A concerned union member wishes to estimate the hourly wage of shop assistants in Adelaide. He decides to randomly survey 300 shop assistants to calculate the sample mean. Assuming that the standard deviation is \$1.27, find the probability that the estimate of the population mean is in error by 10 cents or more.

INVESTIGATION 6 CHOCKBLOCKS

Chockblock produce mini chocolate bars which vary a little in weight. The machine used to make them produces bars whose weights are normally distributed with mean 18.2 grams and standard deviation 3.3 grams. 25 bars are then placed in a packet for sale. Hundreds of thousands of packets are produced each year with mean weight $\overline{X}$.

What to do:

1 What are the mean $\mu_{\overline{X}}$ and standard deviation $\sigma_{\overline{X}}$ of $\overline{X}$?

2 Printed on each packet is the nett weight of contents, 425 grams. What is the manufacturer claiming about the mean weight of each bar?





3 What percentage of their packets will be rejected because they fail to meet the 425 gram claim?

4 An additional bar is added to each packet with the nett weight claim retained at 425 grams.

a What is the minimum acceptable claim now?

b What are the mean $\mu_{\overline{X}}$ and standard deviation $\sigma_{\overline{X}}$ now?

c What percentage of these packets would we expect to reject?

H HYPOTHESIS TESTING FOR A MEAN

Claims are often made about the population mean of some quantities.

For example, it is claimed that the mean protein content of a 1 litre carton of milk is 39 grams. The truth of this claim can only be known by measuring the protein content of every 1 litre carton of milk, clearly an impossible task. It is, however, possible to draw reasonable conclusions from measuring the protein content of a random selection of cartons.

A statistical hypothesis is a statement about a population parameter. The parameter could be a population mean or a proportion.

In this section we will test hypotheses concerning the mean $\mu$.

HYPOTHESIS ABOUT MEANS

When a statement is made about a product, it is usually tested statistically before changes to the product are made.

For example, suppose a consumer makes the statement that the mean protein content in 1 litre cartons of milk is not 39 grams. The milk company does not want to go to the expense of changing packaging until it is statistically shown that the mean protein content is indeed not 39 grams. The company will start with the assumption that their claim is true, and whatever tests the consumer did were just random fluctuations. This assumption or statement of no change is called the null hypothesis and is usually denoted $H_0$.

The alternative hypothesis denoted $H_a$ is that the statistical evidence is sufficient to accept the consumer’s claim, i.e., that the milk company’s statement is false.

So, we consider two hypotheses:

• a null hypothesis $H_0$ which is a statement of no difference or no change. It is assumed to be true until sufficient evidence is provided so that it is rejected.

• an alternative hypothesis $H_a$ which is a statement that there is a difference or change which has to be established. Supporting evidence is necessary if it is to be accepted.





HYPOTHESIS TESTING WHEN THE POPULATION IS NORMALLY DISTRIBUTED

We want to test the claim that the mean protein content of 1 litre cartons of milk is 39 grams.

The null hypothesis is $H_0$: $\mu = 39$

The alternative hypothesis is $H_a$: $\mu \neq 39$

Suppose we select a sample of 10 cartons of milk and find that for this sample the mean protein content is $\bar{x} = 38.4$ grams.

We need to determine the likelihood that this difference is due to random fluctuation or chance, or whether it is sufficient evidence to say the milk company’s statement is incorrect.

Suppose it is known that the standard deviation of protein in 1 litre containers of milk is $\sigma = 0.8$ grams.

Since the protein content of milk is a result of many different factors, it is reasonable to assume that the protein content of 1 litre cartons of milk is normally distributed.

Let $X$ be the protein content of a 1 litre container of milk, so according to the null hypothesis, $X \sim N(39,\ 0.8^2)$.

Let the random variable $\overline{X}$ be the mean protein content of a sample of 10 one litre cartons.

Hence $\overline{X} \sim N\left(\mu,\ \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$, i.e., $\overline{X} \sim N\left(39,\ \left(\dfrac{0.8}{\sqrt{10}}\right)^2\right)$.

We use this to calculate the z-score of the observed value $\bar{x} = 38.4$ grams.

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{38.4 - 39}{0.8/\sqrt{10}} \approx -2.37$

So the number of standard deviations $\bar{x}$ is from the mean is $-2.37$.

If the difference between the observed value of $\bar{x}$ and the mean is due to chance alone, it could just as likely have been 2.37 standard deviations to the left or right of the mean. So, the probability that $\overline{X}$ is 2.37 standard deviations or more either side of the mean is a measure of how likely this is to occur.

Now $\Pr(Z \leq -2.37 \text{ or } Z \geq 2.37) = 2 \times \Pr(Z \leq -2.37)$ {symmetry}

$= 2 \times \text{normalcdf}(-\text{E}99,\ -2.37)$

$= 0.0178$

so the probability of this event happening is small.

One of the problems with random processes is that differences can always be due to chance. However, the practical solution is to reject the null hypothesis if the probability of the observed or more extreme results occurring is small.

The probability $\alpha$ at which we reject the null hypothesis is called the significance level of the test. Common significance levels are $\alpha = 0.05$ or 5% and $\alpha = 0.01$ or 1%.

In the above example, $\Pr(Z \leq -2.37 \text{ or } Z \geq 2.37) = 0.0178$. This is less than 0.05 so we would reject the null hypothesis at the significance level of 0.05, but not at the significance level of 0.01.
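As a cross-check of the milk carton calculation, the two-sided probability can be found in two equivalent ways: from the z-score on the standard normal distribution, or directly from the null distribution of the sample mean. The Python sketch below assumes `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0 = 39.0    # mean claimed by the null hypothesis (grams)
sigma = 0.8   # known population standard deviation (grams)
n = 10        # cartons in the sample
x_bar = 38.4  # observed sample mean (grams)

se = sigma / math.sqrt(n)

# Route 1: standardise, then use the standard normal distribution Z.
z = (x_bar - mu0) / se
p_from_z = 2 * NormalDist().cdf(-abs(z))

# Route 2: use the null distribution of the sample mean directly.
null_dist = NormalDist(mu0, se)
p_from_null = 2 * null_dist.cdf(mu0 - abs(x_bar - mu0))

print(round(z, 2))  # -2.37
print(round(p_from_z, 4), round(p_from_null, 4))
```

Both routes give the same probability. Working from the unrounded z-score gives about 0.0177; the value 0.0178 in the text comes from first rounding z to $-2.37$. Either way the probability lies between 0.01 and 0.05, so the conclusion is unchanged.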





The procedure for testing a hypothesis is:

Step 1: State the null hypothesis $H_0$: $\mu = \mu_0$ and the alternative hypothesis $H_a$: $\mu \neq \mu_0$.

Milk cartons example: $H_0$: $\mu = 39$, $H_a$: $\mu \neq 39$

Step 2: Select a significance level, usually 0.05. Unless otherwise stated, the level of 0.05 is used in this book.

Step 3: From a sample, calculate the sample mean $\bar{x}$. If the parent population is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the random variable $\overline{X}$ of sample means has the normal distribution $N\left(\mu,\ \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$.

$N\left(\mu,\ \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$ is called the null distribution. The null distribution is critical. It allows us to calculate the probability of the observed or more extreme events happening if the null hypothesis is true.

Milk cartons example: $\overline{X} \sim N(39,\ 0.253^2)$

Step 4: Use the sample mean $\bar{x}$ to find the test statistic $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. The Z-test derives its name from this statistic.

Milk cartons example: $z = -2.37$

Step 5: Calculate the P-value, which is the probability of all observations having z-values more extreme than the test statistic $z$ found in Step 4. Since we include the extreme outcomes either side of the mean, we call this a two-sided Z-test. Only two-sided tests are considered in this course.

Milk cartons example: $P = \Pr(Z \leq -2.37 \text{ or } Z \geq 2.37) = 0.0178$

Step 6:

• Reject the null hypothesis if the P-value is less than the significance level decided on in Step 2. The smaller the P-value is, the stronger the evidence against the null hypothesis.

• If the P-value is larger than the significance level decided on in Step 2, do not reject the null hypothesis.

Milk cartons example: Since $P < 0.05$ we reject the null hypothesis at the 0.05 level. Since $P > 0.01$, we do not reject the null hypothesis at the 0.01 level.
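Steps 4 to 6 of the procedure can be collected into one reusable function. The Python sketch below is one possible arrangement, not a standard library routine: the names `ZTestResult` and `two_sided_z_test` are hypothetical, and the normal distribution comes from `statistics.NormalDist` (Python 3.8 or later).

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

@dataclass
class ZTestResult:
    z: float          # test statistic (Step 4)
    p_value: float    # two-sided P-value (Step 5)
    reject_h0: bool   # decision at the chosen significance level (Step 6)

def two_sided_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test of H0: mu = mu0 when sigma is known."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))
    return ZTestResult(z, p_value, p_value < alpha)

# Milk cartons example at the two common significance levels.
milk_5 = two_sided_z_test(38.4, 39, 0.8, 10, alpha=0.05)
milk_1 = two_sided_z_test(38.4, 39, 0.8, 10, alpha=0.01)

print(milk_5.reject_h0, milk_1.reject_h0)  # True False
```

Steps 1 to 3 remain the user's responsibility: the hypotheses and significance level are choices made before looking at the data, and the function takes $\sigma$, the known population standard deviation, not the sample standard deviation $s$.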





When a null hypothesis is not rejected, the terms “retain” and “accept” are often used. This

does not mean that the null hypothesis is true, but rather that there is not enough evidence to

show it is not true.

Similarly, when rejecting the null hypothesis, it is often stated that the alternative hypothesis

is “accepted”. This does not mean that the alternative hypothesis is true. However, if the null

hypothesis is true, the outcome that led to rejecting it is a very unlikely one. The P-value

tells you just how unlikely.

Notice that we use $\sigma$ and not $s$ for the Z-test.

Note: If $H_0$ is rejected,

• the direction of the difference is determined by the value of $\bar{x}$

• we still do not know how accurate the claim was.

Example 22

A Mathematics coaching school knows that the results for their final test are normally distributed with population mean 74% and standard deviation 7%. A new coaching technique which is cheaper to implement but reported to have the same results is trialled by the school. In a trial of 40 students it is found that the mean score for the final test is 72% with standard deviation 6%. Is there sufficient evidence at the 5% level to conclude that the final test scores will be different?

Step 1: $H_0$: $\mu = 74$, $H_a$: $\mu \neq 74$

Step 2: Significance level is 0.05

Step 3: The sample mean, $\bar{x} = 72$

Let the random variable $\overline{X}$ be the sample means, so the null distribution is $\overline{X} \sim N\left(\mu,\ \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$, i.e., $\overline{X} \sim N\left(74,\ \left(\dfrac{7}{\sqrt{40}}\right)^2\right)$.

Step 4: The test statistic is $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{72 - 74}{7/\sqrt{40}} \approx -1.81$

Step 5: The P-value is $P = \Pr(Z \leq -1.81 \text{ or } Z \geq 1.81) = 2 \times \Pr(Z \leq -1.81) \approx 0.0708$

Step 6: As $P = 0.0708 > 0.05$ there is insufficient evidence to reject the null hypothesis that the new coaching produces the same results as the old technique. We thus accept that the new technique has the same result as the old technique.

EXERCISE 7H.1

1 A random variable $X$ is normally distributed with a standard deviation $\sigma = 4$. It is claimed that the mean of $X$ is $\mu = 17$.

a To test this claim a random sample of $n = 50$ was taken and the sample mean $\bar{x}$ was found to be 16.

i Write down the hypotheses $H_0$ and $H_a$. ii Write down the null distribution.



iii Calculate the test statistic.

iv Calculate the P-value.

v What conclusion is there at the 0.05 level?

b Suppose that a random sample of $n = 70$ was taken and $\bar{x} = 16$. What can you now conclude at the 0.05 level?

2 A random variable $X$ is normally distributed with a standard deviation $\sigma = 6$. A random sample of 40 was taken and the sample mean was found to be $\bar{x} = 61.4$. Use this information to test the claim that the population mean of $X$ is $\mu = 60$.

Example 23

The bottlers of Groutt claim that the mean volume of bottles is 503 mL. To test this claim 10 bottles were selected. The measurements are listed below to the nearest 0.1 mL:

502.5, 501.0, 501.5, 503.9, 498.7, 505.7, 504.6, 499.4, 501.8, 501.1

Test the claim made by the bottlers of Groutt at the 5% level if it is known that the population standard deviation $\sigma$ is 1.8 mL.

We need to test: the null hypothesis $H_0$: $\mu = 503$ against the alternative hypothesis $H_a$: $\mu \neq 503$

Let $X$ be the volume of each bottle of Groutt. As the bottling of liquids is subject to many random fluctuations, it is reasonable to assume that $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.

Let $\overline{X}$ be the distribution of the sample means, so the null distribution of $\overline{X}$ is $N\left(\mu,\ \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$.

From the null hypothesis we assume that $\mu = 503$.

From the sample we find that $\bar{x} = 502.02$, so the test statistic

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{502.02 - 503}{1.8/\sqrt{10}} \approx -1.722$

The P-value is $P = \Pr(Z \leq -1.722 \text{ or } Z \geq 1.722) = 2 \times \Pr(Z \leq -1.722) \approx 0.0851$

As $P > 0.05$ there is insufficient evidence to reject the claim that the volume of bottles of Groutt is 503 mL, i.e., we accept that the mean volume could be 503 mL.
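When the raw measurements are available, as in Example 23, the sample mean, test statistic and P-value can all be computed from the data. The Python sketch below uses only the standard library (`statistics.fmean` and `statistics.NormalDist`, Python 3.8 or later); the variable names are illustrative.

```python
import math
from statistics import NormalDist, fmean

# Bottle volumes in mL, as listed in Example 23.
volumes = [502.5, 501.0, 501.5, 503.9, 498.7,
           505.7, 504.6, 499.4, 501.8, 501.1]

mu0 = 503.0   # mean claimed under H0 (mL)
sigma = 1.8   # known population standard deviation (mL)
alpha = 0.05  # significance level

x_bar = fmean(volumes)
z = (x_bar - mu0) / (sigma / math.sqrt(len(volumes)))
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(x_bar, 2))    # 502.02
print(round(z, 3))        # -1.722
print(round(p_value, 4))  # 0.0851
print("reject H0" if p_value < alpha else "do not reject H0")
```

The sample standard deviation of the ten measurements is deliberately not computed: the Z-test uses the known population value $\sigma = 1.8$ mL.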






3 A market gardener claims that the carrots in his field have a mean weight of 50 grams. Before buying the crop a buyer pulls 20 carrots at random. She finds that their individual weights in grams are:

57.6 34.7 53.9 52.5 61.8 51.5 61.3 49.2 56.8 55.9

57.9 58.8 44.3 58.3 49.3 56.0 59.5 47.0 58.0 47.2

a Explain why it is reasonable that the distribution of carrots’ weights is normally distributed.

b Test the claim made by the market gardener if it is known that the standard deviation for the whole crop is 7.1 grams.

4 The length of screws produced by a machine is known to be normally distributed with standard deviation $\sigma = 0.08$ cm.

The machine is supposed to produce screws with a mean length of $\mu = 2.00$ cm.

A quality controller selects a random sample of 15 screws and finds that the mean length of the 15 screws is $\bar{x} = 2.04$ cm with sample standard deviation of $s = 0.09$ cm.

Does this justify the need to adjust the machine?


HYPOTHESIS TESTING WHEN THE POPULATION IS NOT NECESSARILY NORMALLY DISTRIBUTED

In the examples we have seen so far, the variable $X$ was normally distributed and so the distribution of sample means $\overline{X}$ was normally distributed also. This may not be true if $X$ is not normally distributed. However, if the sample size $n$ is sufficiently large, the Central Limit Theorem tells us that $\overline{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.

We can use this fact to test claims about population means.

Example 24

Susan’s resting pulse rate has been 55 beats per minute for many years with standard deviation $\sigma = 2.6$ bpm. During a 5 day period she checks her resting pulse rate 8 times a day at regular intervals and finds that it has mean 56.2.

Is there sufficient evidence, at a 5% level, to conclude that Susan’s pulse rate has changed?

The null hypothesis is $H_0$: $\mu = 55$. The alternative hypothesis is $H_a$: $\mu \neq 55$.

The significance level $\alpha = 0.05$.

The number in the sample is $n = 5 \times 8 = 40$ and the sample mean is $\bar{x} = 56.2$.

The population standard deviation $\sigma = 2.6$.





Let $X$ be Susan’s resting pulse rate. We do not know how the random variable $X$ is distributed, but if we assume that $n$ is large enough for the Central Limit Theorem to apply then the null distribution for the sample means $\overline{X}$ is approximately normally distributed with mean $\mu = 55$ and standard deviation $\dfrac{\sigma}{\sqrt{n}} = \dfrac{2.6}{\sqrt{40}} = 0.411$.

Entering this information into the calculator gives a P-value of $P = 0.00351$. As $P = 0.00351 < 0.05$ there is evidence at the 0.05 level to reject the null hypothesis.

We accept the alternative hypothesis $H_a$ that Susan’s pulse rate has changed.
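The calculator step in Example 24 can be reproduced as follows. This is a sketch in Python using `statistics.NormalDist` (Python 3.8 or later); it relies, as the worked solution does, on the assumption that $n = 40$ is large enough for the Central Limit Theorem to apply.

```python
import math
from statistics import NormalDist

mu0 = 55.0    # long-term resting pulse rate under H0 (bpm)
sigma = 2.6   # known standard deviation (bpm)
n = 5 * 8     # 8 readings a day for 5 days
x_bar = 56.2  # observed mean of the 40 readings

se = sigma / math.sqrt(n)                # standard deviation of the null distribution
z = (x_bar - mu0) / se                   # test statistic
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided P-value

print(round(se, 3))       # 0.411
print(round(p_value, 5))  # 0.00351
```

The P-value is well below 0.05, in agreement with the decision to reject the null hypothesis.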

EXERCISE 7H.2

1 Globe Industries make torch globes with standard deviation life time of $\sigma = 9$ hours. If the globes last too long, people will have no need to buy new ones, but if they do not last long enough, people will stop buying them. A quality controller is to ensure that globes made by a machine have a mean life of 80 hours. The quality controller selects a sample of 50 globes and finds that they have a mean life of 83 hours.

a What is the null hypothesis the quality controller is testing?

b Assuming that a sample of $n = 50$ is large enough for the Central Limit Theorem to apply, what is the null distribution the quality controller will be using?

c Is there sufficient reason at the 5% level for the quality controller to adjust the machine?

2 Let $X$ be the outcome of the roll of a fair six-sided die. The mean outcome of such a die is $\mu = 3.5$ with standard deviation $\sigma = 1.708$. Jack thinks his die may not be fair. To test this he rolls the die 100 times and finds that the mean of the 100 rolls is 3.2.

a What null hypothesis is Jack testing?

b Briefly explain why the outcomes of a roll of a fair die are not normally distributed.

c Assuming that a sample of size $n = 100$ is large enough for the Central Limit Theorem to apply, what is the null distribution Jack should be using?

d Does Jack have enough evidence at the 5% level to claim the die is not fair?

e Jack’s sister Betty rolls the same die 200 times and finds that the mean of her sample is also 3.2. Would Betty come to the same conclusion as Jack?

3 While peaches are being canned, 250 mg of preservative is supposed to be added by a dispensing device. It is known that the standard deviation of preservative added is 7.3 mg.

To check the machine, the quality controller obtains 60 random samples of dispensed preservative and finds that the mean preservative added was 242.6 mg.

At a 5% level, is there sufficient evidence that the machine is not dispensing a mean of 250 mg?





4 In recent times the mean age for New Zealand women on their first wedding day is 23.6 years with a standard deviation of 2.9 years. To determine if this differs from Australian women, a survey of 32 women was carried out. It was found that the mean age was 24.3 years. Test whether there is a significant difference at a 5% level.

REJECTION REGION FOR THE NULL HYPOTHESIS

To test the null hypothesis $H_0$: $\mu = \mu_0$ against the alternative hypothesis $H_a$: $\mu \neq \mu_0$ we have used the test statistic $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.

Assuming that $z \geq 0$, our test at the 5% significance level has been to reject the null hypothesis if the P-value

$P = \Pr(Z \leq -z \text{ or } Z \geq z) < 0.05$

i.e., $2 \times \Pr(Z \leq -z) < 0.05$

i.e., $\Pr(Z \leq -z) < 0.025$.

But $\text{invNorm}(0.025) \approx -1.96$, and so we reject the null hypothesis at the 5% level if the test statistic $z \leq -1.96$ or $z \geq 1.96$.

[Figure: standard normal curve with the rejection regions (RR) of $H_0$ shaded in each tail, below $-1.96$ and above $1.96$, each of area 0.025.]

The rejection region for the null hypothesis $H_0$ is the set of values of the test statistic for which the null hypothesis is rejected.

The 5% rejection region for the null hypothesis $H_0$: $\mu = \mu_0$ is the set

$\{z : z \leq -1.96 \text{ or } z \geq 1.96\}$

Note that the calculator also calculates the test statistic $z$ when using the 2-sided Z-test.

Example 25

A liquor chain claims that the mean price of wine has not changed from what it was 12 months ago. Records show that 12 months ago the mean price was \$13.45 for a 750 mL bottle. A random sample of prices of 389 different bottles of wine is taken from several stores and the mean price is \$13.30 and the standard deviation is \$0.25. Is there sufficient evidence at the 5% level to reject the claim?

$H_0$: $\mu = 13.45$, $H_a$: $\mu \neq 13.45$. We use $s = 0.25$ to estimate $\sigma$ as $n$ is large.

Assuming that the sample of size $n = 389$ is large enough for the Central Limit Theorem to apply, we find the test statistic

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{13.30 - 13.45}{0.25/\sqrt{389}} \approx -11.8$

Since $z < -1.96$ we reject the null hypothesis that there is no difference in the price and accept the alternative hypothesis that the price has changed.
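The critical value 1.96 and the rejection-region test of Example 25 can be sketched in Python. The calculator's invNorm corresponds to `NormalDist().inv_cdf` in the standard library (Python 3.8 or later); the helper name `in_rejection_region` is hypothetical.

```python
import math
from statistics import NormalDist

def in_rejection_region(z, alpha=0.05):
    """True if z lies in the two-sided rejection region at level alpha."""
    z_crit = -NormalDist().inv_cdf(alpha / 2)  # about 1.96 when alpha = 0.05
    return z <= -z_crit or z >= z_crit

z_crit_5 = -NormalDist().inv_cdf(0.025)
print(round(z_crit_5, 2))  # 1.96

# Example 25: s = 0.25 is used to estimate sigma because n = 389 is large.
z_wine = (13.30 - 13.45) / (0.25 / math.sqrt(389))
print(round(z_wine, 1))             # -11.8
print(in_rejection_region(z_wine))  # True
```

Comparing $z$ with the critical value gives the same decision as comparing the P-value with $\alpha$; the rejection region simply moves the comparison onto the scale of the test statistic.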





EXERCISE 7H.3

For questions 1 and 2, test the hypothesis using the rejection region for the null hypothesis. In each case you may assume that the sample size \(n\) is large enough for the Central Limit Theorem to apply.

1 Quickshave produces disposable razor blades. They claim that the mean number of shaves before a blade has to be thrown away is 13. A researcher wishes to test the claim and asks 30 men to supply data on how many shaves they got from one of the Quickshave blades. The researcher found that the mean of the sample was 12.8. Use this information to test the manufacturer's claim at a 5% level if the population standard deviation \(\sigma\) is 1.6.

2 It is claimed that the mean disposable income of households in a country town is $50 per week. To test this claim, 36 households were sampled and it was found that the mean disposable income of the 36 families was $47. Use this to test the claim that the mean disposable income is not $50 per week if the population standard deviation \(\sigma\) = $12.

Example 26

To test the hypothesis \(H_0: \mu = 40\) against \(H_a: \mu \neq 40\), a random sample of size 60 was taken and found to have mean \(\bar{x}\) and standard deviation \(s = 7\). For what values of \(\bar{x}\) will the null hypothesis be rejected at the 5% level? Assume that the sample size is large enough for the Central Limit Theorem to apply.

Using \(s\) to estimate \(\sigma\), the test statistic is

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - 40}{7/\sqrt{60}} \approx \frac{\bar{x} - 40}{0.9037} \]

The null hypothesis will be rejected if \(z \leq -1.96\) or if \(z \geq 1.96\)

i.e., if \(\dfrac{\bar{x} - 40}{0.9037} \leq -1.96\) or if \(\dfrac{\bar{x} - 40}{0.9037} \geq 1.96\)

\[ \therefore \quad \bar{x} \leq 40 - 1.96 \times 0.9037 \quad \text{or} \quad \bar{x} \geq 40 + 1.96 \times 0.9037 \]

The null hypothesis will be rejected if \(\bar{x} \leq 38.2\) or \(\bar{x} \geq 41.8\).
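The rejection region found in Example 26 can be checked numerically. The following is a minimal Python sketch, assuming the sample standard deviation s = 7 is used in place of the population value; the function name `rejection_bounds` is illustrative only and does not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def rejection_bounds(mu0, sigma, n, alpha=0.05):
    """Return (lower, upper): H0 is rejected when the sample mean
    is at or below lower, or at or above upper."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = sigma / sqrt(n)
    return mu0 - z_crit * standard_error, mu0 + z_crit * standard_error


lower, upper = rejection_bounds(mu0=40, sigma=7, n=60)
print(round(lower, 1), round(upper, 1))  # 38.2 41.8
```

The same function can be reused for other two-sided tests by changing `mu0`, `sigma` and `n`.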

3 To test the hypothesis \(H_0: \mu = -23\) against \(H_a: \mu \neq -23\), a random sample of size 100 was taken and found to have mean \(\bar{x}\). For what values of \(\bar{x}\) will the null hypothesis be rejected at the 5% level? You may assume that the sample size is large enough for the Central Limit Theorem to apply and that the population standard deviation \(\sigma = 4\).

4 The volume of soft drinks dispensed by a machine is normally distributed with standard deviation 3 mL. A quality controller has to adjust the machine if the mean volume dispensed is not 504 mL. To test the machine the quality controller finds the mean volume \(\bar{x}\) of 20 randomly selected bottles every hour. For what values of \(\bar{x}\) should the quality controller not adjust the machine?





DISCUSSION

The null hypothesis \(H_0\) assumes that the population mean \(\mu\) is exactly equal to \(\mu_0\). This is required to set up the null distribution needed to calculate probabilities. However, if the variable that is being tested is continuous, the probability that \(\bar{X}\) is exactly equal to \(\mu_0\) is zero!

Does this mean that if you take a large enough sample, and have a measuring instrument that can measure outcomes of \(X\) accurately enough, you can always reject the null hypothesis?

Compare the formal sentence, "There is a statistically significant difference between the population mean \(\mu\) and \(\mu_0\)." with what is commonly understood by, "There is a significant difference between the population mean \(\mu\) and \(\mu_0\)."

I CONFIDENCE INTERVALS FOR MEANS

In this section we show how to use a sample mean \(\bar{x}\) to calculate an interval in which we expect the population mean \(\mu\) to lie. As with all statistics, our estimate \(\bar{x}\) could by chance be very far from \(\mu\), and we can never be absolutely sure that \(\mu\) lies within the interval. We can, however, know how probable it is that \(\mu\) lies in the interval.

A confidence interval estimate of a parameter (in this case the population mean \(\mu\)) is an interval of values between two limits, together with a percentage indicating our confidence that the parameter lies in that interval.

We now consider how a so-called 95% confidence interval is constructed.

We start by finding the number \(a\) for which the standard normal distribution \(Z\) has probability \(\Pr(-a < Z < a) = 0.95\).

[Figure: standard normal curve with central area 0.95 between \(-a\) and \(a\), and area 0.025 in each tail.]

Because of the symmetry of the graph of the normal distribution, the statement reduces to

\[ \Pr(Z < -a) = 0.025 \]
\[ \therefore \quad -a = \text{invNorm}(0.025) \]
\[ -a = -1.95996 \]
\[ a \approx 1.96 \]

So, \(\Pr(-1.96 < Z < 1.96) = 0.95\)

This means that:

In any normal distribution, 95% of the outcomes lie within 1.96 standard deviations of the mean.

So, suppose the random variable \(X\) is normally distributed as \(N(\mu, \sigma^2)\).

If \(\bar{X}\) is the random variable of sample means of size \(n\), then \(\bar{X} \sim N\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^2\right)\).

\(\therefore\) 95% of all \(\bar{x}\) lie in the interval

\[ \mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}} \]
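The value of a in Pr(-a < Z < a) = 0.95 can also be found without a graphics calculator. The sketch below uses `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method plays the role of invNorm here; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

a = -Z.inv_cdf(0.025)               # counterpart of invNorm(0.025), sign flipped
central_area = Z.cdf(a) - Z.cdf(-a)

print(round(a, 5))             # 1.95996
print(round(central_area, 2))  # 0.95
```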





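[Figure: number line centred at \(\mu\) showing four sample means \(\bar{x}_1\), \(\bar{x}_2\), \(\bar{x}_3\), \(\bar{x}_4\), each at the midpoint of a line segment; 95% of sample means fall between \(\mu - 1.96\frac{\sigma}{\sqrt{n}}\) and \(\mu + 1.96\frac{\sigma}{\sqrt{n}}\). Notice that the interval calculated for the sample mean outside this range does not contain \(\mu\).]

In the diagram we have shown a few \(\bar{x}\) values in this interval as well as one that is not in this interval.

Note that each \(\bar{x}\) is in the middle of a line segment. All of these segments have the same length as the line segment from \(\mu - 1.96\frac{\sigma}{\sqrt{n}}\) to \(\mu + 1.96\frac{\sigma}{\sqrt{n}}\).

Since \(\Pr(-1.96 < Z < 1.96) = 0.95\) we know

\[ \Pr\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95 \]

So for an outcome \(\bar{x}\) within this interval,

\[ \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96 \quad \text{and} \quad \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} > -1.96 \]
\[ \therefore \quad \bar{x} - \mu < 1.96\frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \bar{x} - \mu > -1.96\frac{\sigma}{\sqrt{n}} \]
\[ \therefore \quad \mu > \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \]

This says that if we were to take many samples of size \(n\) and calculate the sample mean \(\bar{x}\) for each of these samples, then for about 95% of these sample means, the population mean \(\mu\) would lie in the interval

\[ \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \]

So, the 95% confidence interval for \(\mu\) is from \(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}\) (the lower limit) to \(\bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\) (the upper limit).

Confidence intervals for different confidence levels can be constructed for the population mean \(\mu\) in a similar way. Remember that we cannot be absolutely sure that \(\mu\) will lie within a particular confidence interval, but about 95% of the intervals constructed in this way will contain it.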
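The claim that about 95% of such intervals contain the population mean can be explored with a small simulation. This is only a sketch: the population mean, standard deviation, sample size and number of trials below are hypothetical values chosen for illustration, not figures from the text.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so that a run can be repeated

MU, SIGMA, N = 50.0, 8.0, 25  # hypothetical population mean, s.d. and sample size
TRIALS = 2000
half_width = 1.96 * SIGMA / sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Does the interval built around this sample mean contain MU?
    if xbar - half_width < MU < xbar + half_width:
        hits += 1

coverage = hits / TRIALS
print(coverage)  # expected to be close to 0.95
```

Changing the seed or the number of trials moves the observed proportion slightly, which is the same behaviour the random sampler demonstration is meant to show.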





INVESTIGATION 7 CONFIDENCE LEVELS AND INTERVALS

To obtain a greater understanding of confidence levels and intervals, click on the icon to visit a random sampler demonstration. This will calculate confidence intervals at various levels of your choice (90%, 95%, 98% or 99%) and count the intervals which include the population mean.

Note: Consider samples of different size but all with mean 10 and standard deviation 2.

The 95% confidence interval is

\[ 10 - \frac{1.960 \times 2}{\sqrt{n}} < \mu < 10 + \frac{1.960 \times 2}{\sqrt{n}} \]

For various values of \(n\) we have:

n      Confidence interval
20     \(9.123 < \mu < 10.877\)
50     \(9.446 < \mu < 10.554\)
100    \(9.608 < \mu < 10.392\)
200    \(9.723 < \mu < 10.277\)

[Figure: the four intervals for n = 20, 50, 100 and 200 drawn on a number line from 9 to 11, each centred at 10.]

We see that increasing the sample size produces confidence intervals of shorter width.
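The table of intervals in the Note can be reproduced with a few lines of Python. This is a minimal sketch using the same mean 10, standard deviation 2 and multiplier 1.960; the helper name `interval` is illustrative.

```python
from math import sqrt

XBAR, SIGMA, Z = 10, 2, 1.960


def interval(n):
    """95% confidence interval for a sample of size n with mean XBAR."""
    half_width = Z * SIGMA / sqrt(n)
    return XBAR - half_width, XBAR + half_width


for n in (20, 50, 100, 200):
    low, high = interval(n)
    print(f"{n:>3}  {low:.3f} < mu < {high:.3f}")
```

Because the half-width is proportional to one over the square root of n, each increase in n shortens the interval, but by less and less.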

Example 27

A sample of 60 yabbies was taken from a dam. The sample mean weight of the yabbies was 84.6 grams. Find the 95% confidence interval for the population mean if the population standard deviation is 16.8 grams.

We are given that \(\bar{x} = 84.6\) and \(\sigma = 16.8\).

The 95% confidence interval is:

\[ \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \]

i.e.,

\[ 84.6 - \frac{1.96 \times 16.8}{\sqrt{60}} < \mu < 84.6 + \frac{1.96 \times 16.8}{\sqrt{60}} \]
\[ \therefore \quad 80.3 < \mu < 88.9 \]

So, we are 95% confident that the population mean weight of yabbies lies between 80.3 grams and 88.9 grams.
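The calculation in Example 27 can be sketched as a small reusable function. The name `confidence_interval` and its `level` parameter are assumptions made for illustration; the critical value is taken from the standard normal distribution rather than typed in as 1.96.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(xbar, sigma, n, level=0.95):
    """Interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = confidence_interval(xbar=84.6, sigma=16.8, n=60)
print(round(low, 1), round(high, 1))  # 80.3 88.9
```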





EXERCISE 7I.1

1 A random sample of \(n\) individuals is selected from a population with known standard deviation 11. The sample mean is 81.6.

a Find a 95% confidence interval for \(\mu\) if: i \(n = 36\) ii \(n = 100\).

b In changing \(n\) from 36 to 100, how does the width of the confidence interval change?

2 Neville works for a software company. He keeps records of the times customers have to wait to receive telephone support for their software. During a six month period he logs 167 calls, and the mean waiting time is 8.7 minutes. Find a 95% confidence interval for estimating the mean waiting time for all telephone customer calls for support if the population standard deviation is 2.08 minutes.

3 A breakfast cereal manufacturer uses a machine to deliver the cereal into plastic packets which then go into cardboard boxes. The quality controller randomly samples 75 packets and obtains a sample mean of 513.8 grams. Construct a 95% confidence interval in which the true population mean should lie if the population standard deviation is 14.9 grams.

4 A sample of 42 patients from a drug rehabilitation program showed a mean length of stay on the program of 38.2 days. Estimate with a 95% confidence interval the average length of stay for all patients on the program if the population standard deviation is 4.7 days.

Example 28

The fat content (in grams) of 30 randomly selected pasties at the local bakery was determined and recorded as:

15.1 14.8 13.7 15.6 15.1 16.1 16.6 17.4 16.1 13.9
17.5 15.7 16.2 16.6 15.1 12.9 17.4 16.5 13.2 14.0
17.2 17.3 16.1 16.5 16.7 16.8 17.2 17.6 17.3 14.7

Determine a 95% confidence interval for the mean fat content of all pasties made if the population standard deviation is 1.35 grams.

From a calculator \(\bar{x} = 15.90\) and we are given \(\sigma = 1.35\).

The 95% confidence interval for \(\mu\) is

\[ \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \]
\[ \therefore \quad 15.90 - \frac{1.96 \times 1.35}{\sqrt{30}} < \mu < 15.90 + \frac{1.96 \times 1.35}{\sqrt{30}} \]
\[ \therefore \quad 15.4 < \mu < 16.4 \]

So, we are 95% confident that the mean fat content of all pasties produced lies between 15.4 g and 16.4 g.
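When the raw data are available, as in Example 28, the sample mean can be computed directly instead of being read from a calculator. The sketch below retypes the 30 fat contents from the example and uses the given population standard deviation of 1.35 grams; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Fat content (grams) of the 30 pasties in Example 28
fat = [
    15.1, 14.8, 13.7, 15.6, 15.1, 16.1, 16.6, 17.4, 16.1, 13.9,
    17.5, 15.7, 16.2, 16.6, 15.1, 12.9, 17.4, 16.5, 13.2, 14.0,
    17.2, 17.3, 16.1, 16.5, 16.7, 16.8, 17.2, 17.6, 17.3, 14.7,
]
SIGMA = 1.35  # population standard deviation given in the example

xbar = mean(fat)
half_width = 1.96 * SIGMA / sqrt(len(fat))

print(round(xbar, 2))                                            # 15.9
print(round(xbar - half_width, 1), round(xbar + half_width, 1))  # 15.4 16.4
```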





It is possible to obtain confidence intervals at any level of confidence from graphics calculators. Click on the icon to see how to do this on your calculator.

5 To work out the credit limit of a prospective credit card holder, a company gives points based on factors such as employment, income, home and car ownership, and general credit history. A statistician working for the company randomly samples 40 applicants and determines the point total for each. These are:

84 53 66 61 80 75 67 74 59 56 81 68 74 69
82 76 60 63 78 71 80 60 72 63 58 77 68 72
63 71 67 76 54 72 64 70 70 61 82 68

a Determine the sample mean \(\bar{x}\) and standard deviation \(s\).

b Using \(s\) to estimate \(\sigma\), determine a 95% confidence interval that the company would use to estimate the mean point score for the population of applicants.

Example 29

A 95% confidence interval for a mean \(\mu\) of a population was recorded as \(8.5617 \leq \mu \leq 9.4383\). This estimate was based on a sample of size \(n = 60\). Use this information to calculate

a \(\bar{x}\), the sample mean

b \(\sigma\), the population standard deviation which was used to calculate the confidence interval.

a The 95% confidence interval is \(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\)

So, \(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} = 8.5617\) and \(\bar{x} + 1.96\frac{\sigma}{\sqrt{n}} = 9.4383\)

Adding these equations gives \(2\bar{x} = 8.5617 + 9.4383 = 18\) and so \(\bar{x} = 9\).

b Substituting \(n = 60\) and \(\bar{x} = 9\) into \(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} = 8.5617\) gives

\[ 9 - 1.96\frac{\sigma}{\sqrt{60}} = 8.5617 \]
\[ \therefore \quad 1.96\frac{\sigma}{\sqrt{60}} = 0.4383 \]
\[ \therefore \quad \sigma = \frac{0.4383 \times \sqrt{60}}{1.96} \approx 1.732 \]

6 A 95% confidence interval for the mean \(\mu\) of a population is based on a sample of \(n = 50\), and given by \(3.5842 \leq \mu \leq 4.4158\). Find:

a \(\bar{x}\), the sample mean

b \(\sigma\), the population standard deviation which was used to calculate the confidence interval.

7 A 95% confidence interval for the mean \(\mu\) of a population is given by \(19.685 \leq \mu \leq 22.315\). If the population standard deviation is \(\sigma = 6\), what was the sample size?
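The steps of Example 29 work backwards from the limits of an interval to the quantities that produced it. A minimal Python sketch of that reasoning follows; the function name `recover_mean_and_sigma` is hypothetical, and the multiplier 1.96 is assumed because the interval is stated to be a 95% interval.

```python
from math import sqrt


def recover_mean_and_sigma(lower, upper, n, z=1.96):
    """Work back from interval limits to the sample mean and sigma."""
    xbar = (lower + upper) / 2                 # the interval is centred on the sample mean
    sigma = (upper - lower) / 2 * sqrt(n) / z  # half-width = z * sigma / sqrt(n)
    return xbar, sigma


xbar, sigma = recover_mean_and_sigma(8.5617, 9.4383, n=60)
print(round(xbar, 4), round(sigma, 3))  # 9.0 1.732
```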





DETERMINING HOW LARGE A SAMPLE SHOULD BE

When designing an experiment in which we wish to estimate the population mean, the size of the sample is an important consideration. Finding the sample size is a problem that can be solved using the confidence interval.

Let us revisit Example 28 on the fat content of pasties. The question arises:

'How large should a sample be if we wish to be 95% confident that the sample mean will differ from the population mean by less than 0.3 grams?'

i.e., \(-0.3 < \mu - \bar{x} < 0.3\)

Now the 95% confidence interval for \(\mu\) is:

\[ \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \]

Hence \(-1.96\frac{\sigma}{\sqrt{n}} < \mu - \bar{x} < 1.96\frac{\sigma}{\sqrt{n}}\) and we need to find \(n\) when \(1.96\frac{\sigma}{\sqrt{n}} = 0.3\).

So,

\[ \sqrt{n} = \frac{1.96\sigma}{0.3} = \frac{1.96 \times 1.35}{0.3} \approx 8.82 \]

and so \(n \approx 77.8\), which we round up to 78.

Thus, a sample of 78 pasties should be taken.

Example 30

Revisit the yabbies from the dam problem of Example 27. Suppose we wish to find the sample size needed to be 95% confident that the sample mean differs from the population mean by less than 5 grams. What sample size should be taken?

Now \(-1.96\frac{\sigma}{\sqrt{n}} < \mu - \bar{x} < 1.96\frac{\sigma}{\sqrt{n}}\)

so we need to find \(n\) such that \(1.96\frac{\sigma}{\sqrt{n}} = 5\), i.e., \(\frac{1.96 \times 16.8}{\sqrt{n}} = 5\)

\[ \therefore \quad n = \left(\frac{1.96 \times 16.8}{5}\right)^2 \approx 43.37 \]

A sample of 44 yabbies should be taken.

EXERCISE 7I.2

1 A researcher wishes to estimate the mean weight of adult crayfish in South Australian waters. She knows that the population standard deviation \(\sigma\) is 250.5 grams. How large must a sample be so that she is 95% confident that the sample mean differs from the population mean by less than 70 grams?

2 A porridge manufacturer samples 80 packets of porridge and finds that the sample standard deviation \(s\) of the contents' weight is 17.8 grams. If \(s\) is used to estimate the population standard deviation \(\sigma\), how many packets must be sampled to be 95% confident that the sample mean differs from the population mean by less than 3 grams?
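The sample size calculations for the pasties and for Example 30 follow the same pattern, which can be sketched in Python as below. The function name `sample_size` and the parameter name `margin` (the largest acceptable difference between the sample mean and the population mean) are illustrative choices, not terms from the text.

```python
from math import ceil


def sample_size(sigma, margin, z=1.96):
    """Smallest whole n for which z * sigma / sqrt(n) is at most margin."""
    return ceil((z * sigma / margin) ** 2)


print(sample_size(sigma=1.35, margin=0.3))  # 78 (pasties)
print(sample_size(sigma=16.8, margin=5))    # 44 (yabbies, Example 30)
```

Rounding up with `ceil` reflects the fact that a sample cannot contain a fractional individual and that rounding down would leave the margin slightly too wide.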





3 Patients from an alcohol rehabilitation program participate for various lengths of time with a standard deviation of 4.7 days. How many patients would have to be sampled to be 95% confident that the sample mean number of days on the program differs from the population mean by less than 1.8 days?

Consider the typical 95% confidence interval shown in the diagram.

[Figure: a 95% confidence interval of width \(w\), centred at \(\bar{x}\), running from \(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}\) to \(\bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\).]

The width of this interval is \(w = 2 \times 1.96\frac{\sigma}{\sqrt{n}}\).

In taking a sufficiently large sample size \(n\) we can make \(w\) as small as we like.

As \(w = 2 \times 1.96\frac{\sigma}{\sqrt{n}}\), \(\sqrt{n} = \frac{2 \times 1.96\sigma}{w}\) and so \(n = \left(\frac{2 \times 1.96\sigma}{w}\right)^2\).

When we wish to estimate the population mean from a sample of size \(n\) at a 95% confidence level, the sample size is given by

\[ n = \left(\frac{2 \times 1.96\sigma}{w}\right)^2 \]

where \(\sigma\) is the population standard deviation and \(w\) is the confidence interval width.

In Example 30, \(w = 2 \times 5\) and \(\sigma = 16.8\). Thus,

\[ n = \left(\frac{2 \times 1.96 \times 16.8}{10}\right)^2 \approx 43.37, \text{ etc.} \]

Since \(n\) is an integer, \(n = 44\) would give a 95% confidence interval of width about 10 grams.

4 A population is known to have standard deviation \(\sigma = 34\). Find the sample size \(n\) that should be taken to find a 95% confidence interval for the population mean \(\mu\) of width:

a \(w = 5\) b \(w = 1\) c \(w = 0.1\)

5 A manufacturer of bottled water knows that the machine dispenses water into 1 litre bottles with a standard deviation of 2.3 mL. The machine needs to be checked regularly to ensure it is still delivering the correct volume. How many bottles should a quality controller be checking to find a 95% confidence interval of width:

a 2 mL b 1 mL c 0.5 mL?

6 a If the size \(n\) of a sample is doubled, by how much will the width of a 95% confidence interval decrease?

b How much larger do you have to make a sample size to halve the width of a 95% confidence interval?

USING A CONFIDENCE INTERVAL FOR A CLAIM ABOUT \(\mu\)

Confidence intervals provide an estimate for the size of the population mean \(\mu\). They can also be used to assess claims about population means. For example:

Suppose the volume \(V\) of fruit juice dispensed by a machine is normally distributed with mean \(\mu\) litres which can be adjusted, and standard deviation \(\sigma = 0.0015\) litre (1½ mL, about ¼ of a teaspoon) which is fixed.





Suppose a manufacturer needs to fill cartons with 1 litre of fruit juice. To ensure that almost all cartons contain at least 1 litre, the value of the mean \(\mu\) is set at 1.005 litre.

A quality controller takes a sample of \(n\) cartons and, with very accurate measurements, finds that the sample mean \(\bar{v} = 1.004\,99\) litres. We want to test the hypotheses \(H_0: \mu = 1.005\), \(H_a: \mu \neq 1.005\) for various large values of \(n\).

Note that for sufficiently large \(n\) the null hypothesis will not be accepted at the 5% level. For such values of \(n\) the difference is statistically significant at the 5% level even though the difference of 0.01 mL (hardly a drop) is not significant as the word is commonly understood.

Example 31

Suppose the volume \(V\) of cool drinks dispensed into cartons by a machine is normally distributed with mean \(\mu\) which can be adjusted, and standard deviation \(\sigma = 10\) mL which is fixed. The value of \(\mu\) is supposed to be 1005 mL, but the machine operator notices that actually \(\mu = 995\) mL. The operator therefore adjusts the volume dispensed by the machine. A quality controller tests 25 cartons and finds that their mean volume is 1007 mL.

a Construct a 95% confidence interval for the volume \(\mu\) dispensed by the machine.

b Use the 95% confidence interval to assess the claim that the volume dispensed by the machine has increased.

c Can we conclude that the value of \(\mu\) is now larger than 1005 mL?

a The confidence interval is \(1003 \leq \mu \leq 1011\).

b Since 995 is less than all the values in the 95% confidence interval we can be confident that the population mean has increased.

c Although the sample statistic 1007 mL is larger than 1005, the smallest number in the 95% confidence interval for \(\mu\) is 1003 mL. This means that \(\mu\) could be as small as 1003 mL, and there is not enough evidence to support the claim that \(\mu > 1005\) mL.

Note: This question is closely related to testing the hypotheses \(H_0: \mu = 1005\), \(H_a: \mu \neq 1005\).

EXERCISE 7I.3

1 Suppose the time it takes Joan to run 100 metres is normally distributed with mean \(\mu = 12.46\) seconds and standard deviation 1 second. To improve her time Joan goes on a training program. After the training program, Joan finds that the mean time from 12 trial runs is now 11.62 seconds.

a Construct a 95% confidence interval for Joan's mean assuming the standard deviation has not changed.

b Use the result of part a to assess the claims:

i Joan's time to run 100 metres has improved.

ii Joan is now better than Betty whose time for the 100 metres is 11.97 seconds.





2 A complaint was made to a call centre that it took a mean time of 12 minutes before a caller was put through to an operator. After changes were made, the call centre claimed that the service had improved. To check this claim, a consumer group made 40 calls to the centre. They found the mean waiting time was 8 minutes with a standard deviation of 3 minutes. Assuming that 40 is large enough for the Central Limit Theorem to apply, construct a 95% confidence interval for the mean waiting time \(\mu\). Does the confidence interval support the call centre's claim? (Use \(s\) to estimate \(\sigma\).)

3 The distance \(D\) a golfer can hit a ball is randomly distributed with a mean \(\mu = 115\) metres and standard deviation \(\sigma = 32\) metres.

a After spending time with a professional the golfer measured the distance of 30 drives. The results of the drives in metres were as follows:

133 153 110 93 142 135 62 150 127 112
119 171 143 92 162 128 149 73 39 84
138 152 163 174 152 141 129 87 118 149

Assuming that the sample of 30 is large enough for the Central Limit Theorem to apply, calculate a 95% confidence interval for the mean distance \(\mu\) the golfer can now hit the ball. Does the confidence interval provide enough evidence to support the claim the golfer has improved?

b The golfer decided to have another trial of 50 drives. Suppose the mean of the 50 trials is the same as in part a.

i Explain briefly why increasing the number of trials could make a difference to a drive length.

ii Does the new information provide evidence that the golfer has improved?

OTHER APPLICATIONS OF CONFIDENCE INTERVALS

Example 32

A buyer for a restaurant chain goes to a seafood wholesaler to inspect a large catch of 50 000 prawns. She has instructions to buy the catch only if the prawns are heavy enough. The buyer selects a sample of 60 prawns and finds that their mean weight is 57.2 grams. It is known that the population standard deviation \(\sigma\) is 4.2 grams.

a Find the 95% confidence interval for the population mean.

b The buyer claims she is 95% confident that no more than 10% of the prawns weigh less than 50 grams. Use the confidence interval found in part a to justify this claim. You may assume that the weights of prawns are normally distributed.

a Using technology, the 95% confidence interval for the population mean \(\mu\) is \(56.1 \leq \mu \leq 58.3\).

[Figure: number line marking 50.0, 56.1, 57.2 and 58.3, with the confidence interval (CI) running from 56.1 to 58.3.]

b The smallest value in the 95% confidence interval is 56.1, and so the buyer can be 95% confident that the population mean \(\mu \geq 56.1\).

If \(W\) is the weight of prawns, then \(W \sim N(\mu, \sigma^2)\).

If we use \(\mu = 56.1\) and \(\sigma = 4.2\), then using technology \(\Pr(W < 50) = 0.0732\).

Hence 7.32%, or less than 10% of the prawns weigh less than 50 grams.
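The two "using technology" steps in Example 32 can be sketched in Python. This is one possible way to carry them out, not the calculator procedure the text refers to. As in the worked solution, the lower limit is rounded to one decimal place before it is used as the worst-case mean; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 57.2, 4.2, 60

# Part a: 95% confidence interval for the population mean
half_width = 1.96 * sigma / sqrt(n)
low = round(xbar - half_width, 1)
high = round(xbar + half_width, 1)
print(low, high)  # 56.1 58.3

# Part b: worst case supported by the interval, with mu at the lower limit
p_light = NormalDist(mu=low, sigma=sigma).cdf(50)
print(round(p_light, 4))  # about 0.0732, i.e. under 10%
```

Any larger value of the mean inside the interval gives an even smaller proportion of prawns under 50 grams, which is why the lower limit is the case to check.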



EXERCISE 7I.4

1 The manager of a golf club claimed that the income of most of its members was in excess of $75 000 and thus its members could afford to pay increased annual subscriptions. To show that this claim was not valid, the members sought the help of a statistician.

The statistician examined a random sample of 113 club members and found that the mean income was $96 318. It is known that the standard deviation of the members' incomes is $14 268.

a Find the 95% confidence interval for the population mean income of all members.

b The statistician claimed that he was 95% certain that no more than 10% of the members had an income of less than $75 000. Assuming that the income of members is normally distributed, how could you justify the statistician's claim?

2 Fabtread manufacture motorcycle tyres. Under normal test conditions the mean stopping time for motorcycles travelling at 60 km/h is 3.45 seconds with standard deviation 0.17 seconds. Their production team has just designed and manufactured a new tyre tread. They take 41 stopping time measurements with the new tyres and find the mean time is 3.03 seconds.

a Calculate a 95% confidence interval for the mean stopping time of the new tyres.

b The team claims that they are 95% certain that less than 15% of the stopping times of their new tyres will exceed the 3.45 seconds of the old tyres. Assuming that the stopping time is normally distributed, how could you justify the team's claim?

EXTENSION TO CONFIDENCE INTERVALS OTHER THAN 95%

There are often good reasons to find confidence intervals other than those of 95%. In areas like medicine, a researcher may want to have more certainty when making decisions and often may prefer a confidence interval of 99%. In other areas where the outcomes of decisions are not so important, people may be satisfied with 90% confidence intervals.

Your calculator can produce confidence intervals at any level.

EXERCISE 7I.5

1 The mean \(\mu\) of a population is unknown, but its standard deviation is 10. In order to estimate \(\mu\) a random sample of size \(n = 35\) was selected. The mean of the sample was found to be 28.9.

a Find a 95% confidence interval for \(\mu\).

b Find a 99% confidence interval for \(\mu\).

c In changing the confidence level from 95% to 99%, how does the width of the confidence interval change?

2 If the \(P\)% confidence interval for \(\mu\) is \(\bar{x} - a\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + a\left(\frac{\sigma}{\sqrt{n}}\right)\) then for \(P = 95\), \(a = 1.960\). Find \(a\) if \(P\) is: a 99 b 80 c 85 d 96.

3 The choice of the confidence level to be used is made by an experimenter. Why is it that experimenters do not always choose confidence intervals of at least 99%?
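The multiplier used at confidence levels other than 95% comes from the same standard normal calculation that produced 1.96. The sketch below, with the illustrative function name `critical_value`, shows the idea for the 90% and 98% levels mentioned earlier alongside the familiar 95% value.

```python
from statistics import NormalDist


def critical_value(level):
    """Multiplier a such that Pr(-a < Z < a) equals the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.90, 0.95, 0.98):
    print(f"{level:.0%}: a = {critical_value(level):.3f}")
```

A higher confidence level gives a larger multiplier and therefore a wider interval for the same data.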





J REVIEW

REVIEW SET 7A

1 The arm lengths of 18 year old females are normally distributed with mean 64 cm and standard deviation 4 cm.

a Find the percentage of 18 year old females whose arm lengths are:

i between 60 cm and 72 cm ii greater than 60 cm.

b Find the probability that if an 18 year old female is chosen at random, she will have an arm length in the range 56 cm to 68 cm.

2 a If \(Z\) has a standard normal distribution, find \(k\) if \(\Pr(Z \leq k) = 0.95\).

b If \(X \sim N(23, 2.6^2)\) find \(k\) if \(\Pr(X < k) = 0.6\).

3 In a mathematics test out of 40 marks, the mean mark was 28.3 and the standard deviation was 4.1. The marks were all integers and the minimum pass mark was set at 24. Assuming marks were approximately normal, what proportion of the students:

a passed the test b scored more than 20 c scored between 25 and 35?

4 The weights of apples from an orchard are known to be normally distributed with mean \(\mu = 350\) grams and standard deviation \(\sigma = 25\) grams. The apples are packed in boxes of 50 each.

a How many apples in a box would you expect to weigh more than 375 grams, and how many less than 325 grams?

b In 500 boxes, how many apples would you expect to have a weight between 325 and 375 grams?

5 To test the hypotheses \(H_0: \mu = 36\) and \(H_a: \mu \neq 36\) a random sample of \(n = 20\) was selected. The outcomes are listed below:

38 22 43 21 36 44 20 49 36 30
42 43 38 28 33 22 29 25 28 34

Use this information to test the null hypothesis at the 5% level if the population standard deviation is 10 grams.

6 The standard deviation in the weight of cereal boxes is 23.6 grams. How many boxes must be sampled from the population to be 95% confident that the sample mean differs from the population mean by less than 4 grams?

7 A factory canning apricots uses a machine to deliver the fruit and syrup into cans. The quality controller randomly samples 65 cans and finds that the mean mass of contents is 828.2 grams.

a Construct a 95% confidence interval in which the true population mean should lie if the population standard deviation is 16.3 grams.

b What should the sample size be to construct a confidence interval of half the width of that in a?

8 a Kerry's marks for an English essay and a Chemistry test were 26 out of 40 and 82% respectively.

i Explain briefly why the information given is not sufficient to determine whether Kerry's results are better in English than in Chemistry.





ii Suppose that the marks of all students in both the English essay and the Chemistry test were normally distributed as \(N(22, 4^2)\) and \(N(75, 7^2)\) respectively. Use this information to determine which of Kerry's two marks is better.

iii If there were 50 students sitting for the English essay, how many would have scored more than Kerry?

b Les is to sit for five subjects in the final examination. Because of many different factors that determine examination marks, the marks Les can expect in each exam are normally distributed. Suppose that the mean \(\mu\) and standard deviation \(\sigma = 2\) are the same for each exam.

If \(\mu = 12\) calculate the probability that Les will gain a total mark for the five subjects of between 60 and 70.

c The value of the mean \(\mu\) depends on the time \(t\) hours that Les studies. It is given by \(\mu = 16 - \dfrac{8}{t + 2}\).

i For how long must Les study to achieve a value of \(\mu = 15\)?

ii Les's total score for the five examinations was 65. Use this information to test the hypotheses \(H_0: \mu = 15\) and \(H_a: \mu \neq 15\).

iii Use the total score of 65 to construct a 95% confidence interval for the mean \(\mu\). Use this interval to estimate a range of times Les might have studied for the examination.

REVIEW SET 7B

1 Find the mean and standard deviation of these two samples of lengths given in cm:

A 170.1 169.4 169.5 170.4 169.8 170.5 170.0 170.0 170.3 170.8
  170.0 169.9 170.2 170.0 169.9 169.9 170.5 170.1 169.7 170.0

B 177 166 153 167 176 173 169 161 172 174
  170 162 178 174 179 171 148 184 178 175

Which of the above is a sample of heights of 15 year old boys, and which is a sample of length of planks cut by a machine?

2 The contents of a certain brand of soft drink can is normally distributed with mean 377 mL and standard deviation 4.2 mL.

a Find the percentage of cans with contents:

i less than 368.6 mL ii between 372.8 mL and 389.6 mL

b Find the probability of randomly selecting a can with contents between 364.4 mL and 381.2 mL.

3 The life of a Xenon battery is known to be normally distributed with a mean of 33.2 weeks and a standard deviation of 2.8 weeks.

a Find the probability that a randomly selected battery will last at least 35 weeks.

b For how many weeks can the manufacturer expect the batteries to last before 8% of them fail?

4 The length of steel rods produced by a machine is normally distributed with a standard deviation of 3 mm. It is found that 2% of all rods are less than 25 mm long. Find the mean length of rods produced by the machine.





5 a If \(Z\) has a standard normal distribution, find \(a\) if \(\Pr(Z \leq a) = 0.9\).

b If \(X \sim N(15.6, 2^2)\) find \(a\) if \(\Pr(X < a) = 0.9\).

6 A manufacturer claims that his canned soup contains 135 mg of salt. To check this claim a consumer tested 87 cans for salt content and found that the mean was 139.6 mg. It is known that the population standard deviation is 22.8 mg. At a 5% level is there sufficient evidence to reject the manufacturer's claim?

7 To test the null hypothesis \(H_0: \mu = 2000\) against \(H_a: \mu \neq 2000\), a random sample of \(n = 75\) was selected and found to have mean \(\bar{x} = 1840\).

a If the population standard deviation \(\sigma = 690\), is there sufficient evidence to reject the null hypothesis at the 5% level?

b For what values of the sample mean \(\bar{x}\) would you not reject the null hypothesis at the 5% level?

8 A telephone call centre handles many calls each day. Let \(T\) be the time in minutes taken to answer a call.

In 2006 the mean answering time for a call was \(\mu = 4.3\) minutes with standard deviation \(\sigma = 1.2\) minutes.

Let \(\bar{T}\) be the mean time taken to answer a random sample of 100 calls.

a The two histograms below show the distributions of samples of size 50, one taken from \(T\) and one from \(\bar{T}\). Note that the horizontal scale and the bin width are the same in both histograms, but the vertical scales are different.

[Figure: Histogram A and Histogram B, each plotting frequency against time (min) from 0 to 8; the frequency axis runs from 0 to 6 in one histogram and from 0 to 40 in the other.]

Identify the histogram that represents a sample from \(\bar{T}\). Explain your answer.

b i Assuming that \(n = 100\) is sufficiently large, explain why the distribution of \(\bar{T}\) is approximately normal with mean 4.3 minutes and standard deviation 0.12 minutes.

ii Calculate the probability \(\Pr(\bar{T} \leq 4.35)\).

iii Hence calculate the probability that an operator in the call centre can be occupied in answering 100 calls for less than seven and a quarter hours.

c As well as answering routine calls, the supervisor of the call centre also handles unusual cases that are too complicated for other staff to handle. When the supervisor was timed her mean time to answer 100 calls was \(\bar{T} = 4.6\) minutes.

i Use the statistic \(\bar{T} = 4.6\) minutes to test the hypothesis \(H_0: \mu = 4.3\) against \(H_a: \mu \neq 4.3\), at the 5% level.

ii The supervisor is asked to explain why she is taking too long to answer questions. What reasons can the supervisor provide to claim that the Central Limit Theorem does not apply to her?





REVIEW SET 7C

1 Sketch the graph of X » N(3, 22).On the horizontal axis mark in the z-scores as well as their corresponding x values.

Calculate these probabilities: a Pr(¡1 6 X 6 1) b Pr(¡1 6 Z 6 1) .

2 Staplers are manufactured for $5:00 each and are sold for $20:00 each. The staplers

are guaranteed to last three years. The mean life is actually 3:42 years and the

standard deviation 0:4 years. If the life of these staplers is normally distributed, how

much profit would we expect from selling a batch of 2000 (with a maximum of one

replacement)?

3 The edible part of a batch of Coffin Bay oysters is normally distributed with mean

38:6 grams and standard deviation 6:3 grams. Given that the random variable X is

the mass of a Coffin Bay oyster, find:

a a if Pr(38:6 ¡ a 6 X 6 38:6 + a) = 0:6826 b b if Pr(X > b) = 0:8413.

4 King prawns are favourite items on the menu of Stirling Caterers. From past expe-

rience the manager knows that people on average eat 325 g of prawns with standard

deviation 86 g. The manager is to cater for a wedding of 80 guests and decides to

purchase 27:5 kg of prawns. What is the probability that the caterer will run out of

prawns?

5 For export purposes peaches must be neither too small nor too large. A grower claims

that the peaches in his orchard have a mean weight of 300 grams, just right for export.

A buyer knows that the population standard deviation is 30 grams, and he wants to

test the grower’s claim.

a What hypotheses should the buyer consider?

b Suppose the buyer selects a random sample of 100 peaches and finds that their

mean weight ¹x = 310 grams.

i What is the null distribution the buyer should use?

ii Calculate the test statistic z for this sample.

iii Does this sample support the grower’s claim at the 5% level?

6 The average width of snail shells of a local species needs to be estimated. It is known that the standard deviation is 1.4 mm. Pauline takes a random sample of 200 snails and measures the width of each shell to the nearest mm. The results are shown in the table below.

Width (mm)   Frequency
22           1
23           3
24           17
25           43
26           68
27           41
28           24
29           3

a Find the sample mean.

b Determine a 95% confidence interval for the population mean μ.

7 Suppose the weight X of apricots is normally distributed with μ = 90 grams and σ = 10 grams.

a Calculate the proportion of apricots with weight less than 88 grams.

b In a box of 100 apricots, how many would you expect to weigh less than 88 g?

c The apricots are packaged into boxes of 100 each. What proportion of the boxes will have apricots with a mean weight less than 88 g?




d On each of the boxes of 100 apricots is printed that the nett weight is 8.8 kilograms. In a shipment of 500 boxes, for how many is the weight less than 8.8 kilograms?
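The four parts of question 7 rest on the difference between the distribution of a single apricot and the distribution of the mean of 100 apricots. A brief sketch follows, assuming the apricots in a box are an independent random sample and noting that a box weighs less than 8.8 kg exactly when its mean weight is below 88 g. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 90.0, 10.0   # grams
box_size = 100           # apricots per box
boxes = 500              # boxes in the shipment

# Parts a and b: a single apricot, X ~ N(90, 10^2)
single = NormalDist(mu, sigma)
p_single = single.cdf(88)
expected_light = box_size * p_single

# Parts c and d: the mean of 100 apricots has standard deviation 10 / sqrt(100)
box_mean = NormalDist(mu, sigma / sqrt(box_size))
p_box = box_mean.cdf(88)
expected_boxes = boxes * p_box

print(f"a  Pr(X < 88)        = {p_single:.4f}")
print(f"b  expected apricots = {expected_light:.1f}")
print(f"c  Pr(mean < 88)     = {p_box:.4f}")
print(f"d  expected boxes    = {expected_boxes:.1f}")
```

Averaging over 100 apricots shrinks the standard deviation from 10 g to 1 g, which is why a mean below 88 g is far less likely than a single apricot below 88 g.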

8 The time T it takes Laura to travel to work is normally distributed with mean μ minutes and standard deviation 10 minutes. Laura’s work starts at 9 o’clock in the morning.

a Suppose μ = 40 minutes and Laura leaves for work at a quarter past eight in the morning.

i What is the probability she will be late?

ii If there are 250 working days in a year, how often would Laura be expected

to be late to work in a year?

b Laura does not know the value of μ and decides to keep a 10 day record of the time it takes her to go to work. Let T̄₁₀ be the distribution of the mean time over 10 days it takes Laura to go to work.

i Briefly describe the distribution T̄₁₀ in terms of the distribution T it takes Laura to go to work.

ii Suppose Laura found that for her sample of 10 days the mean time to travel to work was T̄₁₀ = 35 minutes. Use this information to test the hypotheses H0: μ = 40 and Ha: μ ≠ 40, at the 5% level.

iii Calculate the 95% confidence interval for μ.

iv How large a sample should Laura take to obtain a 95% confidence interval of width 2.48 minutes?

c After keeping records for a year consisting of 250 working days, Laura found that the mean travelling time to work was 31.52 minutes. She wants to be 95% certain that she will be at work before 9 o’clock at least 90% of the time in the following year. To the nearest minute, what is the latest time you would advise Laura to leave home? Give reasons for your answer.
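Parts a and b of question 8 can be checked numerically as sketched below. The sketch assumes that leaving at 8:15 gives Laura 45 minutes before a 9 o'clock start, and it uses the known standard deviation of 10 minutes throughout. The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 10.0                        # known standard deviation (min)
z_star = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value

# Part a: mu = 40, Laura is late if the trip takes more than 45 minutes
T = NormalDist(mu=40, sigma=sigma)
p_late = 1 - T.cdf(45)
days_late = 250 * p_late

# Part b ii: test H0: mu = 40 against Ha: mu != 40 using a 10 day mean of 35
n, t_bar = 10, 35.0
se = sigma / sqrt(n)
z = (t_bar - 40) / se
p_value = 2 * NormalDist().cdf(-abs(z))
reject_h0 = abs(z) > z_star

# Part b iii: 95% confidence interval for mu
ci_low, ci_high = t_bar - z_star * se, t_bar + z_star * se

# Part b iv: interval width is 2 * z_star * sigma / sqrt(n), solved for n
width = 2.48
n_needed = ceil((2 * z_star * sigma / width) ** 2)

print(f"a i   Pr(late) = {p_late:.4f}")
print(f"a ii  expected late days = {days_late:.0f}")
print(f"b ii  z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
print(f"b iii 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"b iv  sample size = {n_needed}")
```

Because 40 lies inside the 95% confidence interval, the interval and the two-sided test at the 5% level lead to the same conclusion.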


