7
Statistics
Contents:
A Key statistical concepts
B Describing data
C Normal distributions
D The standard normal distribution
E Finding quantiles (k-values)
F Investigating properties of normal distributions
G Distribution of sample means
H Hypothesis testing for a mean
I Confidence intervals for means
J Review
Y:\HAESE\SA_12STU-2ed\SA12STU-2_07\219SA12STU-2_07.CDR Thursday, 2 November 2006 3:10:48 PM PETERDELL
INTRODUCTION
The word statistics was introduced into the English language by the Scottish politician Sir John Sinclair (1754-1835). He borrowed it from Germany where, as he put it, it meant “an inquiry for the purpose of ascertaining the political strength of a country”.
The meaning he wished to give to the word was an “inquiry into the state of a country, for the purpose of ascertaining the quantum of happiness enjoyed by its inhabitants, and the means of future improvement.”
You can still recognise the word “state” in statistics.
A KEY STATISTICAL CONCEPTS
Words that are commonly used in Statistics:
• Population: A collection of individuals about which we want to draw conclusions.
• Census: The collection of information from the whole population.
• Sample: A selection of information from a subset of the population.
• Data (singular datum): Information about individuals in a population.
• Parameter: A numerical quantity measuring some aspect of a population.
• Statistic: A quantity calculated from data gathered from a sample. It is usually used to estimate a population parameter.
• Distribution: The pattern of variation of data.
RANDOM SAMPLES
A population generally consists of a large number of individuals. Because of expense and time factors it is often only practical to select a sample rather than use the whole population.
A random sample is a sample where every individual has the same chance of being selected.
A sampling technique is biased if it tends to systematically select members of the population with certain properties and not select those that do not have these properties. In other words it favours some individuals above others.
DISCUSSION SAMPLING
In the following scenarios, can you suggest a likely population?
Can you think of any reasons the sampling techniques might be biased?
• People in the local shopping centre on Saturday morning were asked how many computers they have in their household.
• After a program likely to be watched by older people, a television station asked viewers to vote on the use of hand-held phones in cars.
• A local paper advertised for volunteers to test the usefulness of fish oil in a diet.
220 STATISTICS (Chapter 7)
Many sampling techniques have been developed to avoid bias. In this book it will be assumed that any sample is a random, unbiased sample.
DESCRIPTIVE AND INFERENTIAL STATISTICS
Descriptive statistics are concerned with collecting, summarising and describing the characteristics of data.
With descriptive statistics we are only concerned with the data collected and make no effort to generalise it to any other data, such as for the population.
In inferential statistics we select a random sample and we use the information from it to make generalisations about the population from which the sample was taken.
EXAMPLES OF PARAMETERS AND STATISTICS
Recall that:
a parameter is a numerical characteristic of a population and
a statistic is a numerical characteristic of a sample.
For example, when examining the mean age of people in retirement villages throughout Australia, the mean age found would be a parameter. If we took a random sample of 300 people from the population of all retirement village persons, then the mean age would be a statistic.
Note: Parameter goes with Population, and Statistic goes with Sample.
Example 1
A business is considering purchasing 50 000 blank CDs to make CDs of their new textbooks. It will make the purchase if no more than 1.5% of the CDs are defective. Because of the expense and time factors in testing all 50 000 CDs the business decides to test a random sample of 600 for defects. They will then use the results of this sample to estimate the percentage of defectives for the population to be purchased.
a What is the population size?
b What is the sample size?
c What population parameter is of interest to the business?
d What statistic is being used to estimate the parameter?
a The population is the batch of blank CDs to be purchased, and its size is 50 000.
b The sample size is 600.
c The population parameter being considered is the percentage of CDs which are defective.
d The statistic being used is the percentage of CDs which are defective in the sample. As 1.5% of 600 = 9, the business would make the purchase if 9 or fewer CDs in the sample were found to be defective.
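The decision rule in Example 1 can be sketched in a few lines of Python. This is only an illustration: the function names `max_allowed_defects` and `should_purchase` are hypothetical, and exact fractions are used so that 1.5% of 600 is computed without floating-point rounding.

```python
from fractions import Fraction
from math import floor

SAMPLE_SIZE = 600                      # sample size from Example 1
MAX_DEFECT_RATE = Fraction(15, 1000)   # "no more than 1.5%" as an exact fraction


def max_allowed_defects(sample_size, max_rate):
    """Largest number of defective CDs whose sample percentage is within max_rate."""
    return floor(sample_size * max_rate)


def should_purchase(defects_in_sample, sample_size=SAMPLE_SIZE, max_rate=MAX_DEFECT_RATE):
    """Apply the rule: buy only if the sample statistic does not exceed the limit."""
    return defects_in_sample <= max_allowed_defects(sample_size, max_rate)


threshold = max_allowed_defects(SAMPLE_SIZE, MAX_DEFECT_RATE)
print("Purchase if at most", threshold, "defective CDs are found in the sample")
```

Here the sample percentage is the statistic, and it stands in for the unknown population percentage, which is the parameter.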
THE PROCEDURE USED IN AN INFERENTIAL PROBLEM
In this course the key application is to examine a random sample in order to make appropriate statements or inferences about the population.
Generally speaking there are five steps to address in any inferential problem. They are:
Step 1: State the population we are interested in examining.
Step 2: Collect data from a random sample of sufficient size from the population.
Note: What is meant by sufficient size is covered in a later chapter.
Step 3: Examine the relevant information from the sample.
Step 4: Use the results of the sample analysis to make an inference about the population.
Step 5: Give a measure of the reliability of the inference made.
Example 2
For the CD purchase in Example 1 list the procedural steps for the inferential problem.
Step 1: The population consists of all 50 000 CDs.
Step 2: To avoid unnecessary costs and wasting time we must first decide on the sample size. 600 has been decided upon, so we collect 600 data values at random. We record only whether the CD is defective or not.
Step 3: Find the percentage of defective CDs in the sample.
Step 4: The inference will be to provide an estimate of the percentage of defective CDs for the whole population. For example, if 12 CDs are defective in the sample our inference would be that approximately 12/600 = 2% would be defective in the population.
Step 5: The estimate from the sample is not likely to be equal to the exact value for the population. Some indication of the possible error for the estimate should therefore be given.
An example of such a statement as in Step 5 is:
If we had many shipments of 50 000 CDs and in each we found that 12 in a sample of 600 were defective, then in 95% of these shipments there would be between 440 and 1560 defective CDs.
This type of statement is usually condensed to:
We are 95% confident that about 440 to 1560 CDs are defective.
The main thrusts of this course are to:
• determine confidence intervals in which a certain population parameter should lie at a particular level of confidence (commonly 90%, 95%, 99%)
• devise and use particular tests of hypotheses about population means
• determine what sample sizes should return a particular level of confidence in given situations.
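The text does not say at this point how the interval "440 to 1560" for the CD shipment was obtained. As a hedged sketch, the usual normal-approximation interval for a proportion, with the multiplier 1.96 for 95% confidence (an assumption here, not something stated in this section), happens to reproduce those figures. The function name `proportion_interval` is hypothetical.

```python
from math import sqrt


def proportion_interval(defects, sample_size, z=1.96):
    """Approximate interval for a population proportion from a sample.

    Assumes the normal approximation p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = defects / sample_size
    margin = z * sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat - margin, p_hat + margin


POPULATION_SIZE = 50_000               # shipment size from Example 1
low, high = proportion_interval(12, 600)

# Scale the proportions up to counts of defective CDs in the whole shipment.
low_count = round(low * POPULATION_SIZE)
high_count = round(high * POPULATION_SIZE)
print(low_count, "to", high_count, "defective CDs")
```

The interval is centred on the sample proportion of 2%, and its width is the measure of reliability asked for in Step 5.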
EXERCISE 7A
1 A new drug called Cobrasyl, a derivative of cobra venom, is to be approved for the treatment of high blood pressure in humans.
A research team treats 127 high blood pressure patients with the drug and in 119 cases it reduces their blood pressure to an acceptable level.
a What is the sample of interest?
b What is the population of interest?
2 In 2006, 800 computer workers throughout Australia were surveyed and asked a question.
The question was: “Is your main interest in developing software or in using already developed software?” 83% said that developing software was their main interest.
a What is the population of interest?
b What is the parameter of interest?
c What statistic is used to estimate the parameter?
3 A South Australian processor of seafood needs to estimate the average weight of a prawn in a catch. A sample of prawns was selected and found to have an average weight of grams.
35253 8:
a What is the population the processor is interested in?
b What is the parameter of interest?
c What statistic does the processor use to estimate the parameter?
4 Last December Tina visited four supermarkets A, B, C and D on the same day. She recorded the price per kilogram of various fruits in the table below:
Store   Oranges   Apples   Bananas
A       $2.35     $2.15    $1.70
B       $2.45     $2.55    $2.00
C       $2.50     $2.60    $2.10
D       $2.25     $2.05    $1.90
Determine whether the following statements are descriptive or inferential:
a In this city, bananas are cheaper than oranges.
b If you buy a kilogram of each of the three different types of fruit from the one store, you pay the same total amounts at stores A and D.
c Of the four stores, the store with the most expensive apples also had the most expensive oranges and bananas.
d In general, store C has the most expensive fruit.
e Of the four stores, store C has the most expensive fruit. (Careful! What is the population and what is the sample?)
B DESCRIBING DATA
This section will review the main concepts from Year 11 so that students will reacquaint themselves with the terminology used in statistics.
A variable is a quantity that can have different values for different individuals in the population.
Since variables are sometimes used to describe random processes, they are often called random variables.
Variables are usually denoted by capital letters such as X. Individual values, called observations or outcomes, are denoted by lower case letters such as x.
We shall deal with two types of variables: categorical and quantitative.
A categorical or nominal variable can be described by a quality or characteristic that is essentially non-numeric. Individuals are described by different categories.
Examples of categorical data are:
Variable                                        Possible values
• X is the gender of a person                   x = male or female
• C is the type of motor car                    c = Holden, Ford, Toyota
• M is the membership of political party        m = ALP, LIB, DEM
A quantitative or numerical variable takes numerical values.
There are essentially two different types of numerical variable.
A numerical discrete variable takes discrete number values only. It is often a result of counting.
Examples of discrete variables are:
Variable                                        Possible values
• X is the number of people in a household      x = 1, 2, 3, 4, ......
• T is the mark out of 10 for a test            t = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
A numerical continuous variable can take any numerical value in an interval. A continuous variable is often a result of measuring.
Examples of continuous variables are:
Variable                                        Possible values
• W is the weight of newborn babies             w is likely to be in the interval from 0.5 kg to 5 kg.
• X is the amount of water in a 500 litre rain water tank
                                                x is any volume between 0 and 500 litres.
Since continuous variables take on values in intervals, they are also called interval variables.
The essential difference between a categorical and a quantitative variable is that we can do arithmetic with quantitative variables, but not with categorical variables.
THE MEAN AND STANDARD DEVIATION (REVIEW)
In this book we are mainly concerned with the mean and the standard deviation.
The mean of a sample of n numbers $x_1, x_2, \dots, x_n$ is:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The Greek letter $\sum$ (capital sigma) is used to denote the summation of numbers, so $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$ (read “the sum of all $x_i$ for i = 1 to n”).
The endpoints of the summation, i = 1 to n, are sometimes omitted, so the mean can be written as $\frac{1}{n}\sum x_i$ or even $\frac{1}{n}\sum x$.
The mean of a population is usually denoted by the Greek letter μ (mu), so $\mu = \frac{1}{n}\sum x$.
We can get a much clearer picture of a data set if, in addition to having a measure for the centre, we also have an indication of how the data is spread.
For example, the mean weight of oranges from a particular orchard and the mean weight of salt bagged by a machine may both be 500 grams, but the variation in the weights of oranges is likely to be much greater than that of bags of salt. The data for oranges will therefore have a greater spread.
The most commonly used measure of spread about the mean is the standard deviation.
The standard deviation of a sample is a little different from the standard deviation of a population.
In a sample of size n, the sample standard deviation, usually denoted by s, is:
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
In a population of size n, the population standard deviation, usually denoted by the Greek letter σ (sigma), is:
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}} = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$
The reason for this difference is rather technical and, at this stage, we do not attempt to explain it.
Statisticians know that dividing by n − 1 rather than n makes s² an unbiased estimate of the population variance σ², which is why s is used to estimate σ.
Notice that for large n, the two formulae give virtually the same value.
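The two standard deviation formulae can be compared directly. The sketch below applies both to a small set of illustrative values (made up for this example, not taken from the text) and checks the results against Python's standard `statistics` module, whose `stdev` divides by n − 1 and whose `pstdev` divides by n.

```python
from math import sqrt
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

n = len(data)
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)

s = sqrt(squared_deviations / (n - 1))   # sample formula, divisor n - 1
sigma = sqrt(squared_deviations / n)     # population formula, divisor n

print("mean =", mean)
print("sample standard deviation s =", round(s, 4))
print("population standard deviation =", round(sigma, 4))

# The library functions implement the same two formulae.
library_s = statistics.stdev(data)
library_sigma = statistics.pstdev(data)
```

Because the divisors differ only by 1, the gap between the two values shrinks as n grows.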
The mean and standard deviation can also be calculated from frequency tables.
The frequency $f_i$ of a quantity $x_i$ is the number of times it occurs.
For a population of size n, the formulae for the mean and standard deviation become:
$$\mu = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_k x_k}{n}$$
and
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 f_1 + (x_2 - \mu)^2 f_2 + \dots + (x_k - \mu)^2 f_k}{n}}$$
Notice that
$$\mu = \left(\frac{f_1}{n}\right) x_1 + \left(\frac{f_2}{n}\right) x_2 + \left(\frac{f_3}{n}\right) x_3 + \dots + \left(\frac{f_k}{n}\right) x_k.$$
$\frac{f_i}{n}$ is the proportion of $x_i$ in the population. For large values of n, the experimental probability $p_i$ of randomly selecting $x_i$ from the population is taken to be $p_i = \frac{f_i}{n}$.
So, using $p_i = \frac{f_i}{n}$,
$$\mu = p_1 x_1 + p_2 x_2 + p_3 x_3 + \dots + p_k x_k = \sum p_i x_i.$$
Similarly for the population standard deviation:
$$\sigma = \sqrt{\left(\frac{f_1}{n}\right)(x_1 - \mu)^2 + \left(\frac{f_2}{n}\right)(x_2 - \mu)^2 + \dots + \left(\frac{f_k}{n}\right)(x_k - \mu)^2}$$
which leads to
$$\sigma = \sqrt{\sum p_i (x_i - \mu)^2}.$$
Example 3
A magazine store claims 23% of its customers purchase one magazine, 38% purchase two, 21% purchase three, 13% purchase four, and 5% purchase five. Find the mean and the standard deviation of X, the number of magazines sold to a customer.
The probability table is:
x_i   0      1      2      3      4      5
p_i   0.00   0.23   0.38   0.21   0.13   0.05
Now
$$\mu = \sum p_i x_i = 0.23 \times 1 + 0.38 \times 2 + 0.21 \times 3 + 0.13 \times 4 + 0.05 \times 5 = 2.39$$
i.e., in the long run, the average number purchased per customer is 2.39.
Also,
$$\sigma = \sqrt{\sum p_i (x_i - \mu)^2} = \sqrt{0.23 \times (1 - 2.39)^2 + 0.38 \times (2 - 2.39)^2 + \dots + 0.05 \times (5 - 2.39)^2} \approx 1.12$$
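The calculation in Example 3 can be checked with a short Python sketch that applies the probability-table formulae for the mean and standard deviation. The function name `mean_and_sd` and the dictionary layout are choices made for this illustration.

```python
from math import sqrt


def mean_and_sd(table):
    """Return (mu, sigma) for a probability table given as {value: probability}."""
    mu = sum(p * x for x, p in table.items())
    variance = sum(p * (x - mu) ** 2 for x, p in table.items())
    return mu, sqrt(variance)


# Probability table from Example 3: number of magazines bought per customer.
magazines = {0: 0.00, 1: 0.23, 2: 0.38, 3: 0.21, 4: 0.13, 5: 0.05}

mu, sigma = mean_and_sd(magazines)
print("mean =", round(mu, 2))
print("standard deviation =", round(sigma, 2))
```

The same function works for any discrete probability table whose probabilities sum to 1.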
Example 4
‘Cheap Car Insurance’ insures used cars valued at $6000 under these conditions.
A $6000 will be paid to the owner for total loss
B for damage between $3000 and $5999, $3500 will be paid
C for damage between $1500 and $2999, $1000 will be paid
D for damage less than $1500, nothing will be paid.
From statistical information the insurance company knows that in any year the probabilities of A, B, C and D are 0.03, 0.12, 0.35 and 0.50 respectively.
If the company wishes to receive $80 more than its expected payout on each policy, what should it charge for the policy?
Let X be the random variable of payouts, so the probability table is:
x_i   0      1000   3500   6000
p_i   0.50   0.35   0.12   0.03
The expected payout is the mean, μ, and
$$\mu = \sum p_i x_i = (0.50) \times 0 + (0.35) \times 1000 + (0.12) \times 3500 + (0.03) \times 6000 = 950$$
The company expects to pay out $950 on average in the long run, so it should charge $950 + $80 = $1030.
EXERCISE 7B
1 Australian crayfish is exported to Asian markets. The buyers are prepared to pay high prices when the crayfish arrive still alive. If X is the number of deaths per dozen crayfish, the probability function for X is given by:
x_i      0      1      2      3      4      5      > 5
P(x_i)   0.54   0.26   0.15   0.03   0.01   0.01   0.00
a What is the mean number of deaths per dozen crayfish?
b Find σ, the standard deviation for the probability distribution.
2 A random variable X has probability function given by $P(x) = k(0.4)^x(0.6)^{3-x}$ for x = 0, 1, 2, 3.
a Find P(x) for x = 0, 1, 2 and 3 and hence find k.
b Find the mean and standard deviation for the distribution.
3 An insurance policy covers a $20 000 sapphire ring against theft and loss. If it is stolen
the insurance company will pay the policy owner in full. If it is lost they will pay the
owner $8000. From past experience the insurance company knows that the probability
of theft is 0.0025 and of being lost is 0.03. How much should the company charge to
cover the ring if they want a $100 expected return?
4 Use technology to find the mean and standard deviation of the two samples, A and B, of weights given in grams.
A  498.8 500.2 500.4 499.9 500.4 500.6 498.9 498.2 500.1 501.9
   500.8 498.6 499.7 498.6 499.0 498.8 499.1 500.7 500.7 501.3
   501.1 501.5 499.0 499.7 498.4 501.1 500.1 499.9 500.9 499.2
B  545.5 543.4 399.8 511.3 616.3 496.7 337.8 650.2 426.3 522.2
   664.0 415.1 416.0 425.4 419.9 503.7 427.8 474.2 459.9 390.5
   428.5 451.9 590.1 613.5 402.3 318.3 478.1 502.2 626.4 435.7
Which of the samples is the weights of bags of salt, and which is the weights of oranges?
5 Test marks out of 10 are recorded in the following frequency table:
Mark        0   1   2   3   4   5   6    7    8   9   10
Frequency   2   1   0   4   5   8   12   15   7   3   5
a Find the mean and standard deviation of these scores.
b Calculate the percentage difference between using the formulae for population standard deviation and sample standard deviation.
6 Using $\sigma^2 = \sum p_i (x_i - \mu)^2$ show that $\sigma^2 = \sum p_i x_i^2 - \mu^2$.
(Hint: $\sigma^2 = \sum p_i (x_i - \mu)^2 = p_1(x_1 - \mu)^2 + p_2(x_2 - \mu)^2 + \dots + p_n(x_n - \mu)^2$. Expand σ² and regroup the terms.)
C NORMAL DISTRIBUTIONS
Many quantities reflect the combined effect of a large number of random factors.
For example:
• The yield of a wheat plant is the combined result of many unpredictable factors such as genes, rainfall, sunshine, and its position in the field where it was seeded.
• The weight of a packet of sultanas is the sum of the weights of each individual sultana, and it is unlikely a packet labelled as 1 kg will weigh exactly 1 kg.
DISCUSSION THE EFFECT OF RANDOM FACTORS
• Consider at least three factors that affect each of the following:
a the weight of a newly born piglet
b the time to complete an assignment
c the mark achieved in an examination
d the number of goals scored in a netball match.
• For each of the above random variables, suggest why the distribution might be
a symmetric b bell shaped.
The next investigation explores the distribution of a quantity that is the combined result of different factors.
INVESTIGATION 1 SOME PROPERTIES OF A NORMAL DISTRIBUTION
Consider the time it takes Les to walk home from school. We have broken this into the following stages with the time it takes to complete each stage:
Stage   What is happening                        Time
1       Cross the road in front of the school    up to 1 minute
2       Walk to the shopping centre              5 ± 2 minutes
3       Walk through the shopping centre         3 ± 2 minutes
4       Cross a road                             up to 1 minute
5       Buy a loaf of bread                      up to 2 minutes
6       Talk with a friend                       up to 2 minutes
7       Walk the remaining distance home         2 ± 1 minutes
Question: According to the table, what is the longest time it may take Les to walk home? What is the shortest time?
If Les wanted to study the distribution of the time it takes to walk home, he could keep a daily record, but the amount of data collected would be very small.
Les could also use the information given in the table and use a spreadsheet or a calculator to simulate the time it takes to walk home.
The following instructions are set up for a spreadsheet, but the procedure will also work on a calculator.
What to do:
1 Open the spreadsheet “Normal distribution”.
A spreadsheet with the following headings will appear.
2 In each of the cells A2 to G2, under the headings ‘Stage 1’ to ‘Stage 7’, type in the formulae shown in the table. Do not forget to start each formula with an = sign.
Note: rand() calculates a random number between 0 and 1.
Question: What does 5 + (4*rand() − 2) calculate?
3 In cell N2, below the heading ‘Total time’, type in the formula =sum(A2:M2)
Question: What does this formula calculate?
4 Drag the formulae in cells A2 to N2 down to fill all cells A251 to N251. Pressing the F9 function key will produce another random sample.
The numbers in cell P2 under the heading ‘Mean’, and in cell Q2 under the heading ‘Standard Deviation’, are the mean and standard deviation of the numbers in cells N2 to N251.
The number in cell R2 under the heading ‘No. within 1 st. dev.’ gives the number of
values within 1 standard deviation of the mean. For example, if the mean x̄ = 12.96 and the standard deviation s = 1.82, then this cell gives the number of values that lie between x̄ − s = 11.14 and x̄ + s = 14.78. Similarly, the numbers in cells S2 and T2 give the number of values within 2 and 3 standard deviations of the mean respectively.
If you are having difficulty setting up this spreadsheet, click on the tag ‘Normal 2’ to
open a finished version.
5 Calculate the proportion of data values within each interval. For example, if there are 169 values within 1 standard deviation of the mean, the proportion of values in the interval = 169/250 = 0.676.
6 Copy and fill in the following table for 5 different samples. The entries of the first line may not agree with your values.
Sample   Mean    Stdev   x̄ − s to x̄ + s    x̄ − 2s to x̄ + 2s   x̄ − 3s to x̄ + 3s
no.      x̄       s       Count   Propn.    Count   Propn.     Count   Propn.
1        12.96   1.82    169     0.676
2
3
4
5
What do you notice about the proportions of data in each of the intervals?
In the following we change the value of the factors and then add more factors.
7 Change the formulae in cells A2 to G2 as shown in the table.
8 Repeat steps 4 to 6.
9 Add the following formulae in cells H2 to M2:
10 Repeat steps 4 to 6.
The graph that appears is the histogram of data in cells N2 to N251.
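The first part of Investigation 1 can also be sketched in Python instead of a spreadsheet. The spreadsheet formulae themselves are not reproduced in the text, so the stage formulae below are assumptions modelled on the one formula that is quoted, 5 + (4*rand() − 2): each "up to t minutes" stage is taken as t*rand(), and each "m ± d minutes" stage as m + (2d*rand() − d). The seed 2006 and the function names are arbitrary choices for this illustration.

```python
import random
import statistics


def simulate_walk(rng):
    """One simulated walk home: the sum of the seven stage times in minutes."""
    stages = [
        rng.random(),                 # Stage 1: up to 1 minute
        5 + (4 * rng.random() - 2),   # Stage 2: 5 ± 2 minutes
        3 + (4 * rng.random() - 2),   # Stage 3: 3 ± 2 minutes
        rng.random(),                 # Stage 4: up to 1 minute
        2 * rng.random(),             # Stage 5: up to 2 minutes
        2 * rng.random(),             # Stage 6: up to 2 minutes
        2 + (2 * rng.random() - 1),   # Stage 7: 2 ± 1 minutes
    ]
    return sum(stages)


rng = random.Random(2006)             # fixed seed so the sample can be repeated
totals = [simulate_walk(rng) for _ in range(250)]   # 250 rows, as in cells N2 to N251

mean = statistics.mean(totals)
s = statistics.stdev(totals)


def proportion_within(k):
    """Proportion of simulated totals within k standard deviations of the mean."""
    count = sum(mean - k * s <= t <= mean + k * s for t in totals)
    return count / len(totals)


proportions = {k: proportion_within(k) for k in (1, 2, 3)}
print("mean =", round(mean, 2), " standard deviation =", round(s, 2))
for k, p in proportions.items():
    print("within", k, "standard deviation(s):", round(p, 3))
```

Changing the seed plays the role of pressing F9: a new random sample is produced and the proportions can be compared across samples.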
From Investigation 1 you should have discovered that changing the number and values of factors may change the mean and standard deviation, but leaves the following unchanged:
• The shape of the histogram is symmetric about the mean.
• Approximately 68% of the data lies between 1 standard deviation below the mean and 1 standard deviation above the mean.
• Approximately 95% of the data lies between 2 standard deviations below and 2 standard deviations above the mean.
• Approximately 99.7% of the data lies between 3 standard deviations below and 3 standard deviations above the mean.
It is a rare event for an outcome to be outside the standard deviation range between −3σ and 3σ. In a sample of 1000, you would only expect about 3 cases.
Note:
A smooth curve drawn through the midpoints of each column of the histogram would ideally look like the graph displayed.
Note the points of inflection at μ − σ and μ + σ.
[Figure: normal curve with the horizontal axis marked from μ − 3σ to μ + 3σ. The areas of the successive bands are 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35% and 0.15%. The curve is concave between the points of inflection at μ − σ and μ + σ, and convex outside them.]
The above information is typical of a family of normal distributions. Curves with this shape are known as normal curves. Because of their characteristic shape, they are also called bell-shaped curves.
Variables which are the combined result of many random factors are often approximately normal.
The normal variable X with mean μ and standard deviation σ is denoted by X ~ N(μ, σ²).
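The percentages 68%, 95% and 99.7% can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `proportion_within` is a choice made for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


proportions = {k: proportion_within(k) for k in (1, 2, 3)}
for k, p in proportions.items():
    print("within", k, "standard deviation(s):", round(100 * p, 2), "%")

# Expected number of outcomes beyond 3 standard deviations in a sample of 1000.
expected_outside_3sd = 1000 * (1 - proportions[3])
print("expected outside 3 standard deviations:", round(expected_outside_3sd, 1))
```

The same proportions are obtained for any mean and standard deviation, since only the number of standard deviations from the mean matters.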
CONTINUOUS PROBABILITY DENSITY FUNCTIONS
For any distribution of data, whether it is a normal distribution or not, the function whose smooth curve approximates the histogram of the data is called a probability density function or pdf.
If the variable X is normally distributed, N(μ, σ²), the probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}.$$
Probability density functions f have the following properties:
• f(x) ≥ 0 for all values of x.
• The area between the graph of f and the horizontal axis is 1, since the total of all probabilities is 1.
• The proportion of outcomes of the variable X between the values a and b is the area between the graph of f and the horizontal axis for a ≤ x ≤ b.
Notice that:
$$\Pr(a \le X \le b) = \int_a^b f(x)\, dx$$
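The area interpretation can be illustrated numerically. The sketch below codes the normal probability density function and approximates the area under it with the trapezoidal rule; the function names, the number of steps and the choice of μ = 0, σ = 1 are assumptions made for this illustration. The result is compared with `NormalDist.cdf` from Python's standard `statistics` module.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal probability density function with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b using the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


mu, sigma = 0.0, 1.0


def f(x):
    return normal_pdf(x, mu, sigma)


prob_within_one_sd = area(f, -1, 1)   # Pr(-1 <= X <= 1)
total_area = area(f, -10, 10)         # nearly all of the area under the curve

exact = NormalDist(mu, sigma).cdf(1) - NormalDist(mu, sigma).cdf(-1)
print(round(prob_within_one_sd, 4), round(exact, 4), round(total_area, 4))
```

Shrinking the interval from a to b towards a single point drives the area, and hence the probability, to zero.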
For a continuous variable X, the probability X is exactly equal to a point a is zero.
For example, the probability an egg will weigh exactly 72.9 g is zero.
If you were to weigh an egg on scales that weigh to the nearest 0.1 g, a weight of 72.9 g means the weight lies somewhere between 72.85 and 72.95 grams.
Presumably an egg has to weigh something, and it could be 72.9 grams, but you will never know. No matter how accurate your scales are, you can only ever know the weight of an egg within a range.
So, for a continuous variable we can only talk about the probability an event lies in an interval.
Notice that:
if X is continuous, Pr(a ≤ X ≤ b), Pr(a < X ≤ b), Pr(a ≤ X < b) and Pr(a < X < b) all have the same value. Why?
This would not be correct if X was discrete.
Example 5
The chest measurements of 18 year old male footballers are normally distributed with a mean of 95 cm and a standard deviation of 8 cm.
a Find the percentage of randomly chosen footballers with chest measurements between:
i 87 cm and 103 cm ii 103 cm and 111 cm
b Find the probability of randomly choosing a footballer with a chest measurement between 87 cm and 111 cm.
For the distribution of chest measurements, the mean μ = 95 cm and the standard deviation σ = 8 cm.
[Figure: normal curve for chest measurements with the axis marked at 87, 95, 103 and 111 cm. The bands from 87 to 95 and from 95 to 103 each contain 34%, and the band from 103 to 111 contains 13.5%.]
a i We need the percentage between μ − σ and μ + σ. This is 68%.
ii We need the percentage between μ + σ and μ + 2σ. This is 13.5%.
b The percentage between μ − σ and μ + 2σ is 68% + 13.5% = 81.5%.
So the probability is 0.815.
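The answers in Example 5 use the rounded percentages 68% and 13.5%. As a rough check, the sketch below computes the same probabilities with `NormalDist` from Python's standard `statistics` module; the variable names are choices made for this illustration.

```python
from statistics import NormalDist

chest = NormalDist(mu=95, sigma=8)   # chest measurements in cm, from Example 5

p_87_to_103 = chest.cdf(103) - chest.cdf(87)     # within 1 standard deviation
p_103_to_111 = chest.cdf(111) - chest.cdf(103)   # between 1 and 2 above the mean
p_87_to_111 = chest.cdf(111) - chest.cdf(87)     # part b

print(round(p_87_to_103, 4), round(p_103_to_111, 4), round(p_87_to_111, 4))
```

The computed values agree with the rule-of-thumb percentages to within about half a percentage point.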
EXERCISE 7C
1 What is the probability that a normally distributed value lies between:
a 1σ below the mean and 1σ above the mean
b the mean and the value 1σ above the mean
c the mean and the value 2σ below the mean
d the mean and the value 3σ above the mean?
2 Suppose the heights of 16 year old male students are normally distributed with a mean of 170 cm and a standard deviation of 8 cm. Find the percentage of male students whose height is:
a between 162 cm and 170 cm b between 170 cm and 186 cm.
Find the probability that a student from this group has a height:
c between 178 cm and 186 cm d less than 162 cm
e less than 154 cm f greater than 162 cm.
3 The time T minutes it takes Charlotte to go to work is normally distributed with mean 50 minutes and standard deviation of 5 minutes. Every morning Charlotte leaves for work at 8 am.
a If work starts at 9 am, what is the probability Charlotte will be late for work?
b If Charlotte works 250 days a year, how many times can she expect to be late?
4 Explain why each of the following variables might be normally distributed:
a the chest size of 18 year old Australian males
b the length of adult female sharks
c the protein content of each kilogram of corn grown in the same field.
5 A farmer has a flock of 237 crossbred lambs. The mean weight of the flock is 35 kg with a standard deviation of 2 kg.
a Explain why the weights of the lambs might be normally distributed.
b If lambs between the weights of 33 to 39 kg are suitable for export, how many lambs in this flock could the farmer expect to be able to export?
6 The weights of hens’ eggs are normally distributed with mean 65 grams and standard deviation 6 grams.
a Determine the probability that a randomly selected egg has weight
i greater than 53 g ii less than 71 g iii between 59 g and 77 g.
b In one week the hens lay 1286 eggs. How many of these eggs are expected to be
i greater than 53 g ii less than 71 g iii between 59 g and 77 g?
7 The marks for a geography examination are normally distributed with mean 65 and standard deviation 11.
a A geography student is chosen at random. Determine the probability that the student scored i less than 76 marks ii between 43 and 76 marks.
b If the top 16% of students receive an A grade, what was the minimum mark for an A?
c If 2582 students sit for the examination, how many of them would be expected to score less than 32 marks?
8 The weights of Jason’s oranges are normally distributed. 84% of the crop weigh more than 152 grams and 16% weigh more than 200 grams.
a Find μ and σ for the crop.
b What proportion of the oranges weigh between 152 grams and 224 grams?
9 The heights of 13 year old boys are normally distributed. 97.5% of them are above 131 cm and 2.5% are above 179 cm.
a Find μ and σ for the height distribution.
b A 13 year old boy is randomly chosen. What is the probability that his height lies between 143 cm and 191 cm?
10 Using the same set of axes, quickly sketch the graphs of the density functions for each of the following distributions:
a N(0, 3²) b N(0, 0.5²) c N(−5, 1²) d N(3, 0.25).
11 Each of the following is a graph of a normal distribution with different vertical scales:
[Graphs A, B and C]
a Write down the mean μ for each of these distributions.
b Which of the distributions has standard deviation
i σ = 0.1 ii σ = 1 iii σ = 10?
c Which of the distributions has the largest spread?
D THE STANDARD NORMAL DISTRIBUTION
For each value of μ and σ there is a different normal distribution N(μ, σ²).
As illustrated by Investigation 1, all normal distributions have one important property in common: the probability of an event occurring depends only on the number of standard deviations the event is from the mean.
If x is an observation from a normal distribution with mean μ and standard deviation σ, the z-score of x is the number of standard deviations x is from the mean.
The diagram shows how the z-score is related to a normal curve.
[Figure: normal distribution curve with band areas 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35% and 0.15%. The actual score axis is marked from μ − 3σ to μ + 3σ, and the z-score axis beneath it is marked from −3 to 3.]
z-scores are particularly useful when comparing two measurements made using different μ and σ. But be careful! These comparisons will only be reasonable if both measurements are
approximately normal.
Example 6
The local school has kept records of all its athletics competitions. It was found that the time, in minutes, to run the men’s 800 metres was normally distributed as N(3.4, (0.2)²). The women’s long jump, in metres, was normally distributed as N(4.3, (0.4)²). In 1980 John won the 800 metre race with a time of 3.2 minutes. In 2006
his daughter Anne came second in the long jump with a distance of 5.1 m.
a i Sketch the graphs of the two distributions using the same scale for the
z-scores from −3 to +3.
ii Put the actual times/distances below each of the z-scores on the graphs.
iii Calculate the z-scores for John and Anne, and mark these on the graphs.
iv Shade the area under the respective graphs to represent performances that
were better than those of John and Anne.
b Of all the students who participated in these two events, what proportion would
have performed better than i John ii Anne?
c If 1000 students had participated in each of these two events, how many would
have performed better than i John ii Anne?
d Of the father and daughter, who had the better result?
a i/ii/iv
[Figure: two normal curves with z-scores from −3 to +3. John’s time: actual times 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0 min, with the region “better than John” shaded to the left of z = −1. Anne’s distance: actual distances 3.1, 3.5, 3.9, 4.3, 4.7, 5.1, 5.5 m, with the region “better than Anne” shaded to the right of z = +2. The areas between successive z-scores are 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35% and 0.15%.]
iii John’s time was 3.2 − 3.4 = −0.2 minutes from the mean.
Since the standard deviation is 0.2 minutes, John ran the 800 metres in a
time of 1 standard deviation less than the mean.
The z-score of John’s performance is −1.
The distance Anne jumped was 5.1 − 4.3 = 0.8 m above the mean.
Since the standard deviation is 0.4 metres, Anne jumped a distance of 2 standard deviations above the mean.
The z-score of Anne’s performance is +2.
b i The proportion less than μ − σ is 0.16, so 16% of all participants
performed better than John.
ii The proportion greater than μ + 2σ is 0.025, so only 2.5% of all
participants performed better than Anne.
c i Of 1000 participants, 16% of 1000 = 160 were better than John.
ii 2.5% of 1000 = 25 were better than Anne; one of these happened
to be competing on the same day as Anne.
d Anne’s long jump was more outstanding than her father’s 800 metre race.
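The comparison in Example 6 can also be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of a calculator; the helper name `z_score` is illustrative and not part of the text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# John: 800 m time of 3.2 min, where times are N(3.4, 0.2^2). Lower is better.
z_john = z_score(3.2, 3.4, 0.2)
# Anne: long jump of 5.1 m, where distances are N(4.3, 0.4^2). Higher is better.
z_anne = z_score(5.1, 4.3, 0.4)

better_than_john = Z.cdf(z_john)       # proportion with a faster time
better_than_anne = 1 - Z.cdf(z_anne)   # proportion with a longer jump

print(f"John: z = {z_john:.1f}, proportion better = {better_than_john:.3f}")
print(f"Anne: z = {z_anne:.1f}, proportion better = {better_than_anne:.3f}")
```

The computed proportions (about 0.159 and 0.023) differ slightly from the 16% and 2.5% used in the example, because the example works from the rounded areas marked on the normal curve.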
EXERCISE 7D.1
1 In a year 12 class, the marks for a Geography test marked out of 50 were normally
distributed with a mean of 34 and standard deviation of 6. The marks for an English essay
out of 20 were normally distributed with a mean of 12 and standard deviation of 1.5.
Val received a mark of 40 for her Geography and 15 for her English essay.
a Sketch the graphs of the two distributions below one another using the same scale
for the z-scores from −3 to +3.
Put the actual marks below each z-score on the graph.
b For which of the two subjects did Val receive the higher % mark?
c Calculate the z-score for each of Val’s results.
i Mark these z-scores on the two graphs.
ii Shade the region on the two graphs of scores which were better than Val’s.
d What proportion of the students performed better than Val in Geography, and what
proportion performed better than Val in English?
e If there were 32 students in the class, how many performed better than Val in
Geography and how many in English?
f In which of these two assessments did Val perform better?
2 Suppose that the weight W of bags of sugar filled by a machine is normally distributed
with mean μ = 504 grams and standard deviation σ = 2 grams.
A quality controller rejects any bags of sugar with weight less than 500 grams.
Across town, the weight A of bags of apples filled by an assistant in a greengrocer’s shop
is normally distributed with mean weight 5 kilograms and standard deviation 500 grams.
Bags weighing less than 4½ kg are rejected by a quality controller.
a Sketch the graphs of the two distributions below one another using the same scale
for the z-scores from −3 to +3.
Put the actual weights below each z-score on the graph.
b Calculate the z-score for each of the two quality controls, and shade in the regions
corresponding to the weights of bags that are rejected.
c Which of the two quality controllers is the more stringent, i.e., rejects the larger
proportion of bags?
3 Suppose the distribution of the diameter (in cm) of oranges from a tree is N(10, 2²).
a Sketch a graph of the distribution that displays both the actual diameters as well as
the z-score along the horizontal axis.
b Find the z-score for each of the following diameters:
i 12 cm ii 9 cm iii 13 cm
c Oranges are to be dumped if their diameters have a z-score of less than −2.
What is the diameter of oranges that are to be dumped?
d If there are 120 oranges on the tree, how many will be dumped?
4 The volume of milk cartons filled by a machine is normally distributed with mean 504 mL and standard deviation of 1.5 mL.
a What is the z-score of a carton containing 506 mL of milk?
b What is the volume of milk in a carton with a z-score of −1.5?
If x is an observation from a normal distribution with mean μ and standard deviation σ, the
z-score of x can be calculated from the formula $z = \dfrac{x - \mu}{\sigma}$.
If the variable X is normally distributed with mean μ and standard deviation σ, then
$Z = \dfrac{X - \mu}{\sigma}$ is called the standard normal distribution.
The variable Z is the number of standard deviations X is from the mean.
Notice that, if x = μ then z = 0 and if x = μ + σ then z = 1.
Hence, the mean of Z is 0 and the standard deviation of Z is 1.
Example 7
Suppose examination scores are normally distributed with mean mark μ = 63 and
standard deviation of σ = 12 marks.
a What is the z-score for a mark of 80?
b If Hua’s z-score is −1.5, what is Hua’s actual score?
a A mark of 80 is 80 − 63 = 17 above the mean.
Since the standard deviation
is 12, this is 17/12 ≈ 1.42 standard
deviations above the mean.
So, the z-score is 1.42.
[Figure: normal curve showing actual marks against z-scores, with the score of 80 marked]
b Hua’s mark is −1.5 standard
deviations from the mean.
Since the standard deviation is
12, this is 12 × (−1.5) = −18 marks from the mean.
Since the mean is 63, Hua’s
mark is 63 + (−18) = 45.
[Figure: normal curve showing actual marks against z-scores, with Hua’s mark marked]
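The two conversions used in Example 7 (from an actual score to a z-score, and back again) can be sketched in a few lines. This is an illustrative sketch only; the function names `z_score` and `from_z_score` are assumptions, not notation from the text.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma: standard deviations between x and the mean."""
    return (x - mu) / sigma


def from_z_score(z, mu, sigma):
    """Invert the z-score formula: x = mu + z * sigma."""
    return mu + z * sigma


mu, sigma = 63, 12                        # examination scores in Example 7

z_80 = z_score(80, mu, sigma)             # part a
hua_mark = from_z_score(-1.5, mu, sigma)  # part b

print(f"z-score of a mark of 80: {z_80:.2f}")
print(f"Hua's actual mark: {hua_mark:.0f}")
```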
Example 8
Find the probability that the standard normal distribution Z lies between −2 and 1.
The graph of the Z-distribution is shown:
[Figure: standard normal curve with the region between z = −2 and z = 1 shaded; the shaded areas are 13.5%, 34% and 34%]
The probability Z lies between −2 and 1 is the proportion of observations that lie
between 2 standard deviations to the left of the mean and 1 standard deviation to
the right of the mean. This is about 0.815.
EXERCISE 7D.2
1 The table shows Emma’s midyear exam results. The exam results for each subject are normally distributed with mean μ and standard deviation σ shown in the table.
Subject       Emma’s score   μ     σ
English       12             10    1.1
Chinese       27             20    3.0
Geography     84             55    18
Biology       34             25    10
Mathematics   84             50    15
a Find the z-score for each of
Emma’s subjects.
b Arrange Emma’s subjects from
‘best’ to ‘worst’ in terms of the z-scores.
2 Calculate the following probabilities. In each case sketch the graph of the Z-distribution
shading in the region of interest.
a Pr(−1 < Z < 1) b Pr(−1 < Z < 3) c Pr(−1 < Z < 0)
d Pr(Z < 2) e Pr(−1 < Z) f Pr(Z > 1)
So far we have only used integer z-scores to calculate probabilities. By refining the methods used in Investigation 1 we can calculate probabilities for other z-scores. To see how to use your calculator to do this, click on the icon.
USING TECHNOLOGY TO FIND PROBABILITIES
When working with normal distributions, you are advised to sketch a graph of the normal
distribution and shade in the areas of interest.
Example 9
Use technology to illustrate and calculate:
a Pr(−0.41 ≤ Z ≤ 0.67) b Pr(Z ≤ 1.5) c Pr(Z > 0.84)
a For a TI, Pr(a ≤ Z ≤ b)
can be calculated using normalcdf(a, b, 0, 1)
Pr(−0.41 ≤ Z ≤ 0.67)
= normalcdf(−0.41, 0.67, 0, 1)
≈ 0.408
[Figure: standard normal curve with the region between −0.41 and 0.67 shaded]
b Pr(Z ≤ 1.5)
= normalcdf(−E99, 1.5, 0, 1)
≈ 0.933
Note: −E99 (that is, −1 × 10^99) is the negative number of largest
magnitude on a calculator.
[Figure: standard normal curve with the region to the left of 1.5 shaded]
c Pr(Z > 0.84)
= normalcdf(0.84, E99, 0, 1)
≈ 0.200
Note: E99 (that is, 1 × 10^99) is the largest positive
number on a calculator.
[Figure: standard normal curve with the region to the right of 0.84 shaded]
Note:
When μ = 0 and σ = 1 we can simply use normalcdf(a, b)
EXERCISE 7D.3
1 If Z is the standard normal distribution, find the following probabilities.
In each case sketch the regions.
a Pr(−0.86 ≤ Z ≤ 0.32) b Pr(−2.3 ≤ Z ≤ 1.5) c Pr(Z ≤ 1.2)
d Pr(Z ≤ −0.53) e Pr(Z > 1.3) f Pr(Z > −1.4)
g Pr(Z > 4)
With modern technology we can calculate probabilities for normal
distributions which have not been standardised. Click on the icon to
see how this is done.
Example 10
If X is N(10, 2.3²), find these probabilities:
a Pr(8 ≤ X ≤ 11) b Pr(X ≤ 12) c Pr(X > 9). Illustrate.
a Pr(8 ≤ X ≤ 11)
= normalcdf(8, 11, 10, 2.3)
≈ 0.476
b Pr(X ≤ 12)
= normalcdf(−E99, 12, 10, 2.3)
≈ 0.808
c Pr(X > 9)
= normalcdf(9, E99, 10, 2.3)
≈ 0.668
[Figures: normal curves with mean 10 and the regions 8 ≤ x ≤ 11, x ≤ 12 and x > 9 shaded]
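For readers without a TI calculator, the calculations in Examples 9 and 10 can be reproduced with a small helper. This is a sketch under stated assumptions: the function below is a hypothetical Python stand-in for the calculator's normalcdf, built on the standard-library `statistics.NormalDist`, and infinite bounds take the place of ±E99.

```python
from statistics import NormalDist

INF = float("inf")  # plays the role of E99 on the calculator


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Pr(lower <= X <= upper) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Example 9: standard normal distribution (defaults mu = 0, sigma = 1)
ex9 = [normalcdf(-0.41, 0.67), normalcdf(-INF, 1.5), normalcdf(0.84, INF)]

# Example 10: X is N(10, 2.3^2)
ex10 = [normalcdf(8, 11, 10, 2.3),
        normalcdf(-INF, 12, 10, 2.3),
        normalcdf(9, INF, 10, 2.3)]

for label, values in (("Example 9", ex9), ("Example 10", ex10)):
    print(label, [f"{p:.3f}" for p in values])
```

The printed values can be compared with the answers 0.408, 0.933, 0.200 and 0.476, 0.808, 0.668 given in the examples.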
2 If the random variable X is N(70, 3²), find these probabilities:
a Pr(60.6 < X ≤ 68.4) b Pr(X > 74) c Pr(X ≤ 68)
3 Suppose the variable X is normally distributed with mean μ = 58.3 and standard
deviation σ = 8.96.
a Let the z-score of x = 50.6 be z1 and the z-score of x = 68.9 be z2.
i Calculate z1 and z2. ii Find Pr(z1 ≤ Z ≤ z2)
b Find Pr(50.6 ≤ X ≤ 68.9) directly from your calculator.
c Compare the answers to a and b.
4 Suppose X is N(50, 5²). Calculate Pr(a < X ≤ 51) for each of the following values
of a. Give your answers to 5 decimal places.
a a = 45 b a = 35 c a = 25 d a = 15 e a = 0
Compare the answers of a to e with Pr(X ≤ 51).
5 The height of 18 year old men is normally distributed with mean 182.3 cm and standard
deviation 9.6 cm. Find the probability that a randomly selected 18 year old man is:
a at least 180 cm tall b at most 190 cm tall c between 175 and 185 cm.
6 The weight of hens’ eggs is normally distributed with mean 42.3 g and standard deviation
5.9 g. Find the probability that a randomly selected egg is:
a at most 50 g b at least 45 g c between 35 g and 45 g.
7 The speed of cars passing the supermarket is normally distributed with mean 56.3 kmph
and standard deviation 7.4 kmph. Find the probability that a randomly selected car is
travelling at:
a between 60 and 75 kmph b at most 70 kmph c at least 60 kmph.
8 The lengths of metal bolts produced by a machine are found to be normally distributed
with a mean of 19.8 cm and a standard deviation of 0.3 cm. Find the probability that a
bolt selected at random from the machine will have a length between 19.7 and 20 cm.
9 The IQs of secondary school students from a particular area are believed to be normally
distributed with a mean of 103 and a standard deviation of 15.1. Find the probability
that a student will have an IQ:
a of at least 115 b that is less than 75 c between 95 and 105.
Example 11
In 1972 the heights of SANFL players were found to be normally distributed with mean 179 cm and standard deviation 7 cm. Find the probability that in 1972
a player was: a at least 175 cm tall b between 170 cm and 190 cm.
If X is the height of a player then X is normally distributed with mean μ = 179 and standard deviation σ = 7.
a We need to find
Pr(X ≥ 175)
= normalcdf(175, E99, 179, 7)
≈ 0.716
b We need to find
Pr(170 ≤ X ≤ 190)
= normalcdf(170, 190, 179, 7)
≈ 0.843
10 The average weekly earnings of the students at a local high school are found to be
approximately normally distributed with a mean of $40 and a standard deviation of $6. What proportion of students would you expect to earn:
a between $30 and $50 per week b at least $50 per week?
11 The lengths of Murray Cod caught in the River Murray are found to be normally
distributed with a mean of 41 cm and a standard deviation of 3.317 cm.
a Find the probability that a cod is at least 50 cm.
b What proportion of cod measure between 40 cm and 50 cm?
c In a sample of 200 cod, how many of them would you expect to be at least 45 cm?
E FINDING QUANTILES (k-VALUES)
Let X be the random variable of the length in mm of a snail shell.
Suppose that X is normally distributed with mean μ = 23.6 and standard deviation σ = 3.1 mm. A snail farmer wants to
harvest some of his snails, but only those whose shell lengths
are amongst the longest 5%. The problem is to find k such that
Pr(X < k) = 95%.
[Figure: normal curve for X with 95% of the area shaded to the left of k]
The number k is known as a quantile, and in this case the 95% quantile.
When finding quantiles we are given a probability and are asked to calculate the corresponding
measurement. This is the inverse of finding probabilities, and we use the inverse normal
function.
Click on the icon to obtain instructions for using your calculator.
For the above example, the TI instruction is
k = invNorm(0.95, 23.6, 3.1) ≈ 28.7
The instruction k = invNorm(0.95) will
assume that the mean μ = 0, and the
standard deviation σ = 1.
Example 12
If Z has a standard normal distribution, find k if Pr(Z < k) = 0.73
Using a TI,
k = invNorm(0.73, 0, 1) ≈ 0.613
[Figure: standard normal curve with 73% of the area shaded to the left of k]
This means 73% of the values are expected to be less than 0.613
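The quantile calculations for the snail shells and for Example 12 can be sketched as follows. The helper `inv_norm` is a hypothetical Python analogue of the calculator's invNorm, written here with the standard-library method `NormalDist.inv_cdf`; it is an illustration, not the calculator's own routine.

```python
from statistics import NormalDist


def inv_norm(p, mu=0.0, sigma=1.0):
    """Return k such that Pr(X <= k) = p for X ~ N(mu, sigma^2)."""
    return NormalDist(mu, sigma).inv_cdf(p)


k_snail = inv_norm(0.95, 23.6, 3.1)  # 95% quantile of shell length (mm)
k_z = inv_norm(0.73)                 # Example 12: standard normal

print(f"snail shells: k = {k_snail:.1f} mm")
print(f"Example 12:   k = {k_z:.3f}")

# Finding a quantile is the inverse of finding a probability,
# so feeding k back into the cdf should recover the probability.
check = NormalDist(23.6, 3.1).cdf(k_snail)
print(f"Pr(X < k) = {check:.2f}")
```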
Example 13
A university professor determines that 80% of this year’s History candidates should pass the final examination. The examination results are expected to be normally distributed with mean 62 and standard deviation 13. Find the lowest score necessary to pass the examination.
Let X denote the final examination result, so X ∼ N(62, 13²).
We need to find k such that
Pr(X > k) = 0.8
∴ Pr(X ≤ k) = 0.2
∴ k = invNorm(0.2, 62, 13)
∴ k ≈ 51.059
[Figure: normal curve for X with 20% of the area shaded to the left of k]
So, the minimum pass mark is 51.
EXERCISE 7E
1 Z has a standard normal distribution. Illustrate with a sketch and find k if:
a Pr(Z ≤ k) = 0.81 b Pr(Z ≤ k) = 0.58 c Pr(Z ≤ k) = 0.17
2 X ∼ N(20, 3²). Illustrate with a sketch and find k if:
a Pr(X ≤ k) = 0.348 b Pr(X ≤ k) = 0.878 c Pr(X ≤ k) = 0.5
3 a Show that Pr(−k ≤ Z ≤ k) = 2Pr(Z ≤ k) − 1.
b If Z is standard normally distributed, find k if:
i Pr(−k ≤ Z ≤ k) = 0.238 ii Pr(−k ≤ Z ≤ k) = 0.7004
4 The length of a fish species is normally
distributed with mean 35 cm and standard
deviation 8 cm. The fisheries department
has decided that the smallest 10% of the
fish are not to be harvested. What is the size
of the smallest fish that can be harvested?
5 The length of screws produced by a machine is normally distributed with mean 75 mm
and standard deviation 0.1 mm. If a screw is too long it is automatically rejected. If 1% of screws are rejected, what is the length of the smallest screw to be rejected?
6 The average score for a Physics test was 46 and the standard deviation of the scores was
15. Assuming that the scores were normally distributed, the teacher decided to award
an A to the top 7% of the students in the class. What is the lowest score that a student
needed in order to achieve an A?
7 The volume of cool drink in a bottle filled by a machine is normally distributed with
mean 503 mL and standard deviation 0.5 mL. 1% of the bottles are rejected because they
are underfilled, and 2% are rejected because they are overfilled; otherwise they are kept
for retail. What range of volumes is in the bottles that are kept?
Note: Z-scores are essential for finding unknown values of μ and/or σ.
Example 14
An adult scallop population is known to have a standard deviation of 5.9 g. If 15% of scallops weigh less than 58.2 g, find the mean weight of the population.
Let the mean weight of the population be μ g.
If X g denotes the weight of an adult scallop,
then X ∼ N(μ, 5.9²).
As we do not know μ we cannot use
invNorm directly, but we can find the z-value.
[Figure: normal curve for X with 15% of the area shaded to the left of 58.2]
Now Pr(X ≤ 58.2) = 0.15
$\therefore\ \Pr\left(Z \leq \dfrac{58.2 - \mu}{5.9}\right) = 0.15$
$\therefore\ \dfrac{58.2 - \mu}{5.9} = \mathrm{invNorm}(0.15) \approx -1.0364$
∴ 58.2 − μ ≈ −6.1
∴ μ ≈ 64.3 So, the mean weight is 64.3 g.
8 The arrival times of buses at a depot are normally distributed with standard deviation of
5 minutes. If 10% of the buses arrive before 3:45 pm, what is the mean arrival time of
buses at the depot?
9 The IQ of a population has a standard deviation of 15. In a school 20% of students have
an IQ larger than 125. What is the mean IQ of students in this school?
10 The distance an athlete can jump is normally distributed with mean 5.2 m. If 20% of
the jumps by this athlete are less than 5 m, what is the standard deviation?
11 The weekly income of a greengrocer is normally distributed with a mean of $6100. If
85% of the time the weekly income exceeds $6000, what is the standard deviation?
Example 15
Find the mean and standard deviation of a normally distributed random variable X if Pr(X ≤ 20) = 0.1 and Pr(X > 29) = 0.15
X ∼ N(μ, σ²) where we have to
find μ and σ.
We start by finding z1 and z2 which
correspond to x1 = 20 and x2 = 29.
[Figure: normal curve with area 0.1 to the left of x1 = 20 and area 0.15 to the right of x2 = 29, with the corresponding z-scores z1 and z2]
Now $z_1 = \dfrac{20 - \mu}{\sigma} = \mathrm{invNorm}(0.1) \approx -1.282$ ∴ 20 − μ = −1.282σ .... (1)
and $z_2 = \dfrac{29 - \mu}{\sigma} = \mathrm{invNorm}(0.85) \approx 1.036$ ∴ 29 − μ = 1.036σ .... (2)
Solving these two equations gives μ ≈ 25.0 and σ ≈ 3.88
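The method of Example 15 (convert each probability to a z-score, then solve the two linear equations) can be sketched as below. The function name and its argument order are illustrative assumptions; `NormalDist().inv_cdf` from the Python standard library stands in for invNorm.

```python
from statistics import NormalDist


def normal_from_two_quantiles(x1, p1, x2, p2):
    """Find (mu, sigma) with Pr(X <= x1) = p1 and Pr(X <= x2) = p2."""
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    # x1 - mu = z1 * sigma  and  x2 - mu = z2 * sigma;
    # subtracting the first equation from the second eliminates mu.
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma


# Example 15: Pr(X <= 20) = 0.1 and Pr(X > 29) = 0.15, so Pr(X <= 29) = 0.85
mu, sigma = normal_from_two_quantiles(20, 0.1, 29, 1 - 0.15)
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}")
```

When only one of μ and σ is unknown, as in Example 14, a single equation is enough: there μ = 58.2 − 5.9 × invNorm(0.15).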
12 a Find the mean and the standard deviation of a normally distributed random variable
X, if Pr(X > 80) = 0.1 and Pr(X ≤ 30) = 0.15.
b In a Mathematics examination it was found that 10% of the students scored at least
80, and no more than 15% scored under 30. Assuming the scores are normally
distributed, what proportion of students scored more than 50?
13 The diameters of pistons manufactured by a company are normally distributed. Only
those pistons whose diameters lie between 3.994 and 4.006 cm are acceptable.
a Find the mean and the standard deviation of the distribution if 4% of the pistons
are rejected as being too small, and 5% are rejected as being too large.
b Determine the probability that the diameter of a randomly chosen piston lies between 3.997 cm and 4.003 cm.
F INVESTIGATING PROPERTIES OF NORMAL DISTRIBUTIONS
In the previous section a number of assertions were made about the standard deviation. In
this section some of these assertions will be justified.
INVESTIGATION 2 THE GEOMETRIC SIGNIFICANCE OF μ AND σ
What to do:
1 The normal probability density function is $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$.
Use technology to graph this function for a μ = 6, σ = 1 b μ = 6, σ = 2.
2 Show that the derivative of f(x) is $f'(x) = -\dfrac{x-\mu}{\sigma^2}\, f(x)$.
3 Use the result in 2 to show that f(x) has a maximum value at x = μ.
4 Show that $f''(x) = -\dfrac{1}{\sigma^4}\left(\sigma^2 - (x-\mu)^2\right) f(x)$.
5 Use the result of 4 to find the points of inflection of f(x).
From Investigation 2 you
should have discovered that
the points of inflection occur
at x = μ + σ and x = μ − σ.
Consequently:
For a given normal curve the standard deviation is uniquely determined as the
horizontal distance from the vertical line x = μ to a point of inflection.
[Figure: normal curve with the points of inflection marked at x = μ − σ and x = μ + σ]
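The conclusion of Investigation 2 can be checked numerically before attempting the algebra. This is a rough sketch, assuming μ = 6 and σ = 2 (one of the cases in step 1) and estimating f''(x) by a central finite difference, which is an approximation and not a proof.

```python
from statistics import NormalDist

mu, sigma = 6, 2
f = NormalDist(mu, sigma).pdf  # the normal probability density function


def second_derivative(func, x, h=1e-4):
    """Central finite-difference estimate of func''(x)."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / h**2


def f2_formula(x):
    """f''(x) as stated in step 4 of the investigation."""
    return -(sigma**2 - (x - mu)**2) / sigma**4 * f(x)


for x in (mu - sigma, mu, mu + sigma):
    print(f"x = {x}: estimate {second_derivative(f, x):+.6f}, "
          f"formula {f2_formula(x):+.6f}")

# f'' changes sign at x = mu + sigma, so this is a point of inflection.
just_inside = second_derivative(f, mu + sigma - 0.1)
just_outside = second_derivative(f, mu + sigma + 0.1)
print(just_inside < 0 < just_outside)
```

Under these assumptions the estimate of f'' is close to zero at x = μ ± σ and negative at x = μ, in agreement with steps 3 to 5.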
INVESTIGATION 3 CALCULATING PROBABILITIES FROM NORMAL DISTRIBUTIONS
To find probabilities from a normal distribution you need to be able to find
areas between the graph of $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ and the x-axis.
A simple way to estimate these probabilities is to approximate them with areas
of rectangles that fit snugly around the curve.
The area beneath the smooth curve is
approximately equal to the sum of the
areas of the rectangles.
What to do:
Use a spreadsheet to:
• calculate the area of each rectangle using area = base × height
• add the areas of rectangles to find an approximate area below the curve.
Details of how to set up a spreadsheet can be found by clicking on
the icon.
G DISTRIBUTION OF SAMPLE MEANS
Suppose a dietician wants to know the mean
weight of thirteen year old Australian boys.
It is impractical to weigh each thirteen year
old boy in Australia, but the dietician could
find the mean weight of a randomly selected
sample of, say, 10 boys.
The mean weight of the sample of 10 boys
is a statistic that is then used to estimate the
population parameter.
Clearly the mean weight depends on the sample. If another health worker had selected a
different sample of 10 boys, it would be unlikely that the two sample means would be
the same.
The statistic “sample mean weight” is a new variable. Repeated sampling can be used to discover
how the variable “sample mean weight” is distributed. In particular we want to know how the mean
of the sample means and the standard deviation of the sample means are related to the parent
population of 13 year old boys.
The following investigation explores the relation between the statistic “sample mean” and the
parameter “population mean”.
INVESTIGATION 4 A SIMPLE RANDOM SAMPLER
Suppose a school has 216 thirteen year old boys.
Let the variable X be the weight in kg of the boys.
The table shows all the possible values of X in random order.
Block 1                          Block 2
31.2 35.7 36.4 33.2 37.3 35.0    34.0 33.6 34.4 32.0 32.7 36.7
30.8 33.8 32.9 35.4 31.9 36.7    32.0 29.2 33.6 31.0 32.5 36.4
33.3 36.7 27.9 32.0 36.4 34.5    35.3 31.6 32.5 35.3 34.6 31.1
34.9 30.9 33.2 33.8 33.6 30.5    37.7 30.9 35.0 33.2 36.2 35.2
31.8 35.9 32.8 30.8 29.0 32.1    34.6 32.7 35.4 30.4 33.3 30.2
33.3 35.5 32.0 34.8 30.2 36.3    35.7 38.9 32.0 28.0 32.7 33.6
Block 3                          Block 4
35.4 31.2 32.5 29.6 35.1 32.9    37.3 33.6 36.7 30.7 32.8 32.5
29.4 33.5 32.5 30.1 34.9 32.3    34.9 31.4 33.0 32.4 29.7 33.6
30.6 30.5 30.5 36.3 34.3 32.1    36.6 31.3 30.8 29.8 30.8 29.2
33.1 35.0 32.5 34.1 33.2 32.9    30.2 33.4 33.2 31.1 32.3 30.6
32.0 31.4 32.4 37.1 32.5 35.9    29.4 30.3 34.9 32.1 34.6 35.7
31.4 27.5 31.7 37.1 29.9 31.6    35.4 32.5 33.4 35.2 34.2 29.5
Block 5                          Block 6
34.3 31.9 33.2 34.5 32.4 30.8    32.4 32.0 27.1 36.4 34.0 32.4
31.9 32.6 29.4 32.6 35.5 33.0    35.5 31.4 40.6 37.1 31.4 30.0
31.5 31.6 34.2 29.1 35.4 29.9    32.0 33.7 29.0 32.0 29.9 34.6
35.0 27.0 31.8 36.1 32.7 31.0    30.4 35.9 38.4 31.6 34.4 31.6
32.3 33.4 35.3 38.7 37.5 32.1    29.7 33.9 34.0 34.2 29.2 37.6
29.3 34.0 30.6 37.1 30.4 33.2    33.7 28.5 36.2 35.7 36.4 33.2
What to do:
1 Select a sample of 10 boys from this population by:
a rolling a die to select one of the 6 blocks
b rolling the die again to select a row in the block
c rolling the die again to select a boy in the row
d counting off 10 boys from left to right from the boy you selected.
If the 3 rolls of the die produced {3, 2, 4}, the boy selected has weight 30.1 kg.
The sample selected is presented in the first column of the table.
2 Copy and enter your data in the following table.
Number    Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
1         30.1
2         34.9
3         32.3
4         34.9
5         31.4
6         33.0
7         32.4
8         29.7
9         33.6
10        30.6
mean, x̄   32.3
3 The last row in this table consists of 5 sample means.
The variable of sample means can be denoted by $\bar{X}_{10}$. The bar on the top indicates
it is a variable of means; the subscript 10 indicates that the means are of samples of
size ten.
The last row of your table is a sample of size 5 from the distribution of $\bar{X}_{10}$.
4 Combine your results with those of the other students of your class.
Draw a histogram of the sample means.
5 Calculate the mean and the standard deviation of the sample means.
6 Compare the mean and the standard deviation you found in 5 with the mean weight
33.1 kg and standard deviation 2.54 kg of the 216 boys.
From Investigation 4 you should have discovered that the sample means are close to the
population mean. The mean of the sample means should be particularly close to the population
mean.
You should also have noted that the standard deviation of the sample means is smaller than
the standard deviation of the population.
The following important investigation uses a computer to speed up sampling and obtain a
more accurate picture of how the standard deviation of the sample means is related to the
standard deviation of the population.
In this investigation it is important to distinguish between:
• The original population, sometimes referred to as the “parent population”, with a
random variable X which has mean μ and standard deviation σ.
In Investigation 4 the parent population consists of 216 thirteen year old boys.
The mean μ = 33.1 kg and standard deviation σ = 2.54 kg.
and
• The new population with variable $\bar{X}_n$, consisting of all statistics of sample means.
The subscript n indicating the sample size is sometimes omitted and the variable
just written $\bar{X}$.
A typical outcome of $\bar{X}$ is a sample mean $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$.
In Investigation 4 a typical outcome is the mean weight of 10 boys.
The investigation explores the shape of the distribution of the random variable $\bar{X}$, its
mean $\mu_{\bar{X}}$ or $\mu(\bar{X})$, and its standard deviation $\sigma_{\bar{X}}$ or $\sigma(\bar{X})$.
INVESTIGATION 5 A COMPUTER BASED RANDOM SAMPLER
In this investigation we examine the variation in sample means.
We examine samples taken from symmetric distributions as well as one that is skewed.
We start by sampling from a population which has a normal distribution. The heights of
18 year old Australian males may be approximately normal.
What to do:
1 Click on the icon given alongside. This opens a worksheet named
Samples with a number of buttons. Click on each of these buttons
in turn.
2 Sample size: from which you can select the numbers n = 10, 20, 40, 80, 160.
Start with n = 10.
3 Find sample means: finds the means of each of two hundred different samples.
4 Analyse: lists the two hundred sample means.
It finds the standard deviation $s_{\bar{X}}$ and
draws a histogram of these sample means.
It also superimposes a normal probability density function.
This output is shown on the worksheet named Analysis.
Note that the first graph on this worksheet is the graph of the probability density function of the population, and that the axes differ from those of the other graphs.
5 Make a copy of the table below.
Enter the value of $(s_{\bar{X}})^2$ in the first
column next to n = 10.
n      Trial 1             Trial 2             Trial 3             Trial 4
       $(s_{\bar{X}})^2$   $(s_{\bar{X}})^2$   $(s_{\bar{X}})^2$   $(s_{\bar{X}})^2$
10
20
40
80
160
6 Go back to the worksheet named
Samples and change the sample size
to 20. Repeat steps 3, 4, and 5.
Enter the value of $(s_{\bar{X}})^2$ next to
n = 20 in the table.
7 Repeat for samples of size 40, 80 and 160.
8 We wish to see how $(s_{\bar{X}})^2$ is related to the standard deviation of the population.
However, $(s_{\bar{X}})^2$ can vary quite a lot, so to spot the pattern more clearly you should
repeat the experiment another 3 times.
9 From your experiment, determine a relationship between the square of the sample
standard deviation $(s_{\bar{X}})^2$ and the square of the population standard deviation.
10 Now click on the icon to sample data from a population with a uniform distribution. These distributions are very commonly used in computer games where, for example, cards have to be selected at random. Complete an analysis of this data by repeating the above procedure and recording all results.
11 Now click on the icon to sample data from a population with an exponential distribution. These distributions are notoriously skew. They are commonly used in modelling lifetimes, such as the lifetime of light globes. Complete an analysis of this data by repeating the above procedure and recording all results.
APPENDIX
You should notice:
• The mean of the sample means approximates the population mean. Individual points selected from any distribution are likely to come from either side of the mean, and differences are likely to average out.
• As the sample size increases, there is less variability.
• The histogram of the sample means becomes symmetric and starts to take on a bell-like shape. For large values of $n$ it becomes approximately normal.

From the investigation you should have discovered the following:
If $X$ is a random variable with mean $\mu$ and standard deviation $\sigma$, then the random variable $\overline{X}_n$ of sample means of size $n$ has:
• mean $\mu_{\overline{X}} = \mu$, the same as the mean of the random variable $X$
• standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}}$.
Furthermore, for large values of $n$, $\overline{X}_n$ is approximately normal.
[Diagram: Sample 1, Sample 2, Sample 3, ..., each consisting of values $x_1, x_2, x_3, \ldots, x_n$, give the sample means $\overline{x}_1, \overline{x}_2, \overline{x}_3, \ldots$, which are distributed with mean $\mu_{\overline{X}}$ and standard deviation $\sigma_{\overline{X}}$.]

This diagram shows what happens if the sample size $n$ increases. The spread decreases since $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}}$ and $\mu_{\overline{X}} = \mu$:
[Diagram: distributions of the sample means for increasing $n$, each labelled with its $\sigma_{\overline{X}}$.]
In the Appendix the behaviour of the mean and the standard deviation are explored algebraically. It is beyond the level of this course to show why the distribution of the sample means is approximately normal.
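The two results above can also be checked by simulation rather than algebra. The following is a minimal sketch only: the exponential population with mean 1, the sample size 36, the 5000 trials and the helper name `simulate_sample_means` are all assumptions chosen for illustration, not part of the investigation's spreadsheet.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from n values given by draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


# Assumed population: exponential with mean 1, so mu = 1 and sigma = 1.
mu, sigma, n = 1.0, 1.0, 36
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n, trials=5000)

observed_mean = statistics.fmean(means)   # should be close to mu
observed_sd = statistics.stdev(means)     # should be close to sigma / sqrt(n)
predicted_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean {mu})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
```

Even though the assumed population is strongly skewed, the simulated sample means should cluster around the population mean with a spread close to $\sigma/\sqrt{n}$.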
Example 16
The life expectancy $X$ of a certain brand of AAA battery is known to have a mean $\mu = 27$ hours and standard deviation $\sigma = 3.25$ hours. The batteries are sold in packets of 6. Let the random variable $\overline{X}_6$ be the mean life expectancy of batteries in a packet.
a The 6 batteries in a packet were tested and the numbers of hours they lasted were: 25.3, 21.6, 27.75, 22.25, 35.5, 28.5.
What is the corresponding outcome of the random variable $\overline{X}_6$?
b If the numbers of hours lasted by batteries in a packet of six were $x_1, x_2, x_3, x_4, x_5, x_6$, what is the corresponding outcome of $\overline{X}_6$?
c What is the mean and standard deviation of $\overline{X}_6$?

a The outcomes of $\overline{X}_6$ are the means of the life expectancies of 6 batteries in a packet. In this case the outcome of $\overline{X}_6$ is the statistic
$\overline{x} = \dfrac{25.3 + 21.6 + 27.75 + 22.25 + 35.5 + 28.5}{6} \approx 26.8$
b If the batteries in the packet lasted for $x_1, x_2, x_3, x_4, x_5, x_6$ hours, the corresponding outcome of $\overline{X}_6$ is the statistic $\overline{x} = \dfrac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6}$.
c The mean of $\overline{X}_6$ is the same as the mean of $X$, so $\mu_{\overline{X}_6} = 27$ hours.
Since the standard deviation of $X$ is 3.25, the standard deviation of $\overline{X}_6$ is
$\sigma_{\overline{X}_6} = \dfrac{\sigma}{\sqrt{6}} = \dfrac{3.25}{\sqrt{6}} \approx 1.327$

EXERCISE 7G.1
1 A machine produces sheets of cardboard with mean thickness 3 mm and standard deviation 0.12 mm. A quality controller checks the thickness of each sheet in 10 different places. Let the random variable $X$ be the thickness of the cardboard at any point, and let the random variable $\overline{X}_{10}$ be the mean thickness of the 10 points.
a The quality controller records the following thicknesses in mm from a sample of 10 points: 3.02, 2.77, 3.08, 2.89, 3.21, 2.79, 2.97, 3.07, 2.94, 3.01. What is the corresponding outcome of the random variable $\overline{X}_{10}$?
b If the quality controller records 10 outcomes of $X$ as $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}$, what is the corresponding statistic of $\overline{X}_{10}$?
c What is the mean and standard deviation of $\overline{X}_{10}$?
2 Records show that a machine has been producing screws with mean length 75 mm and standard deviation 0.5 mm. Screws are packaged in lots of 50. Let the random variable $\overline{X}_{50}$ be the mean length of a screw in a packet.
Find the mean and standard deviation of $\overline{X}_{50}$.
3 The time it takes a train from Adelaide to Belair to complete its journey is known to have a mean of 40 minutes and standard deviation of 3 minutes. An inspector times 8 such trips. Let $\overline{X}_8$ be the mean travel time of a sample of 8 trips. Find the mean and standard deviation of $\overline{X}_8$.
4 Suppose the probability a coin falls heads is $p$ and the probability it falls tails is $q = 1 - p$.
Let the random variable $X = 1$ if it falls heads and $X = 0$ if it falls tails.
a Show that the mean of $X$ is $p$.
b Show that the standard deviation of $X$ is $\sqrt{pq} = \sqrt{p(1 - p)}$.
c Let $\overline{X}_n$ be the sample mean of $n$ tosses of the coin.
i Find the mean and standard deviation of $\overline{X}_n$.
ii Describe in words how $\overline{X}_n$ is related to the tosses of a coin.
In general, knowing the mean and standard deviation of a random variable $X$ is insufficient information to calculate probabilities. However, we are able to calculate probabilities in the special case where $X$ is normally distributed. Not only that, but if $X$ is normally distributed, the random variable $\overline{X}_n$ of sample means of size $n$ is also normally distributed.
Example 17
Suppose the random variable $X$ is normally distributed with mean 40 and standard deviation 10. Let $\overline{X}_{20}$ be the sample means of size 20. Find:
a $\Pr(35 < X < 45)$ b $\Pr(35 < \overline{X}_{20} < 45)$.

a $\Pr(35 < X < 45)$
= normalcdf(35, 45, 40, 10)
$\approx 0.383$
b The mean of $\overline{X}_{20}$ = mean of $X$ = 40.
The standard deviation of $\overline{X}_{20} = \dfrac{10}{\sqrt{20}}$.
$\Pr(35 < \overline{X}_{20} < 45)$
= normalcdf(35, 45, 40, $\frac{10}{\sqrt{20}}$)
$\approx 0.975$
Notice that about 38% of the individual outcomes are in the interval $35 < X < 45$, but about 97.5% of the sample means lie in this interval.

Example 18
The time $T$ it takes to serve a customer at a railway station ticket booth is normally distributed with mean 45 seconds and standard deviation 20 seconds. You only have 10 minutes to buy your ticket or you will miss your train. If there is a line of 11 people in front of you waiting to be served, what is the probability you will catch the train?

Including yourself there are 12 persons in the line to be served.
To complete buying your ticket in less than 10 minutes the mean serving time per person has to be less than $\dfrac{10 \times 60}{12} = 50$ seconds.
Let the random variable $\overline{T}_{12}$ be the mean time to serve 12 persons.
Since $T$ is normally distributed with mean 45 and standard deviation 20, $\overline{T}_{12}$ is normally distributed with mean 45 and standard deviation $\dfrac{20}{\sqrt{12}}$.
$\Pr(\overline{T}_{12} < 50)$
= normalcdf($-$E99, 50, 45, $\frac{20}{\sqrt{12}}$)
$\approx 0.807$
So, the probability of catching the train is about 0.807.
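For readers without a graphics calculator, the two examples above can be reproduced with a small stand-in for the calculator's normalcdf function. This is a sketch under stated assumptions: the helper `normalcdf` below is a hypothetical re-creation built on Python's `statistics.NormalDist`, not the calculator's own implementation, and Example 17 b uses $10/\sqrt{20}$ because the samples have size 20.

```python
import math
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Pr(lower < X < upper) for X ~ N(mean, sd^2), mimicking the calculator function."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Example 17: X ~ N(40, 10^2); sample means of size 20 have sd 10 / sqrt(20).
p_individual = normalcdf(35, 45, 40, 10)
p_sample_mean = normalcdf(35, 45, 40, 10 / math.sqrt(20))

# Example 18: T ~ N(45, 20^2) seconds; mean serving time of 12 people under 50 s.
# -math.inf plays the role of the calculator's -E99 lower bound.
p_catch_train = normalcdf(-math.inf, 50, 45, 20 / math.sqrt(12))

print(round(p_individual, 3), round(p_sample_mean, 3), round(p_catch_train, 3))
```

The only thing that changes between an individual outcome and a sample mean is the standard deviation passed to the function.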
5 Suppose the random variable $X$ is normally distributed with mean 80 and standard deviation 20. Let $\overline{X}_{10}$ be the sample means of size 10. Find:
a $\Pr(75 < X < 85)$ b $\Pr(75 < \overline{X}_{10} < 85)$
6 Let the random variable X be the IQ of 17 year old girls. Suppose X is normally
distributed with mean 105 and standard deviation 15.
a Find the probability that an individual 17 year old girl has an IQ of more than 110.
b Find the probability that the mean IQ of a class of twenty 17 year old girls is greater
than 110.
7 A manufacturer of chocolates produces chocolates of mean weight 20 g and standard
deviation 5 g. A box of 13 such chocolates is sold with the claim that the nett weight in
the box is 250 g. Assuming the weights are normally distributed:
a For what proportion of boxes is this claim correct?
b If the manufacturer decides to increase the number of chocolates to 15 per box, for
what proportion of boxes is the claim now true?
THE CENTRAL LIMIT THEOREM
In the previous investigation, we also observed that the distribution of the sample means $\overline{X}$ is approximately normal.
The Central Limit Theorem
Suppose $X$ is a random variable which is not necessarily normally distributed, but has mean $\mu$ and standard deviation $\sigma$. For sufficiently large $n$, the distribution $\overline{X}_n$ of the sample means of size $n$ is approximately normal with mean $\mu_{\overline{X}} = \mu$ and standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}}$.
Note:
• There is no simple answer as to how large $n$ should be before the central limit theorem can be applied. It depends on many factors including how much accuracy is required. If the population is very skew it may require a large sample size $n$, whereas if the population is symmetric a small sample size $n$ may be sufficient. As a rule of thumb, $n > 30$ is often used, but each case must be considered on its merits.
• In the special case where the population is normally distributed, the distribution $\overline{X}$ of the sample means is always normal.
THE SAMPLING ERROR
We are trying to estimate the population mean using a sample mean. By only looking at a small portion of the population, the sample mean is likely to be different from the population mean.
The standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}}$ of the sample means $\overline{X}$ is a measure of the variability of sample means, and is called the sampling error or the standard error.
Note:
• Unless the population is small, the population size is almost irrelevant.
• The larger the value of $n$, the smaller the sampling error. A sufficiently large sample should give an accurate estimate of the mean. However, making the sample size too big may be expensive and may not improve the reliability of the estimate by much.
For example, a sample size of 1000 gives a sampling error of $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{1000}} \approx \dfrac{\sigma}{32}$, whereas a sample of 4000, four times the size, only halves the sampling error.

Example 19
Two histograms of samples, each of size 400, are shown below. One is from a uniform distribution $X$ with mean 10 and standard deviation 5.77. The other is from the distribution $\overline{X}_{36}$ of the sample means of size 36 selected from the distribution $X$. Note that the scales are not the same in the two diagrams.
[Histogram A: frequency against interval]
[Histogram B: frequency against interval]
a Which of the two histograms is from $\overline{X}_{36}$? Give reasons for your answer.
b From the diagram estimate $\Pr(\overline{X}_{36} < 9)$.
c Find the approximate mean and standard deviation of $\overline{X}_{36}$.
d Use the histogram to estimate the probability that $\overline{X}_{36}$ is within one standard deviation of the mean.

a The data in Histogram A is less spread out than that in Histogram B, and appears clustered around 10. Histogram A is the histogram for the distribution $\overline{X}_{36}$.
b To find $\Pr(\overline{X}_{36} < 9)$ we count the numbers in all the bins before the bin [9, 9.25), and use the fact that there are 400 in the sample. We get:
$\Pr(\overline{X}_{36} < 9) \approx \dfrac{15 + 15 + 12 + 3 + 2 + 3 + 2 + 1}{400} = \dfrac{53}{400} \approx 0.13$
Your answer may vary a little depending on how well you can read the numbers on the graph.
c The mean of $\overline{X}_{36}$ = mean of $X$ = 10.
The standard deviation $\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{36}} = \dfrac{5.77}{6} \approx 0.962$
d $\Pr(10 - 0.96 < \overline{X}_{36} < 10 + 0.96) = \Pr(9.04 < \overline{X}_{36} < 10.96)$
$\approx \Pr(9 < \overline{X}_{36} < 11)$
$\approx \dfrac{30 + 27 + 39 + 44 + 45 + 42 + 31 + 30}{400} = 0.72$
This crude estimate compares with 0.68 when using the normal approximation.
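The arithmetic behind the sampling error and the histogram estimates can be sketched in a few lines. The bin counts below are the ones read off Histogram A in the worked solution; the function name `standard_error` and the variable names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


# Quadrupling the sample size (1000 -> 4000) only halves the sampling error.
ratio = standard_error(1.0, 4000) / standard_error(1.0, 1000)

# Bin counts as read from Histogram A (400 sample means in total).
total = 400
below_9 = [15, 15, 12, 3, 2, 3, 2, 1]
between_9_and_11 = [30, 27, 39, 44, 45, 42, 31, 30]

p_below_9 = sum(below_9) / total
p_within_one_sd = sum(between_9_and_11) / total
se_36 = standard_error(5.77, 36)

# Value given by the normal approximation for "within one standard deviation".
normal_within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(ratio, p_below_9, p_within_one_sd, round(se_36, 3), round(normal_within_one_sd, 2))
```

Because the counts are read from a graph, the two histogram-based probabilities are estimates only; the standard error and the normal-approximation value are exact up to rounding.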
EXERCISE 7G.2
1 The IQ measurements of a population have mean 100 and standard deviation 15. Many hundreds of random samples of size 36 are taken from the population and a relative frequency histogram of the sample means is formed.
a What would we expect the mean of the samples to be?
b What would we expect the standard deviation of the samples to be?
c What would we expect the shape of the histogram to look like?
2 Two histograms of sample size 300 each are shown below. One is from a life expectancy distribution $X$ with mean 10 and standard deviation 10. The other is from the distribution $\overline{X}_{64}$ of the sample means of size 64 selected from the distribution $X$. Note that the scales are not the same in the two diagrams.
[Histogram A: frequency against interval]
[Histogram B: frequency against interval]
a Which of the two histograms is from $\overline{X}_{64}$? Give reasons for your answer.
b From the diagram estimate $\Pr(\overline{X}_{64} < 9)$.
c Find the approximate mean and standard deviation of $\overline{X}_{64}$.
d Use the histogram to estimate the probability that $\overline{X}_{64}$ is within one standard deviation of the mean. How does this answer compare with using the normal approximation?
3 During a one week period in Sydney the mean price of an orange was 42.8 cents with standard deviation 8.7 cents. Find the probability that the mean price per orange from a case of 60 oranges was less than 45 cents.
4 The mean energy content of a fruit bar is 1067 kJ with standard deviation 61.7 kJ. Find the probability that the mean energy content of a sample of 30 fruit bars is more than 1050 kJ/bar.
5 The mean sodium content of a box of cheese rings is 1183 mg with standard deviation 88.6 mg. Find the probability that the mean sodium content per box for a sample of 50 boxes lies between 1150 mg and 1200 mg.
6 Customers at a clothing store are in the shop for a mean time of 18 minutes with standard deviation 5.3 minutes. What is the probability that in a sample of 37 customers the mean stay in the shop is between 17 and 20 minutes?
7 The mean contents of a can of cola is 382 mL, even though it says 375 mL on a can. The statistician at the factory says that the standard deviation is steady at 16.2 mL. Find the probability that a slab of three dozen cans has mean contents less than 375 mL per can.
Example 20
The age of men in Australia is distributed with mean 43 and standard deviation 8. If a sample of 67 men is selected from the population of Australian men, what is the probability the sample mean is:
a less than 42 b greater than 45 c between 40 and 45?

Let the random variable $\overline{X}$ be the mean age of samples of 67 Australian males.
Assuming $n = 67$ is sufficiently large for the Central Limit Theorem to apply, $\overline{X}$ is approximately normal with mean 43 and standard deviation $\sigma_{\overline{X}} = \dfrac{8}{\sqrt{67}}$.
a $\Pr(\overline{X} < 42)$
= normalcdf($-$E99, 42, 43, $\frac{8}{\sqrt{67}}$)
$\approx 0.153$
b $\Pr(\overline{X} > 45)$
= normalcdf(45, E99, 43, $\frac{8}{\sqrt{67}}$)
$\approx 0.0204$
c $\Pr(40 < \overline{X} < 45)$
= normalcdf(40, 45, 43, $\frac{8}{\sqrt{67}}$)
$\approx 0.979$
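The three probabilities for the sample of 67 men can be obtained from a single distribution object, which makes it clear that lower-tail, upper-tail and interval probabilities all come from the same cumulative function. This is a sketch using Python's `statistics.NormalDist` in place of the calculator; the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Population mean 43, standard deviation 8, samples of size 67.
# By the Central Limit Theorem the sample mean is treated as approximately normal.
sample_mean_dist = NormalDist(mu=43, sigma=8 / math.sqrt(67))

p_less_than_42 = sample_mean_dist.cdf(42)                                # lower tail
p_greater_than_45 = 1 - sample_mean_dist.cdf(45)                         # upper tail
p_between_40_and_45 = sample_mean_dist.cdf(45) - sample_mean_dist.cdf(40)

print(round(p_less_than_42, 3), round(p_greater_than_45, 4), round(p_between_40_and_45, 3))
```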
Example 21
A population is known to have a standard deviation of 8 but has an unknown mean $\mu$. In order to estimate $\mu$, the mean of a random sample of 60 is found. Find the probability that this estimate is out by less than 2.

Let the random variable $\overline{X}$ be the mean of samples of 60. As the sample size is larger than 30, we assume that $\overline{X}$ is normally distributed with mean $\mu$ and standard deviation $\dfrac{8}{\sqrt{60}}$.
We need to find $\Pr(-2 < \overline{X} - \mu < 2)$.
Now $\Pr(-2 < \overline{X} - \mu < 2) = \Pr\left(\dfrac{-2}{8/\sqrt{60}} < \dfrac{\overline{X} - \mu}{8/\sqrt{60}} < \dfrac{2}{8/\sqrt{60}}\right)$
$= \Pr\left(-\dfrac{\sqrt{60}}{4} < Z < \dfrac{\sqrt{60}}{4}\right)$
= normalcdf($-\frac{\sqrt{60}}{4}$, $\frac{\sqrt{60}}{4}$, 0, 1)
$\approx 0.947$

8 A sample of 375 people will be used to estimate the mean number of hours that will be lost due to sickness this year. Last year the standard deviation for the number of hours lost was 67 and we will use this as the standard deviation this year. What is the probability that the estimate is in error by less than ten hours?
9 A concerned union member wishes to estimate the hourly wage of shop assistants in Adelaide. He decides to randomly survey 300 shop assistants to calculate the sample mean. Assuming that the standard deviation is \$1.27, find the probability that the estimate of the population mean is in error by 10 cents or more.

INVESTIGATION 6 CHOCKBLOCKS
Chockblock produce mini chocolate bars which vary a little in weight. The machine used to make them produces bars whose weights are normally distributed with mean 18.2 grams and standard deviation 3.3 grams. 25 bars are then placed in a packet for sale. Hundreds of thousands of packets are produced each year with mean weight $\overline{X}$.
What to do:
1 What are the mean $\mu_{\overline{X}}$ and standard deviation $\sigma_{\overline{X}}$ of $\overline{X}$?
2 Printed on each packet is the nett weight of contents, 425 grams. What is the manufacturer claiming about the mean weight of each bar?
3 What percentage of their packets will be rejected because they fail to meet the 425 gram claim?
4 An additional bar is added to each packet with the nett weight claim retained at 425 grams.
a What is the minimum acceptable claim now?
b What are the mean $\mu_{\overline{X}}$ and standard deviation $\sigma_{\overline{X}}$ now?
c What percentage of these packets would we expect to reject?
H HYPOTHESIS TESTING FOR A MEAN
Claims are often made about the population mean of some quantities.
For example, it is claimed that the mean protein content of a 1 litre carton of milk is 39 grams. The truth of this claim can only be known by measuring the protein content of every 1 litre carton of milk, clearly an impossible task. It is, however, possible to draw reasonable conclusions from measuring the protein content of a random selection of cartons.
A statistical hypothesis is a statement about a population parameter. The parameter could be a population mean or a proportion.
In this section we will test hypotheses concerning the mean $\mu$.

HYPOTHESIS ABOUT MEANS
When a statement is made about a product, it is usually tested statistically before changes to the product are made.
For example, suppose a consumer makes the statement that the mean protein content in 1 litre cartons of milk is not 39 grams. The milk company does not want to go to the expense of changing packaging until it is statistically shown that the mean protein content is indeed not 39 grams. The company will start with the assumption that their claim is true, and whatever tests the consumer did were just random fluctuations. This assumption or statement of no change is called the null hypothesis and is usually denoted $H_0$.
The alternative hypothesis, denoted $H_a$, is that the statistical evidence is sufficient to accept the consumer's claim, i.e., that the milk company's statement is false.
So, we consider two hypotheses:
• a null hypothesis $H_0$ which is a statement of no difference or no change. It is assumed to be true until sufficient evidence is provided so that it is rejected.
• an alternative hypothesis $H_a$ which is a statement that there is a difference or change which has to be established. Supporting evidence is necessary if it is to be accepted.
HYPOTHESIS TESTING WHEN THE POPULATION IS NORMALLY DISTRIBUTED
We want to test the claim that the mean protein content of 1 litre cartons of milk is 39 grams.
The null hypothesis is $H_0$: $\mu = 39$
The alternative hypothesis is $H_a$: $\mu \ne 39$
Suppose we select a sample of 10 cartons of milk and find that for this sample the mean protein content is $\overline{x} = 38.4$ grams.
We need to determine the likelihood that this difference is due to random fluctuation or chance, or whether it is sufficient evidence to say the milk company's statement is incorrect.
Since the protein content of milk is a result of many different factors, it is reasonable to assume that the protein content of 1 litre cartons of milk is normally distributed.
Suppose it is known that the standard deviation of protein in 1 litre containers of milk is $\sigma = 0.8$ grams.
Let $X$ be the protein content of a 1 litre container of milk, so according to the null hypothesis, $X \sim N(39, 0.8^2)$.
Let the random variable $\overline{X}$ be the mean protein content of a sample of 10 one litre cartons.
Hence $\overline{X} \sim N\left(\mu, \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$, i.e., $\overline{X} \sim N\left(39, \left(\dfrac{0.8}{\sqrt{10}}\right)^2\right)$.
We use this to calculate the z-score of the observed value $\overline{x} = 38.4$ grams.
$z = \dfrac{\overline{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{38.4 - 39}{0.8/\sqrt{10}} \approx -2.37$
So $\overline{x}$ lies 2.37 standard deviations below the mean.
If the difference between the observed value of $\overline{x}$ and the mean is due to chance alone, it could just as likely have been 2.37 standard deviations to the left or right of the mean. So, the probability that $\overline{X}$ is 2.37 standard deviations or more either side of the mean is a measure of how likely this is to occur.
Now $\Pr(Z \le -2.37 \text{ or } Z \ge 2.37) = 2 \times \Pr(Z \le -2.37)$ {symmetry}
= 2 $\times$ normalcdf($-$E99, $-2.37$)
$\approx 0.0178$
so the probability of this event happening is small.
One of the problems with random processes is that differences can always be due to chance. However, the practical solution is to reject the null hypothesis if the probability of the observed or more extreme results occurring is small.
The probability $\alpha$ at which we reject the null hypothesis is called the significance level of the test. Common significance levels are $\alpha = 0.05$ or 5% and $\alpha = 0.01$ or 1%.
In the above example, $\Pr(Z \le -2.37 \text{ or } Z \ge 2.37) = 0.0178$. This is less than 0.05 so we would reject the null hypothesis at the significance level of 0.05, but not at the significance level of 0.01.
The procedure for testing a hypothesis is:
Step 1: State the null hypothesis $H_0$: $\mu = \mu_0$ and the alternative hypothesis $H_a$: $\mu \ne \mu_0$.
Step 2: Select a significance level, usually 0.05. Unless otherwise stated, the level of 0.05 is used in this book.
Step 3: From a sample, calculate the sample mean $\overline{x}$.
If the parent population is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the random variable $\overline{X}$ of sample means has the normal distribution $N\left(\mu, \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$.
$N\left(\mu, \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$ is called the null distribution.
The null distribution is critical. It allows us to calculate the probability of the observed or more extreme events happening if the null hypothesis is true.
Step 4: Use the sample mean $\overline{x}$ to find the test statistic $z = \dfrac{\overline{x} - \mu}{\sigma/\sqrt{n}}$.
The Z-test derives its name from this statistic.
Step 5: Calculate the probability of all observations having z-values more extreme than the test statistic $z$ found in Step 4.
The P-value is the probability of all observations having a z-value more extreme than the test statistic.
Since we include the extreme outcomes either side of the mean, we call this a two-sided Z-test. Only two-sided tests are considered in this course.
Step 6: • Reject the null hypothesis if the P-value is less than the significance level decided on in Step 2. The smaller the P-value is, the stronger the evidence against the null hypothesis.
• If the P-value is larger than the significance level decided on in Step 2, do not reject the null hypothesis.

Milk cartons example
$H_0$: $\mu = 39$
$H_a$: $\mu \ne 39$
$\overline{X} \sim N(39, 0.253^2)$
$z = -2.37$
$P = \Pr(Z \le -2.37 \text{ or } Z \ge 2.37) = 0.0178$
Since $P < 0.05$ we reject the null hypothesis at the 0.05 level.
Since $P > 0.01$, we do not reject the null hypothesis at the 0.01 level.
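The six-step procedure can be sketched as a small function. This is an illustration under stated assumptions, not the calculator's Z-test routine: the names `ZTestResult` and `two_sided_z_test` are hypothetical, and the population standard deviation $\sigma$ is taken as known, as the procedure requires.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ZTestResult:
    z: float          # test statistic (Step 4)
    p_value: float    # two-sided P-value (Step 5)
    reject_h0: bool   # True when the P-value is below the significance level (Step 6)


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test of H0: mu = mu0 against Ha: mu != mu0, with sigma known."""
    standard_error = sigma / math.sqrt(n)       # sd of the null distribution
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))     # both tails, by symmetry
    return ZTestResult(z, p_value, p_value < alpha)


# Milk cartons example: sample mean 38.4, claimed mean 39, sigma 0.8, n = 10.
at_5_percent = two_sided_z_test(38.4, 39, 0.8, 10, alpha=0.05)
at_1_percent = two_sided_z_test(38.4, 39, 0.8, 10, alpha=0.01)
print(at_5_percent)
print(at_1_percent)
```

The function works with the unrounded test statistic, so its P-value can differ in the last decimal place from a value computed by hand from $z$ rounded to two decimal places.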
When a null hypothesis is not rejected, the terms “retain” and “accept” are often used. This
does not mean that the null hypothesis is true, but rather that there is not enough evidence to
show it is not true.
Similarly, when rejecting the null hypothesis, it is often stated that the alternative hypothesis
is “accepted”. This does not mean that the alternative hypothesis is true. However, if the null
hypothesis is true, the outcome that led to rejecting it is a very unlikely one. The P-value
tells you just how unlikely.
Note: If $H_0$ is rejected,
• the direction of the difference is determined by the value of $\overline{x}$
• we still do not know how accurate the claim was.

Example 22
A Mathematics coaching school knows that the results for their final test are normally distributed with population mean 74% and standard deviation 7%. A new coaching technique which is cheaper to implement but reported to have the same results is trialled by the school. In a trial of 40 students it is found that the mean score for the final test is 72% with standard deviation 6%. Is there sufficient evidence at the 5% level to conclude that the final test scores will be different?

Step 1: $H_0$: $\mu = 74$, $H_a$: $\mu \ne 74$
Step 2: Significance level is 0.05
Step 3: The sample mean, $\overline{x} = 72$
Let the random variable $\overline{X}$ be the sample means, so the null distribution is $\overline{X} \sim N\left(\mu, \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$, i.e., $\overline{X} \sim N\left(74, \left(\dfrac{7}{\sqrt{40}}\right)^2\right)$.
Step 4: The test statistic is $z = \dfrac{\overline{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{72 - 74}{7/\sqrt{40}} \approx -1.81$
Notice that we use $\sigma$ and not $s$ for the Z-test.
Step 5: The P-value is $P = \Pr(Z \le -1.81 \text{ or } Z \ge 1.81)$
$= 2 \times \Pr(Z \le -1.81)$
$\approx 0.0708$
Step 6: As $P = 0.0708 > 0.05$ there is insufficient evidence to reject the null hypothesis that the new coaching produces the same results as the old technique. We thus accept that the new technique has the same result as the old technique.

EXERCISE 7H.1
1 A random variable $X$ is normally distributed with a standard deviation $\sigma = 4$. It is claimed that the mean of $X$ is $\mu = 17$.
a To test this claim a random sample of $n = 50$ was taken and the sample mean $\overline{x}$ was found to be 16.
i Write down the hypotheses $H_0$ and $H_a$.
ii Write down the null distribution.
iii Calculate the test statistic.
iv Calculate the P-value.
v What conclusion is there at the 0.05 level?
b Suppose that a random sample of $n = 70$ was taken and $\overline{x} = 16$. What can you now conclude at the 0.05 level?
2 A random variable $X$ is normally distributed with a standard deviation $\sigma = 6$. A random sample of 40 was taken and the sample mean was found to be $\overline{x} = 61.4$.
Use this information to test the claim that the population mean of $X$ is $\mu = 60$.
Example 23
The bottlers of Groutt claim that the mean volume of bottles is 503 mL.
To test this claim 10 bottles were selected.
The measurements are listed below to the nearest 0.1 mL:
502.5, 501.0, 501.5, 503.9, 498.7, 505.7, 504.6, 499.4, 501.8, 501.1
Test the claim made by the bottlers of Groutt at the 5% level if it is known that the population standard deviation $\sigma$ is 1.8 mL.

We need to test: the null hypothesis $H_0$: $\mu = 503$
against the alternative hypothesis $H_a$: $\mu \ne 503$
Let $X$ be the volume of each bottle of Groutt. As the bottling of liquids is subject to many random fluctuations, it is reasonable to assume that $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.
Let $\overline{X}$ be the distribution of the sample means, so the null distribution of $\overline{X}$ is $N\left(\mu, \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$.
From the null hypothesis we assume that $\mu = 503$.
From the sample we find that $\overline{x} = 502.02$, so the test statistic
$z = \dfrac{\overline{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{502.02 - 503}{1.8/\sqrt{10}} \approx -1.722$
The P-value is $P = \Pr(Z \le -1.722 \text{ or } Z \ge 1.722)$
$= 2 \times \Pr(Z \le -1.722)$
$\approx 0.0851$
As $P > 0.05$ there is insufficient evidence to reject the claim that the volume of bottles of Groutt is 503 mL,
i.e., we accept that the mean volume could be 503 mL.
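When the test starts from raw measurements, the only extra step is computing the sample mean first. A minimal sketch of the Groutt calculation follows, using the ten volumes listed in the example; the variable names are assumptions for illustration, and $\sigma = 1.8$ mL is the known population standard deviation, not one estimated from the sample.

```python
import math
from statistics import NormalDist, fmean

# Measured volumes in mL, as listed in the example.
volumes = [502.5, 501.0, 501.5, 503.9, 498.7, 505.7, 504.6, 499.4, 501.8, 501.1]
mu0, sigma, alpha = 503, 1.8, 0.05   # claimed mean, known sigma, significance level

sample_mean = fmean(volumes)
z = (sample_mean - mu0) / (sigma / math.sqrt(len(volumes)))
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided P-value
reject_h0 = p_value < alpha

print(f"sample mean = {sample_mean:.2f}, z = {z:.3f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```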
3 A market gardener claims that the carrots in his field have a mean weight of 50 grams. Before buying the crop a buyer pulls 20 carrots at random. She finds that their individual weights in grams are:
57.6 34.7 53.9 52.5 61.8 51.5 61.3 49.2 56.8 55.9
57.9 58.8 44.3 58.3 49.3 56.0 59.5 47.0 58.0 47.2
a Explain why it is reasonable to assume that the carrots' weights are normally distributed.
b Test the claim made by the market gardener if it is known that the standard deviation for the whole crop is 7.1 grams.
4 The length of screws produced by a machine is known to be normally distributed with standard deviation $\sigma = 0.08$ cm.
The machine is supposed to produce screws with a mean length of $\mu = 2.00$ cm.
A quality controller selects a random sample of 15 screws and finds that the mean length of the 15 screws is $\overline{x} = 2.04$ cm with sample standard deviation of $s = 0.09$ cm.
Does this justify the need to adjust the machine?
To see how to do hypothesis testing using a calculator,
click on the appropriate icon.
HYPOTHESIS TESTING WHEN THE POPULATION IS NOT NECESSARILY NORMALLY DISTRIBUTED
In the examples we have seen so far, the variable $X$ was normally distributed and so the distribution of sample means $\overline{X}$ was normally distributed also. This may not be true if $X$ is not normally distributed. However, if the sample size $n$ is sufficiently large, the Central Limit Theorem tells us that $\overline{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.
We can use this fact to test claims about population means.

Example 24
Susan's resting pulse rate has been 55 beats per minute for many years with standard deviation $\sigma = 2.6$ bpm. During a 5 day period she checks her resting pulse rate 8 times a day at regular intervals and finds that it has mean 56.2.
Is there sufficient evidence, at a 5% level, to conclude that Susan's pulse rate has changed?

The null hypothesis is $H_0$: $\mu = 55$. The alternative hypothesis is $H_a$: $\mu \ne 55$.
The significance level $\alpha = 0.05$.
The number in the sample is $n = 5 \times 8 = 40$ and the sample mean is $\overline{x} = 56.2$.
The population standard deviation $\sigma = 2.6$.
Let $X$ be Susan's resting pulse rate. We do not know how the random variable $X$ is distributed, but if we assume that $n$ is large enough for the Central Limit Theorem to apply then the null distribution for the sample means $\overline{X}$ is approximately normally distributed with mean $\mu = 55$ and standard deviation $\dfrac{\sigma}{\sqrt{n}} = \dfrac{2.6}{\sqrt{40}} \approx 0.411$.
Entering this information into the calculator gives a P-value of $P = 0.00351$. As $P = 0.00351 < 0.05$ there is evidence at the 0.05 level to reject the null hypothesis.
We accept the alternative hypothesis $H_a$ that Susan's pulse rate has changed.
EXERCISE 7H.2
1 Globe Industries make torch globes with standard deviation life time of $\sigma = 9$ hours. If the globes last too long, people will have no need to buy new ones, but if they do not last long enough, people will stop buying them. A quality controller is to ensure that globes made by a machine have a mean life of 80 hours. The quality controller selects a sample of 50 globes and finds that they have a mean life of 83 hours.
a What is the null hypothesis the quality controller is testing?
b Assuming that a sample of $n = 50$ is large enough for the Central Limit Theorem to apply, what is the null distribution the quality controller will be using?
c Is there sufficient reason at the 5% level for the quality controller to adjust the machine?
2 Let $X$ be the outcome of the roll of a fair six-sided die. The mean outcome of such a die is $\mu = 3.5$ with standard deviation $\sigma = 1.708$. Jack thinks his die may not be fair. To test this he rolls the die 100 times and finds that the mean of the 100 rolls is 3.2.
a What null hypothesis is Jack testing?
b Briefly explain why the outcomes of a roll of a fair die are not normally distributed.
c Assuming that a sample of size $n = 100$ is large enough for the Central Limit Theorem to apply, what is the null distribution Jack should be using?
d Does Jack have enough evidence at the 5% level to claim the die is not fair?
e Jack's sister Betty rolls the same die 200 times and finds that the mean of her sample is also 3.2. Would Betty come to the same conclusion as Jack?
3 While peaches are being canned, 250 mg of preservative is supposed to be added by a dispensing device. It is known that the standard deviation of preservative added is 7.3 mg.
To check the machine, the quality controller obtains 60 random samples of dispensed preservative and finds that the mean preservative added was 242.6 mg.
At a 5% level, is there sufficient evidence that the machine is not dispensing a mean of 250 mg?
4 In recent times the mean age for New Zealand women on their first wedding day is 23.6 years with a standard deviation of 2.9 years. To determine if this differs from Australian women, a survey of 32 women was carried out. It was found that the mean age was 24.3 years. Test whether there is a significant difference at a 5% level.

REJECTION REGION FOR THE NULL HYPOTHESIS
To test the null hypothesis $H_0: \mu = \mu_0$ against the alternative hypothesis $H_a: \mu \neq \mu_0$ we have used the test statistic $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Assuming that $z > 0$, our test at the 5% significance level has been to reject the null hypothesis $H_0$ if the P-value
$P = \Pr(Z \leq -z \text{ or } Z \geq z) < 0.05$
i.e., $2 \times \Pr(Z \leq -z) < 0.05$
i.e., $\Pr(Z \leq -z) < 0.025$.
But invNorm(0.025) $\approx -1.96$, and so we reject the null hypothesis at the 5% level if the test statistic $z \leq -1.96$ or $z \geq 1.96$.
[Figure: standard normal curve showing the rejection region (RR) of $H_0$ in each tail, with area 0.025 below $-1.96$ and area 0.025 above 1.960.]
The rejection region for the null hypothesis $H_0$ is the set of values of the test statistic for which the null hypothesis is rejected.
The 5% rejection region for the null hypothesis $H_0: \mu = \mu_0$ is the set $\{z : z \leq -1.96 \text{ or } z \geq 1.96\}$.
Note that the calculator also calculates the test statistic $z$ when using the 2-sided Z-test.

Example 25
A liquor chain claims that the mean price of wine has not changed from what it was 12 months ago. Records show that 12 months ago the mean price was \$13.45 for a 750 mL bottle. A random sample of prices of 389 different bottles of wine is taken from several stores and the mean price is \$13.30 and the standard deviation is \$0.25. Is there sufficient evidence at the 5% level to reject the claim?
$H_0: \mu = 13.45$, $H_a: \mu \neq 13.45$. We use $s = 0.25$ to estimate $\sigma$ as $n$ is large.
Assuming that the sample of size $n = 389$ is large enough for the Central Limit Theorem to apply, we find the test statistic
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{13.30 - 13.45}{0.25/\sqrt{389}} \approx -11.8$
Since $z < -1.96$ we reject the null hypothesis that there is no difference in the price and accept the alternative hypothesis that the price has changed.
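The rejection-region test of Example 25 can be sketched in a few lines of Python. This is only an illustration: the function names `z_statistic` and `in_rejection_region` are hypothetical, and the standard library's `statistics.NormalDist().inv_cdf` is used in place of the calculator's invNorm.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def in_rejection_region(z, alpha=0.05):
    """True if z lies in the two-sided rejection region at level alpha."""
    # Critical value: about 1.96 when alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) >= critical


# Example 25: x_bar = 13.30, mu0 = 13.45, s = 0.25 (estimating sigma), n = 389.
z = z_statistic(13.30, 13.45, 0.25, 389)
print(round(z, 1), in_rejection_region(z))
```

Because the decision depends only on whether $|z|$ reaches the critical value, the same two functions can be reused for any two-sided test of a mean with known (or well estimated) standard deviation.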
EXERCISE 7H.3
For questions 1 and 2, test the hypothesis using the rejection region for the null hypotheses. In each case you may assume that the sample size $n$ is large enough for the Central Limit Theorem to apply.
1 Quickshave produces disposable razorblades. They claim that the mean number of shaves before a blade has to be thrown away is 13. A researcher wishes to test the claim and asks 30 men to supply data on how many shaves they got from one of the Quickshave blades. The researcher found that the mean of the sample was 12.8. Use this information to test the manufacturer's claim at a 5% level if the population standard deviation $\sigma$ is 1.6.
2 It is claimed that the mean disposable income of households in a country town is \$50 per week. To test this claim, 36 households were sampled and it was found that the mean disposable income of the 36 families was \$47. Use this to test the claim that the mean disposable income is not \$50 per week if the population standard deviation $\sigma$ is \$12.

Example 26
To test the hypothesis $H_0: \mu = 40$ against $H_a: \mu \neq 40$, a random sample of size 60 was taken and found to have mean $\bar{x}$ and standard deviation $s = 7$. For what values of $\bar{x}$ will the null hypothesis be rejected at the 5% level? Assume that the sample size is large enough for the Central Limit Theorem to apply.
The test statistic is $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - 40}{7/\sqrt{60}} \approx \dfrac{\bar{x} - 40}{0.9037}$
The null hypothesis will be rejected if $z \leq -1.96$ or if $z \geq 1.96$
i.e., if $\dfrac{\bar{x} - 40}{0.9037} \leq -1.96$ or if $\dfrac{\bar{x} - 40}{0.9037} \geq 1.96$
$\therefore \bar{x} \leq 40 - 1.96 \times 0.9037$ or $\bar{x} \geq 40 + 1.96 \times 0.9037$
The null hypothesis will be rejected if $\bar{x} \leq 38.2$ or $\bar{x} \geq 41.8$.

3 To test the hypothesis $H_0: \mu = -23$ against $H_a: \mu \neq -23$, a random sample of size 100 was taken and found to have mean $\bar{x}$.
For what values of $\bar{x}$ will the null hypothesis be rejected at the 5% level? You may assume that the sample size is large enough for the Central Limit Theorem to apply and that the population standard deviation $\sigma = 4$.
4 The volume of soft drinks dispensed by a machine is normally distributed with standard deviation 3 mL. A quality controller has to adjust the machine if the mean volume dispensed is not 504 mL. To test the machine the quality controller finds the mean volume $\bar{x}$ of 20 randomly selected bottles every hour. For what values of $\bar{x}$ should the quality controller not adjust the machine?
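The calculation in Example 26, which turns the rejection region for $z$ into a rejection region for $\bar{x}$, can be sketched as follows. The helper name `rejection_bounds` is an assumption made for illustration, and the critical value comes from Python's `statistics.NormalDist` rather than from a calculator.

```python
from math import sqrt
from statistics import NormalDist


def rejection_bounds(mu0, sigma, n, alpha=0.05):
    """Return (lower, upper): H0 is rejected when x_bar <= lower or x_bar >= upper."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at the 5% level
    margin = critical * sigma / sqrt(n)
    return mu0 - margin, mu0 + margin


# Example 26: mu0 = 40, s = 7 used to estimate sigma, n = 60.
lower, upper = rejection_bounds(40, 7, 60)
print(round(lower, 1), round(upper, 1))
```

Sample means strictly between the two bounds are the values for which the null hypothesis is not rejected.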
DISCUSSION
The null hypothesis $H_0$ assumes that the population mean $\mu$ is exactly equal to $\mu_0$. This is required to set up the null distribution needed to calculate probabilities. However, if the variable $X$ that is being tested is continuous, the probability that $\mu$ is exactly equal to $\mu_0$ is zero!
Does this mean that if you take a large enough sample, and have a measuring instrument that can measure outcomes of $X$ accurately enough, you can always reject the null hypothesis?
Compare the formal sentence, "There is a statistically significant difference between the population mean $\mu$ and $\mu_0$." with what is commonly understood by, "There is a significant difference between the population mean $\mu$ and $\mu_0$."

I CONFIDENCE INTERVALS FOR MEANS
In this section we show how to use a sample mean $\bar{x}$ to calculate an interval in which we expect the population mean $\mu$ to lie. As with all statistics, our estimate $\bar{x}$ could by chance be very far from $\mu$, and we can never be absolutely sure that $\mu$ lies within the interval. We can, however, know how probable it is that $\mu$ lies in the interval.
A confidence interval estimate of a parameter (in this case the population mean $\mu$) is an interval of values between two limits, together with a percentage indicating our confidence that the parameter lies in that interval.
We now consider how a so-called 95% confidence interval is constructed.
We start by finding the number $a$ for which the standard normal distribution $Z$ has probability $\Pr(-a < Z < a) = 0.95$.
[Figure: standard normal curve with central area 0.95 between $-a$ and $a$, and area 0.025 in each tail.]
Because of the symmetry of the graph of the normal distribution, the statement reduces to
$\Pr(Z < -a) = 0.025$
$\therefore -a = \text{invNorm}(0.025)$
$-a = -1.95996$
$a \approx 1.96$
So, $\Pr(-1.96 < Z < 1.96) = 0.95$
This means that:
In any normal distribution, 95% of the outcomes lie within 1.96 standard deviations from the mean.
So, suppose the random variable $X$ is normally distributed as $N(\mu, \sigma^2)$.
If $\bar{X}$ is the random variable of sample means of size $n$, then $\bar{X} \sim N\left(\mu, \left(\dfrac{\sigma}{\sqrt{n}}\right)^2\right)$.
$\therefore$ 95% of all $\bar{x}$ lie in the interval $\mu - 1.96\dfrac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\dfrac{\sigma}{\sqrt{n}}$.
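As a check on the value $a \approx 1.96$, the invNorm step can be reproduced with Python's standard library. This is a sketch that assumes `statistics.NormalDist().inv_cdf` plays the role of the calculator's invNorm; the variable names are chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# By symmetry, Pr(-a < Z < a) = 0.95 reduces to Pr(Z < -a) = 0.025.
minus_a = Z.inv_cdf(0.025)
a = -minus_a

# Confirm that the central area between -a and a is 0.95.
central_area = Z.cdf(a) - Z.cdf(-a)
print(round(minus_a, 5), round(central_area, 4))
```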
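In the diagram we have shown a few $\bar{x}$ values in this interval as well as one that is not in this interval.
[Figure: the interval containing 95% of sample means, centred on $\mu$, with sample means $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ and $\bar{x}_4$ each drawn at the middle of a line segment.]
Note that each of the $\bar{x}$ is in the middle of a line segment. All of these segments have the same length as the line segment from $\mu - 1.96\dfrac{\sigma}{\sqrt{n}}$ to $\mu + 1.96\dfrac{\sigma}{\sqrt{n}}$.
Notice that the interval calculated for the $\bar{x}$ value outside this interval does not contain $\mu$.
Since $\Pr(-1.96 < Z < 1.96) = 0.95$ we know $\Pr\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$.
So for the outcome $\bar{x}$ within the confidence interval,
$\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96$ and $\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} > -1.96$
$\therefore \bar{x} - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}$ and $\bar{x} - \mu > -1.96\dfrac{\sigma}{\sqrt{n}}$
$\therefore \mu > \bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}}$ and $\mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$
This says that if we were to take many samples of size $n$ and calculate the sample mean $\bar{x}$ for each of these samples, then for about 95% of these sample means, the population mean $\mu$ would lie in the interval
$\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$.
So, the 95% confidence interval for $\mu$ is from $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}}$ to $\bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$.
[Figure: the confidence interval centred on $\bar{x}$, from the lower limit $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}}$ to the upper limit $\bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$.]
Confidence intervals for different confidence levels can be constructed for the population mean $\mu$ in a similar way. Remember that we cannot be absolutely sure that $\mu$ will lie within the confidence interval, but we can be confident that 95% of the time it will be.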
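The statement that about 95% of such intervals contain $\mu$ can be explored with a small simulation. The sketch below is not part of the text's method: the population values ($\mu = 100$, $\sigma = 15$), the sample size, the number of trials and the function name `coverage` are all assumptions chosen purely for illustration.

```python
import random
from math import sqrt


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated 95% confidence intervals that contain mu."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    margin = 1.96 * sigma / sqrt(n)     # half-width of each interval
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from N(mu, sigma^2) and find its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin < mu < sample_mean + margin:
            hits += 1
    return hits / trials


# Hypothetical population: mean 100, standard deviation 15, samples of size 25.
rate = coverage(mu=100, sigma=15, n=25, trials=10_000)
print(rate)
```

The printed proportion should be close to 0.95, although any single run will differ slightly because the samples are random.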
INVESTIGATION 7 CONFIDENCE LEVELS AND INTERVALS
To obtain a greater understanding of confidence levels and intervals, click on the icon to visit a random sampler demonstration. This will calculate confidence intervals at various levels of your choice (90%, 95%, 98% or 99%) and count the intervals which include the population mean.

Note: Consider samples of different size but all with mean 10 and standard deviation 2.
The 95% confidence interval is $10 - \dfrac{1.960 \times 2}{\sqrt{n}} < \mu < 10 + \dfrac{1.960 \times 2}{\sqrt{n}}$.
For various values of $n$ we have:
n      Confidence interval
20     $9.123 < \mu < 10.877$
50     $9.446 < \mu < 10.554$
100    $9.608 < \mu < 10.392$
200    $9.723 < \mu < 10.277$
[Figure: the four confidence intervals for $n = 20$, 50, 100 and 200 drawn above a number line from 9 to 11, each centred on 10.]
We see that increasing the sample size produces confidence intervals of shorter width.

Example 27
A sample of 60 yabbies was taken from a dam. The sample mean weight of the yabbies was 84.6 grams. Find the 95% confidence interval for the population mean if the population standard deviation is 16.8 grams.
We are given that $\bar{x} = 84.6$ and $\sigma = 16.8$.
The 95% confidence interval is: $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$
i.e., $84.6 - \dfrac{1.96 \times 16.8}{\sqrt{60}} < \mu < 84.6 + \dfrac{1.96 \times 16.8}{\sqrt{60}}$
$\therefore 80.3 < \mu < 88.9$
So, we are 95% confident that the population mean weight of yabbies lies between 80.3 grams and 88.9 grams.
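The 95% confidence interval formula used in Example 27 and in the Note on sample size can be sketched as a short function. The name `confidence_interval_95` is hypothetical; the multiplier 1.96 and the numbers are those quoted in the text.

```python
from math import sqrt


def confidence_interval_95(sample_mean, sigma, n):
    """95% confidence interval x_bar +/- 1.96 * sigma / sqrt(n)."""
    margin = 1.96 * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Example 27: 60 yabbies, sample mean 84.6 g, population sd 16.8 g.
low, high = confidence_interval_95(84.6, 16.8, 60)
print(round(low, 1), round(high, 1))

# The Note: mean 10, standard deviation 2, increasing sample sizes.
for n in (20, 50, 100, 200):
    low, high = confidence_interval_95(10, 2, n)
    print(n, round(low, 3), round(high, 3))
```

Running the loop shows the interval narrowing as $n$ grows, which is the point made by the table.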
EXERCISE 7I.1
1 A random sample of $n$ individuals is selected from a population with known standard deviation 11. The sample mean is 81.6.
a Find a 95% confidence interval for $\mu$ if: i $n = 36$ ii $n = 100$.
b In changing $n$ from 36 to 100, how does the width of the confidence interval change?
2 Neville works for a software company. He keeps records of the times customers have to wait to receive telephone support for their software. During a six month period he logs 167 calls, and the mean waiting time is 8.7 minutes. Find a 95% confidence interval for estimating the mean waiting time for all telephone customer calls for support if the population standard deviation is 2.08 minutes.
3 A breakfast cereal manufacturer uses a machine to deliver the cereal into plastic packets which then go into cardboard boxes. The quality controller randomly samples 75 packets and obtains a sample mean of 513.8 grams. Construct a 95% confidence interval in which the true population mean should lie if the population standard deviation is 14.9 grams.
4 A sample of 42 patients from a drug rehabilitation program showed a mean length of stay on the program of 38.2 days. Estimate with a 95% confidence interval the average length of stay for all patients on the program if the population standard deviation is 4.7 days.

Example 28
The fat content (in grams) of 30 randomly selected pasties at the local bakery was determined and recorded as:
15.1 14.8 13.7 15.6 15.1 16.1 16.6 17.4 16.1 13.9
17.5 15.7 16.2 16.6 15.1 12.9 17.4 16.5 13.2 14.0
17.2 17.3 16.1 16.5 16.7 16.8 17.2 17.6 17.3 14.7
Determine a 95% confidence interval for the mean fat content of all pasties made if the population standard deviation is 1.35 grams.
From a calculator $\bar{x} = 15.90$ and we are given $\sigma = 1.35$.
The 95% confidence interval for $\mu$ is
$\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$
$\therefore 15.90 - 1.96 \times \dfrac{1.35}{\sqrt{30}} < \mu < 15.90 + 1.96 \times \dfrac{1.35}{\sqrt{30}}$
$\therefore 15.4 < \mu < 16.4$
So, we are 95% confident that the mean fat content of all pasties produced lies between 15.4 g and 16.4 g.
It is possible to obtain confidence intervals at any level of confidence from graphics calculators. Click on the icon to see how to do this on your calculator.
5 To work out the credit limit of a prospective credit card holder, a company gives points based on factors such as employment, income, home and car ownership, and general credit history. A statistician working for the company randomly samples 40 applicants and determines the point total for each. These are:
84 53 66 61 80 75 67 74 59 56 81 68 74 69
82 76 60 63 78 71 80 60 72 63 58 77 68 72
63 71 67 76 54 72 64 70 70 61 82 68
a Determine the sample mean $\bar{x}$ and standard deviation $s$.
b Using $s$ to estimate $\sigma$, determine a 95% confidence interval that the company would use to estimate the mean point score for the population of applicants.

Example 29
A 95% confidence interval for a mean $\mu$ of a population was recorded as $8.5617 \leq \mu \leq 9.4383$. This estimate was based on a sample of size $n = 60$.
Use this information to calculate
a $\bar{x}$, the sample mean
b $\sigma$, the population standard deviation which was used to calculate the confidence interval.
a The 95% confidence interval is $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$
So, $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} = 8.5617$ and $\bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}} = 9.4383$
Adding these equations gives $2\bar{x} = 8.5617 + 9.4383 = 18$ and so $\bar{x} = 9$.
b Substituting $n = 60$ and $\bar{x} = 9$ into $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} = 8.5617$ gives $9 - 1.96\dfrac{\sigma}{\sqrt{60}} \approx 8.5617$
$\therefore 1.96\dfrac{\sigma}{\sqrt{60}} \approx 0.4383$
$\therefore \sigma \approx \dfrac{0.4383 \times \sqrt{60}}{1.96} \approx 1.732$

6 A 95% confidence interval for the mean $\mu$ of a population is based on a sample of $n = 50$, and given by $3.5842 \leq \mu \leq 4.4158$. Find:
a $\bar{x}$, the sample mean
b $\sigma$, the population standard deviation which was used to calculate the confidence interval.
7 A 95% confidence interval for the mean $\mu$ of a population is given by $19.685 \leq \mu \leq 22.315$. If the population standard deviation is $\sigma = 6$, what was the sample size?
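The reasoning of Example 29, which works backwards from a 95% confidence interval to $\bar{x}$ and $\sigma$, can be sketched as below. The function name `recover_mean_and_sigma` is an assumption for illustration: the sample mean is the midpoint of the interval, and the half-width equals $1.96\sigma/\sqrt{n}$.

```python
from math import sqrt


def recover_mean_and_sigma(lower, upper, n, z=1.96):
    """Recover x_bar and sigma from a confidence interval with multiplier z."""
    sample_mean = (lower + upper) / 2   # adding the two limit equations
    margin = (upper - lower) / 2        # half-width = z * sigma / sqrt(n)
    sigma = margin * sqrt(n) / z
    return sample_mean, sigma


# Example 29: 8.5617 <= mu <= 9.4383 from a sample of size 60.
x_bar, sigma = recover_mean_and_sigma(8.5617, 9.4383, 60)
print(round(x_bar, 4), round(sigma, 3))
```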
DETERMINING HOW LARGE A SAMPLE SHOULD BE
When designing an experiment in which we wish to estimate the population mean, the size of the sample is an important consideration. Finding the sample size is a problem that can be solved using the confidence interval.
Let us revisit Example 28 on the fat content of pasties. The question arises:
'How large should a sample be if we wish to be 95% confident that the sample mean will differ from the population mean by less than 0.3 grams?'
i.e., $-0.3 < \mu - \bar{x} < 0.3$
Now the 95% confidence interval for $\mu$ is: $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$
Hence $-1.96\dfrac{\sigma}{\sqrt{n}} < \mu - \bar{x} < 1.96\dfrac{\sigma}{\sqrt{n}}$ and we need to find $n$ when $1.96\dfrac{\sigma}{\sqrt{n}} = 0.3$.
So, $\sqrt{n} = \dfrac{1.96\sigma}{0.3} = \dfrac{1.96 \times 1.35}{0.3} \approx 8.82$ and so $n \approx 78$.
Thus, a sample of 78 pasties should be taken.

Example 30
Revisit the yabbies from the dam problem of Example 27. Suppose we wish to find the sample size needed to be 95% confident that the sample mean differs from the population mean by less than 5 grams. What sample size should be taken?
Now $-1.96\dfrac{\sigma}{\sqrt{n}} < \mu - \bar{x} < 1.96\dfrac{\sigma}{\sqrt{n}}$
so we need to find $n$ such that $1.96\dfrac{\sigma}{\sqrt{n}} = 5$, i.e., $\dfrac{1.96 \times 16.8}{\sqrt{n}} = 5$
$\therefore n = \left(\dfrac{1.96 \times 16.8}{5}\right)^2 \approx 43.37$
A sample of 44 yabbies should be taken.

EXERCISE 7I.2
1 A researcher wishes to estimate the mean weight of adult crayfish in South Australian waters. She knows that the population standard deviation $\sigma$ is 250.5 grams. How large must a sample be so that she is 95% confident that the sample mean differs from the population mean by less than 70 grams?
2 A porridge manufacturer samples 80 packets of porridge and finds that the sample standard deviation $s$ of the contents' weight is 17.8 grams. If $s$ is used to estimate the population standard deviation $\sigma$, how many packets must be sampled to be 95% confident that the sample mean differs from the population mean by less than 3 grams?
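The sample-size calculations for the pasties and for the yabbies of Example 30 follow the same pattern, which can be sketched as a function. The name `sample_size` and its parameter `max_error` are hypothetical; the result is rounded up because $n$ must be a whole number.

```python
from math import ceil


def sample_size(sigma, max_error, z=1.96):
    """Smallest whole n with z * sigma / sqrt(n) no larger than max_error."""
    return ceil((z * sigma / max_error) ** 2)


print(sample_size(1.35, 0.3))  # pasties: sigma = 1.35 g, error below 0.3 g
print(sample_size(16.8, 5))    # yabbies: sigma = 16.8 g, error below 5 g
```

Rounding up rather than to the nearest integer matters in Example 30, where $n \approx 43.37$ leads to a sample of 44.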
3 Patients from an alcohol rehabilitation program participate for various lengths of time with a standard deviation of 4.7 days. How many patients would have to be sampled to be 95% confident that the sample mean number of days on the program differs from the population mean by less than 1.8 days?

Consider the typical 95% confidence interval shown in the diagram.
[Figure: a confidence interval of width $w$ centred on $\bar{x}$, from $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}}$ to $\bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$.]
The width of this interval is $w = 2 \times 1.96\dfrac{\sigma}{\sqrt{n}}$.
By taking a sufficiently large sample size $n$ we can make $w$ as small as we like.
As $w = 2 \times 1.96\dfrac{\sigma}{\sqrt{n}}$, $\sqrt{n} = \dfrac{2 \times 1.96\sigma}{w}$ and so $n = \left(\dfrac{2 \times 1.96\sigma}{w}\right)^2$
When we wish to estimate the population mean from a sample of size $n$ at a 95% confidence level, the sample size is given by
$n = \left(\dfrac{2 \times 1.96\sigma}{w}\right)^2$
where $\sigma$ is the population standard deviation and $w$ is the confidence interval width.
In Example 30, $w = 2 \times 5$ and $\sigma = 16.8$. Thus, $n = \left(\dfrac{2 \times 1.96 \times 16.8}{10}\right)^2 \approx 43.37$, etc.
Since $n$ is an integer, $n = 44$ would give a 95% confidence interval of width about 10 grams.

4 A population is known to have standard deviation $\sigma = 34$. Find the sample size $n$ that should be taken to find a 95% confidence interval for the population mean $\mu$ of width:
a $w = 5$ b $w = 1$ c $w = 0.1$
5 A manufacturer of bottled water knows that the machine dispenses water into 1 litre bottles with a standard deviation of 2.3 mL. The machine needs to be checked regularly to ensure it is still delivering the correct volume. How many bottles should a quality controller be checking to find a 95% confidence interval of width:
a 2 mL b 1 mL c 0.5 mL?
6 a If the size $n$ of a sample is doubled, by how much will the width of a 95% confidence interval decrease?
b How much larger do you have to make a sample size to halve the width of a 95% confidence interval?

USING A CONFIDENCE INTERVAL FOR A CLAIM ABOUT $\mu$
Confidence intervals provide an estimate for the size of the population mean $\mu$. They can also be used to assess claims about population means. For example:
Suppose the volume $V$ of fruit juice dispensed by a machine is normally distributed with mean $\mu$ litres which can be adjusted, and standard deviation $\sigma = 0.0015$ litre ($1\frac{1}{2}$ mL, about $\frac{1}{4}$ of a teaspoon) which is fixed.
Suppose a manufacturer needs to fill cartons with 1 litre of fruit juice. To ensure that almost all cartons contain at least 1 litre, the value of the mean $\mu$ is set at 1.005 litre.
A quality controller takes a sample of $n$ cartons and, with very accurate measurements, finds that the sample mean $\bar{v} = 1.004\,99$ litres. We want to test the hypotheses $H_0: \mu = 1.005$, $H_a: \mu \neq 1.005$ for various large values of $n$.
Note that for sufficiently large $n$ the null hypothesis will not be accepted at the 5% level. For such values of $n$ the difference is statistically significant at the 5% level even though the difference of 0.01 mL (hardly a drop) is not significant as the word is commonly understood.

Example 31
Suppose the volume $V$ of cool drinks dispensed into cartons by a machine is normally distributed with mean $\mu$ which can be adjusted, and standard deviation 10 mL which is fixed. The value of $\mu$ is supposed to be 1005 mL, but the machine operator notices that actually $\mu = 995$ mL. The operator therefore adjusts the volume dispensed by the machine. A quality controller tests 25 cartons and finds that their mean volume is 1007 mL.
a Construct a 95% confidence interval for the volume $\mu$ dispensed by the machine.
b Use the 95% confidence interval to assess the claim that the volume dispensed by the machine has increased.
c Can we conclude that the volume $\mu$ is now larger than 1005 mL?
a The confidence interval is $1003 \leq \mu \leq 1011$.
b Since 995 is less than all the values in the 95% confidence interval we can be confident that the population mean has increased.
c Although the sample statistic 1007 mL is larger than 1005, the smallest number in the 95% confidence interval for $\mu$ is 1003 mL. This means that $\mu$ could be as small as 1003 mL, and there is not enough evidence to support the claim that $\mu > 1005$ mL.
Note: This question is closely related to testing the hypotheses $H_0: \mu = 1005$, $H_a: \mu \neq 1005$.

EXERCISE 7I.3
1 Suppose the time it takes Joan to run 100 metres is normally distributed with mean $\mu = 12.46$ seconds and standard deviation 1 second. To improve her time Joan goes on a training program. After the training program, Joan finds that the mean time from 12 trial runs is now 11.62 seconds.
a Construct a 95% confidence interval for Joan's mean assuming the standard deviation has not changed.
b Use the result of part a to assess the claims:
i Joan's time to run 100 metres has improved.
ii Joan is now better than Betty whose time for the 100 metres is 11.97 seconds.
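The effect of sample size in the fruit juice cartons discussion (sample mean 1.004 99 litres, $\mu_0 = 1.005$ litres, $\sigma = 0.0015$ litre) can be illustrated with a short calculation. This is a sketch only: the particular values of $n$ are assumptions chosen for illustration, since the text says only "various large values of $n$", and the function name `two_sided_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """P-value for H0: mu = mu0 against Ha: mu != mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))


# Same tiny difference of 0.00001 litre, tested with larger and larger samples.
for n in (1_000, 10_000, 100_000, 1_000_000):
    p = two_sided_p_value(1.00499, 1.005, 0.0015, n)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(n, round(p, 4), decision)
```

The sample mean never changes, yet the decision switches once $n$ is large enough, which is the distinction between statistical significance and significance in the everyday sense.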
2 A complaint was made to a call centre that it took a mean time of 12 minutes before a caller was put through to an operator. After changes were made, the call centre claimed that the service had improved. To check this claim, a consumer group made 40 calls to the centre. They found the mean waiting time was 8 minutes with a standard deviation of 3 minutes. Assuming that 40 is large enough for the Central Limit Theorem to apply, construct a 95% confidence interval for the mean waiting time $\mu$. Does the confidence interval support the call centre's claim? (Use $s$ to estimate $\sigma$.)
3 The distance $D$ a golfer can hit a ball is randomly distributed with a mean $\mu = 115$ metres and standard deviation $\sigma = 32$ metres.
a After spending time with a professional the golfer measured the distance of 30 drives. The results of the drives in metres were as follows:
133 153 110 93 142 135 62 150 127 112
119 171 143 92 162 128 149 73 39 84
138 152 163 174 152 141 129 87 118 149
Assuming that the sample of 30 is large enough for the Central Limit Theorem to apply, calculate a 95% confidence interval for the mean distance $\mu$ the golfer can now hit the ball. Does the confidence interval provide enough evidence to support the claim the golfer has improved?
b The golfer decided to have another trial of 50 drives. Suppose the mean of the 50 trials is the same as in part a.
i Explain briefly why increasing the number of trials could make a difference to a drive length.
ii Does the new information provide evidence that the golfer has improved?

OTHER APPLICATIONS OF CONFIDENCE INTERVALS
Example 32
A buyer for a restaurant chain goes to a seafood wholesaler to inspect a large catch of 50 000 prawns. She has instructions to buy the catch only if the prawns are heavy enough. The buyer selects a sample of 60 prawns and finds that their mean weight is 57.2 grams. It is known that the population standard deviation $\sigma$ is 4.2 grams.
a Find the 95% confidence interval for the population mean.
b The buyer claims she is 95% confident that no more than 10% of the prawns weigh less than 50 grams. Use the confidence interval found in part a to justify this claim. You may assume that the weights of prawns are normally distributed.
a Using technology, the 95% confidence interval for the population mean $\mu$ is $56.1 \leq \mu \leq 58.3$.
[Figure: number line marking 50.0 and the confidence interval (CI) from 56.1 to 58.3, centred on 57.2.]
b The smallest value in the 95% confidence interval is 56.1, and so the buyer can be 95% confident that the population mean $\mu \geq 56.1$.
If $W$ is the weight of prawns, then $W \sim N(\mu, \sigma^2)$.
If we use $\mu = 56.1$ and $\sigma = 4.2$, then using technology $\Pr(W < 50) = 0.0732$.
Hence 7.32%, or less than 10% of the prawns weigh less than 50 grams.
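The two "using technology" steps of Example 32 can be reproduced with Python's standard library. This is a sketch under the example's own assumptions (normally distributed weights, $\sigma = 4.2$ grams); the variable names are chosen for illustration, and the rounded lower limit 56.1 is used for part b, as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sigma, n = 57.2, 4.2, 60

# Part a: 95% confidence interval for the population mean weight.
margin = 1.96 * sigma / sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

# Part b: take the smallest plausible mean (the lower limit, rounded to 56.1)
# and find the proportion of prawns lighter than 50 grams.
p_light = NormalDist(mu=56.1, sigma=sigma).cdf(50)

print(round(lower, 1), round(upper, 1), round(p_light, 4))
```

Using the lower limit is the cautious choice: any larger mean inside the interval would give an even smaller proportion of prawns under 50 grams.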
EXERCISE 7I.4
1 The manager of a golf club claimed that the income of most of its members was in excess of \$75 000 and thus its members could afford to pay increased annual subscriptions. To show that this claim was not valid, the members sought the help of a statistician.
The statistician examined a random sample of 113 club members and found that the mean income was \$96 318. It is known that the standard deviation of the members' incomes is \$14 268.
a Find the 95% confidence interval for the population mean income of all members.
b The statistician claimed that he was 95% certain that no more than 10% of the members had a mean income of less than \$75 000.
Assuming that the income of members is normally distributed, how could you justify the statistician's claim?
2 Fabtread manufacture motorcycle tyres. Under normal test conditions the stopping time for motor cycles travelling at 60 km/h is 3.45 seconds with standard deviation 0.17 seconds. Their production team has just designed and manufactured a new tyre tread. They take 41 stopping time measurements with the new tyres and find the mean time is 3.03 seconds.
a Calculate a 95% confidence interval for the mean stopping time of the new tyres.
b The team claims that they are 95% certain that less than 15% of the stopping times of their new tyres will exceed the 3.45 seconds of the old tyres.
Assuming that the stopping time is normally distributed, how could you justify the team's claim?

EXTENSION TO CONFIDENCE INTERVALS OTHER THAN 95%
There are often good reasons to find confidence intervals other than those of 95%. In areas like medicine, a researcher may want to have more certainty when making decisions and often may prefer a confidence interval of 99%. In other areas where the outcomes of decisions are not so important, people may be satisfied with 90% confidence intervals.
Your calculator can produce confidence intervals at any level.

EXERCISE 7I.5
1 The mean $\mu$ of a population is unknown, but its standard deviation is 10. In order to estimate $\mu$ a random sample of size $n = 35$ was selected. The mean of the sample was found to be 28.9.
a Find a 95% confidence interval for $\mu$. b Find a 99% confidence interval for $\mu$.
c In changing the confidence level from 95% to 99%, how does the width of the confidence interval change?
2 If the $P$% confidence interval for $\mu$ is $\bar{x} - a\left(\dfrac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + a\left(\dfrac{\sigma}{\sqrt{n}}\right)$ then for $P = 95$, $a = 1.960$. Find $a$ if $P$ is: a 99 b 80 c 85 d 96.
3 The choice of the confidence level to be used is made by an experimenter. Why is it that experimenters do not always choose confidence intervals of at least 99%?
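Confidence intervals at levels other than 95% only change the multiplier of $\sigma/\sqrt{n}$. The sketch below assumes Python's `statistics.NormalDist().inv_cdf` in place of the calculator; the function names `critical_value` and `confidence_interval` are hypothetical, and the yabbies data of Example 27 are reused at the 90% level purely as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(level_percent):
    """Multiplier a with Pr(-a < Z < a) = level_percent / 100."""
    tail = (1 - level_percent / 100) / 2   # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


def confidence_interval(sample_mean, sigma, n, level_percent=95):
    """Confidence interval for mu with known sigma at the given level."""
    margin = critical_value(level_percent) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


print(round(critical_value(95), 3))  # the familiar multiplier for 95%
print(confidence_interval(84.6, 16.8, 60, level_percent=90))
```

A lower confidence level gives a smaller multiplier and therefore a narrower interval for the same data.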
J REVIEW
REVIEW SET 7A
1 The arm lengths of 18 year old females are normally distributed with mean 64 cm and standard deviation 4 cm.
a Find the percentage of 18 year old females whose arm lengths are:
i between 60 cm and 72 cm ii greater than 60 cm.
b Find the probability that if an 18 year old female is chosen at random, she will have an arm length in the range 56 cm to 68 cm.
2 a If $Z$ has a standard normal distribution, find $k$ if $\Pr(Z \leq k) = 0.95$.
b If $X \sim N(23, 2.6^2)$ find $k$ if $\Pr(X < k) = 0.6$.
3 In a mathematics test out of 40 marks, the mean mark was 28.3 and the standard deviation was 4.1. The marks were all integers and the minimum pass mark was set at 24. Assuming marks were approximately normal, what proportion of the students:
a passed the test b scored more than 20 c scored between 25 and 35?
4 The weights of apples from an orchard are known to be normally distributed with mean $\mu = 350$ grams and standard deviation $\sigma = 25$ grams. The apples are packed in boxes of 50 each.
a How many apples in a box would you expect to weigh more than 375 grams, and how many less than 325 grams?
b In 500 boxes, how many apples would you expect to have a weight between 325 and 375 grams?
5 To test the hypotheses $H_0: \mu = 36$ and $H_a: \mu \neq 36$ a random sample of $n = 20$ was selected. The outcomes are listed below:
38 22 43 21 36 44 20 49 36 30
42 43 38 28 33 22 29 25 28 34
Use this information to test the null hypothesis at the 5% level if the population standard deviation is 10 grams.
6 The standard deviation in the weight of cereal boxes is 23.6 grams. How many boxes must be sampled from the population to be 95% confident that the sample mean differs from the population mean by less than 4 grams?
7 A factory canning apricots uses a machine to deliver the fruit and syrup into cans. The quality controller randomly samples 65 cans and finds that the mean mass of contents is 828.2 grams.
a Construct a 95% confidence interval in which the true population mean should lie if the population standard deviation is 16.3 grams.
b What should the sample size be to construct a confidence interval of half the width of that in a?
8 a Kerry's marks for an English essay and a Chemistry test were 26 out of 40 and 82% respectively.
i Explain briefly why the information given is not sufficient to determine whether Kerry's results are better in English than in Chemistry.
REVIEW SET 7B
ii Suppose that the marks of all students in both the English essay and the
Chemistry test were normally distributed as N(22, 42) and N(75, 72) re-
spectively. Use this information to determine which of Kerry’s two marks
is better.
iii If there were 50 students sitting for the English essay, how many would
have scored more than Kerry?
b Les is to sit for five subjects in the final examination. Because of many different
factors that determine examination marks, the marks Les can expect in each exam
are normally distributed. Suppose that the mean ¹ and standard deviation ¾ = 2are the same for each exam.
If ¹ = 12 calculate the probability that Les will gain a total mark for the five
subjects of between 60 and 70.
c The value of the mean ¹ depends on the time t hours that Les studies. It is given
by ¹ = 16 ¡ 8=(t + 2).
i For how long must Les study to achieve a value of ¹ = 15?
ii Les’s total score for the five examinations was 65. Use this information to
test the hypotheses H0 : ¹ = 15 and Ha : ¹ 6= 15.
iii Use the total score of 65 to construct a 95% confidence interval for the mean
¹. Use this interval to estimate a range of times Les might have studied for
the examination.
1 Find the mean and standard deviation of these two samples of lengths given in cm:
A 170.1 169.4 169.5 170.4 169.8 170.5 170.0 170.0 170.3 170.8
  170.0 169.9 170.2 170.0 169.9 169.9 170.5 170.1 169.7 170.0
B 177 166 153 167 176 173 169 161 172 174
  170 162 178 174 179 171 148 184 178 175
Which of the above is a sample of heights of 15 year old boys, and which is a sample
of lengths of planks cut by a machine?
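A calculator answer to question 1 can be checked with a short script. The sketch below assumes Python's standard `statistics` module and uses the sample (n − 1) form of the standard deviation; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Samples A and B copied from question 1 (lengths in cm).
sample_a = [170.1, 169.4, 169.5, 170.4, 169.8, 170.5, 170.0, 170.0, 170.3, 170.8,
            170.0, 169.9, 170.2, 170.0, 169.9, 169.9, 170.5, 170.1, 169.7, 170.0]
sample_b = [177, 166, 153, 167, 176, 173, 169, 161, 172, 174,
            170, 162, 178, 174, 179, 171, 148, 184, 178, 175]

# stdev() divides by n - 1 (sample standard deviation);
# use statistics.pstdev() instead if the n divisor is wanted.
mean_a, sd_a = mean(sample_a), stdev(sample_a)
mean_b, sd_b = mean(sample_b), stdev(sample_b)

print(f"A: mean = {mean_a:.2f} cm, s = {sd_a:.2f} cm")
print(f"B: mean = {mean_b:.2f} cm, s = {sd_b:.2f} cm")
```

The two means are close, so it is the difference in spread that helps decide which sample comes from a machine and which from people.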
2 The contents of a certain brand of soft drink can is normally distributed with mean
377 mL and standard deviation 4.2 mL.
a Find the percentage of cans with contents:
i less than 368.6 mL ii between 372.8 mL and 389.6 mL
b Find the probability of randomly selecting a can with contents between 364.4 mL
and 381.2 mL.
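Normal probabilities such as those in question 2 can be checked numerically. This is a minimal sketch assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the names `contents`, `p_less`, `p_between` and `p_b` are illustrative.

```python
from statistics import NormalDist

# Can contents in mL: mean 377, standard deviation 4.2 (from question 2).
contents = NormalDist(mu=377, sigma=4.2)

# a i: Pr(X < 368.6)
p_less = contents.cdf(368.6)
# a ii: Pr(372.8 < X < 389.6) as a difference of two cumulative probabilities
p_between = contents.cdf(389.6) - contents.cdf(372.8)
# b: Pr(364.4 < X < 381.2)
p_b = contents.cdf(381.2) - contents.cdf(364.4)

print(f"a i  {100 * p_less:.2f}%")
print(f"a ii {100 * p_between:.2f}%")
print(f"b    {p_b:.4f}")
```

Each boundary lies a whole number of standard deviations from the mean, so the results can also be read from the standard normal table.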
3 The life of a Xenon battery is known to be normally distributed with a mean of
33.2 weeks and a standard deviation of 2.8 weeks.
a Find the probability that a randomly selected battery will last at least 35 weeks.
b For how many weeks can the manufacturer expect the batteries to last before 8%
of them fail?
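Question 3 combines an upper-tail probability with a quantile (a k-value). A hedged sketch, again assuming `statistics.NormalDist` from the Python standard library, with illustrative variable names:

```python
from statistics import NormalDist

# Battery life in weeks: mean 33.2, standard deviation 2.8 (from question 3).
life = NormalDist(mu=33.2, sigma=2.8)

# a: Pr(X >= 35) is the complement of the cumulative probability at 35.
p_at_least_35 = 1 - life.cdf(35)

# b: the time k by which 8% of batteries have failed satisfies Pr(X < k) = 0.08.
k_8_percent = life.inv_cdf(0.08)

print(f"Pr(X >= 35) = {p_at_least_35:.3f}")
print(f"8% of batteries have failed by about {k_8_percent:.1f} weeks")
```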
4 The length of steel rods produced by a machine is normally distributed with a standard
deviation of 3 mm. It is found that 2% of all rods are less than 25 mm long. Find
the mean length of rods produced by the machine.
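In question 4 the unknown is the mean rather than a probability. One possible approach is to find the z-score that cuts off the lowest 2% and then solve z = (25 − μ)/σ for μ. The sketch below assumes `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

sigma = 3          # standard deviation of rod length in mm
cutoff = 25        # 2% of rods are shorter than this length (mm)

# z-score with 2% of the standard normal distribution below it (negative).
z = NormalDist().inv_cdf(0.02)

# Rearranging z = (cutoff - mu) / sigma gives mu = cutoff - z * sigma.
mu = cutoff - z * sigma

# Check: with this mean, about 2% of rods should be shorter than 25 mm.
check = NormalDist(mu, sigma).cdf(cutoff)

print(f"z = {z:.4f}, mean length = {mu:.2f} mm, check = {check:.4f}")
```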
5 a If Z has a standard normal distribution, find a if Pr(Z ≤ a) = 0.9.
b If X ~ N(15.6, 2²) find a if Pr(X < a) = 0.9.
6 A manufacturer claims that his canned soup contains 135 mg of salt. To check this
claim a consumer tested 87 cans for salt content and found that the mean was 139.6 mg.
It is known that the population standard deviation is 22.8 mg. At a 5% level is
there sufficient evidence to reject the manufacturer’s claim?
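The test in question 6 can be set out as a two-sided z-test. The sketch below is one way to do the arithmetic, assuming the hypotheses H0: μ = 135 and Ha: μ ≠ 135 (the question does not state them explicitly) and using `statistics.NormalDist`; all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 135        # claimed mean salt content (mg)
x_bar = 139.6     # sample mean (mg)
sigma = 22.8      # known population standard deviation (mg)
n = 87            # number of cans tested

# Standard deviation of the sample mean, then the test statistic.
standard_error = sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Two-sided p-value: probability of a result at least this far from mu_0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0 at 5%: {reject_h0}")
```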
7 To test the null hypothesis H0: μ = 2000 and Ha: μ ≠ 2000, a random sample
of n = 75 was selected and found to have mean x̄ = 1840.
a If the population standard deviation σ = 690, is there sufficient evidence to reject
the null hypothesis at the 5% level?
b For what values of the sample mean x̄ would you not reject the null hypothesis at
the 5% level?
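Part b of question 7 asks for the range of sample means that would not lead to rejection. A minimal sketch, assuming `statistics.NormalDist` and illustrative variable names, finds that range directly and then checks where x̄ = 1840 falls:

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 2000, 690, 75
x_bar = 1840

standard_error = sigma / sqrt(n)

# Critical z-value for a two-sided test at the 5% level (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

# Sample means between these limits would not lead to rejecting H0.
lower = mu_0 - z_crit * standard_error
upper = mu_0 + z_crit * standard_error

reject_h0 = not (lower <= x_bar <= upper)
print(f"Do not reject H0 when {lower:.1f} <= sample mean <= {upper:.1f}")
print(f"Sample mean {x_bar}: reject H0 = {reject_h0}")
```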
8 A telephone call centre handles many calls each day. Let T be the time in minutes
taken to answer a call.
In 2006 the mean answering time for a call was μ = 4.3 minutes with standard
deviation σ = 1.2 minutes.
Let T̄ be the mean time taken to answer a random sample of 100 calls.
a The two histograms below show the distribution of a sample of size 50 taken
from T . Note that the horizontal scale and the bin width are the same in both
histograms, but the vertical scales are different.
Identify the histogram that represents a sample from T .
Explain your answer.
b i Assuming that n = 100 is sufficiently large, explain why the distribution
of T̄ is approximately normal with mean 4.3 minutes and standard deviation
0.12 minutes.
ii Calculate the probability Pr(T̄ ≤ 4.35).
iii Hence calculate the probability that an operator in the call centre can be
occupied in answering 100 calls for less than seven and a quarter hours.
c As well as answering routine calls, the supervisor of the call centre also handles
unusual cases that are too complicated for other staff to handle. When the
supervisor was timed her mean time to answer 100 calls was T̄ = 4.6 minutes.
i Use the statistic T̄ = 4.6 minutes to test the hypotheses H0: μ = 4.3 and
Ha: μ ≠ 4.3, at the 5% level.
ii The supervisor is asked to explain why she is taking too long to answer
questions. What reasons can the supervisor provide to claim that the Central
Limit Theorem does not apply to her?
[Figure: Histogram A and Histogram B. Both plot frequency against time (min) from 0 to 8; the frequency axis runs from 0 to 6 in one histogram and from 0 to 40 in the other.]
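The arithmetic behind parts b and c of question 8 can be sketched as follows. The code assumes the Central Limit Theorem applies, so that the mean of 100 answering times is treated as normal with standard deviation σ/√n, and it uses `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4.3, 1.2, 100

# b i: standard deviation of the mean of n calls.
standard_error = sigma / sqrt(n)
mean_time = NormalDist(mu, standard_error)

# b ii: Pr(mean time <= 4.35 minutes).
p_mean_below = mean_time.cdf(4.35)

# b iii: 7.25 hours for 100 calls is 7.25 * 60 / 100 minutes per call on average.
minutes_per_call = 7.25 * 60 / n

# c i: two-sided test of H0: mu = 4.3 using the supervisor's mean of 4.6 minutes.
z = (4.6 - mu) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error = {standard_error:.2f} min")
print(f"Pr(mean <= 4.35) = {p_mean_below:.4f}")
print(f"7.25 hours over 100 calls = {minutes_per_call:.2f} min per call")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Because seven and a quarter hours for 100 calls is the same event as a mean of at most 4.35 minutes per call, part b iii reuses the probability from part b ii.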
REVIEW SET 7C
1 Sketch the graph of X ~ N(3, 2²). On the horizontal axis mark in the z-scores as well
as their corresponding x values.
Calculate these probabilities: a Pr(−1 ≤ X ≤ 1) b Pr(−1 ≤ Z ≤ 1).
2 Staplers are manufactured for $5.00 each and are sold for $20.00 each. The staplers
are guaranteed to last three years. The mean life is actually 3.42 years and the
standard deviation 0.4 years. If the life of these staplers is normally distributed, how
much profit would we expect from selling a batch of 2000 (with a maximum of one
replacement)?
3 The edible part of a batch of Coffin Bay oysters is normally distributed with mean
38.6 grams and standard deviation 6.3 grams. Given that the random variable X is
the mass of a Coffin Bay oyster, find:
a a if Pr(38.6 − a ≤ X ≤ 38.6 + a) = 0.6826 b b if Pr(X > b) = 0.8413.
4 King prawns are favourite items on the menu of Stirling Caterers. From past
experience the manager knows that people on average eat 325 g of prawns with standard
deviation 86 g. The manager is to cater for a wedding of 80 guests and decides to
purchase 27.5 kg of prawns. What is the probability that the caterer will run out of
prawns?
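Question 4 concerns the total eaten by 80 guests rather than a single serving. The sketch below assumes the guests eat independently of one another and that n = 80 is large enough for the total to be treated as normal with mean nμ and standard deviation σ√n. It uses `statistics.NormalDist`; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 325, 86      # grams eaten per guest
guests = 80
purchased = 27_500       # 27.5 kg expressed in grams

# Total consumption: mean n * mu, standard deviation sigma * sqrt(n).
total = NormalDist(guests * mu, sigma * sqrt(guests))

# The caterer runs out if the total eaten exceeds the amount purchased.
p_run_out = 1 - total.cdf(purchased)

print(f"Pr(run out) = {p_run_out:.4f}")
```

An equivalent route is to work with the mean per guest: the caterer runs out when the sample mean exceeds 27 500/80 = 343.75 g.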
5 For export purposes peaches must be neither too small nor too large. A grower claims
that the peaches in his orchard have a mean weight of 300 grams, just right for export.
A buyer knows that the population standard deviation is 30 grams, and he wants to
test the grower’s claim.
a What hypotheses should the buyer consider?
b Suppose the buyer selects a random sample of 100 peaches and finds that their
mean weight x̄ = 310 grams.
i What is the null distribution the buyer should use?
ii Calculate the test statistic z for this sample.
iii Does this sample support the grower’s claim at the 5% level?
6 The average width of snail shells of a local species needs to be estimated. It is
known that the standard deviation is 1.4 mm. Pauline takes a random sample of
200 snails and measures the width of each shell to the nearest mm. The results
are shown in the table below.
Width (mm)  Frequency
22          1
23          3
24          17
25          43
26          68
27          41
28          24
29          3
a Find the sample mean.
b Determine a 95% confidence interval for the population mean μ.
7 Suppose the weight X of apricots is normally distributed with μ = 90 grams and
σ = 10 grams.
a Calculate the proportion of apricots with weight less than 88 grams.
b In a box of 100 apricots, how many would you expect to weigh less than 88 g?
c The apricots are packaged into boxes of 100 each. What proportion of the boxes
will have apricots with a mean weight less than 88 g?
d On each of the boxes of 100 apricots is printed that the nett weight is 8.8 kilograms.
In a shipment of 500 boxes, for how many is the weight less than 8.8 kilograms?
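Question 7 contrasts the distribution of a single apricot with the distribution of the mean of 100 apricots. A hedged sketch using `statistics.NormalDist`, with illustrative variable names, makes the difference visible:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, box_size = 90, 10, 100

# a, b: a single apricot weighs less than 88 g.
single = NormalDist(mu, sigma)
p_single = single.cdf(88)
expected_light_apricots = box_size * p_single

# c: the mean weight of a box of 100 has standard deviation sigma / sqrt(100).
box_mean = NormalDist(mu, sigma / sqrt(box_size))
p_box = box_mean.cdf(88)

# d: a box weighs less than 8.8 kg exactly when its mean weight is below 88 g.
expected_light_boxes = 500 * p_box

print(f"a  Pr(X < 88) = {p_single:.4f}")
print(f"b  expected apricots under 88 g per box = {expected_light_apricots:.1f}")
print(f"c  Pr(box mean < 88) = {p_box:.4f}")
print(f"d  expected light boxes in 500 = {expected_light_boxes:.1f}")
```

The same cut-off of 88 g is common for one apricot but rare for a box mean, because averaging 100 weights reduces the standard deviation by a factor of 10.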
8 The time T it takes Laura to travel to work is normally distributed with mean μ
minutes and standard deviation 10 minutes. Laura’s work starts at 9 o’clock in the
morning.
a Suppose μ = 40 minutes and Laura leaves for work at a quarter past eight in
the morning.
i What is the probability she will be late?
ii If there are 250 working days in a year, how often would Laura be expected
to be late to work in a year?
b Laura does not know the value of μ and decides to keep a 10 day record of the
time it takes her to go to work. Let T̄₁₀ be the distribution of the mean time
over 10 days it takes Laura to go to work.
i Briefly describe the distribution T̄₁₀ in terms of the distribution T of the time
it takes Laura to go to work.
ii Suppose Laura found that for her sample of 10 days the mean time to travel
to work was T̄₁₀ = 35 minutes. Use this information to test the hypotheses
H0: μ = 40 and Ha: μ ≠ 40, at the 5% level.
iii Calculate the 95% confidence interval for μ.
iv How large a sample should Laura take to obtain a 95% confidence interval
of width 2.48 minutes?
c After keeping records for a year consisting of 250 working days, Laura found
that the mean travelling time to work was 31.52 minutes. She wants to be
certain that she will be at work before 9 o’clock at least 90% of the time in the
following year. To the nearest minute, what is the latest time you would advise
Laura to leave home? Give reasons for your answer.
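Parts a and b of question 8 chain several of the chapter's techniques: a tail probability, a z-test, a confidence interval and a sample-size calculation. The sketch below is one way to check the arithmetic, assuming `statistics.NormalDist` and a two-sided test; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 10                              # minutes
z_95 = NormalDist().inv_cdf(0.975)      # about 1.96

# a: leaving at 8:15 gives 45 minutes; Laura is late if T > 45 when mu = 40.
p_late = 1 - NormalDist(40, sigma).cdf(45)
late_days = 250 * p_late

# b ii: test H0: mu = 40 using the 10 day mean of 35 minutes.
n, t_bar = 10, 35
standard_error = sigma / sqrt(n)
z = (t_bar - 40) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# b iii: 95% confidence interval for mu.
ci_low = t_bar - z_95 * standard_error
ci_high = t_bar + z_95 * standard_error

# b iv: width = 2 * z_95 * sigma / sqrt(n), solved for n and rounded up.
width = 2.48
n_needed = ceil((2 * z_95 * sigma / width) ** 2)

print(f"a   Pr(late) = {p_late:.4f}, about {late_days:.0f} days per year")
print(f"ii  z = {z:.3f}, p-value = {p_value:.3f}")
print(f"iii 95% CI: {ci_low:.2f} to {ci_high:.2f} minutes")
print(f"iv  sample size needed = {n_needed}")
```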