7/18/2019 Introductory Statistics, Shafer Zhang-Attributed
http://slidepdf.com/reader/full/introductory-statistics-shafer-zhang-attributed 1/682
Attributed to Douglas S. Shafer and Zhiyi Zhang, Saylor.org
Saylor URL: http://www.saylor.org/books/
"This document is attributed to Douglas S. Shafer and Zhiyi Zhang"
About the Authors
Douglas S. Shafer
Douglas Shafer is Professor of Mathematics at the University of North Carolina at Charlotte. In addition to his position in Charlotte
he has held visiting positions at the University of Missouri at
Columbia and Montana State University and a Senior Fulbright
Fellowship in Belgium. He teaches a range of mathematics courses
as well as introductory statistics. In addition to journal articles and
this statistics textbook, he has co-authored with V. G. Romanovski (Maribor, Slovenia) a graduate
textbook in his research specialty. He earned a PhD in mathematics at the University of North
Carolina at Chapel Hill.
Zhiyi Zhang
Zhiyi Zhang is Professor of Mathematics at the University of North
Carolina at Charlotte. In addition to his teaching and research duties
at the university, he consults actively to industries and governments
on a wide range of statistical issues. His research activities in statistics
have been supported by National Science Foundation, U.S.
Environmental Protection Agency, Office of Naval Research, and National Institute of Health. He earned a PhD in statistics at Rutgers University in New Jersey.
Acknowledgements
We would like to thank the following colleagues whose comprehensive feedback and suggestions for
improving the material helped us make a better text:
Kathy Autrey, Northwestern State University
Kiran Bhutani, The Catholic University of America
Rhonda Buckley, Texas Woman’s University
Susan Cashin, University of Wisconsin-Milwaukee
Kathryn Cerrone, The University of Akron-Summit College
Zhao Chen, Florida Gulf Coast University
Ilhan Izmirli, George Mason University, Department of Statistics
Denise Johansen, University of Cincinnati
Eric Kean, Western Washington University
Yolanda Kumar, University of Missouri-Columbia
Eileen Stock, Baylor University
Sean Thomas, Emory University
Sara Tomek, University of Alabama
Mildred Vernia, Indiana University Southeast
Gingia Wen, Texas Woman’s University
Jiang Yuan, Baylor University
We also acknowledge the valuable contribution of the publisher’s accuracy checker, Phyllis Barnidge.
Dedication
To our families and teachers.
Preface
This book is meant to be a textbook for a standard one-semester introductory statistics course for
general education students. Our motivation for writing it is twofold: 1.) to provide a low-cost
alternative to many existing popular textbooks on the market; and 2.) to provide a quality textbook
on the subject with a focus on the core material of the course in a balanced presentation.
The high cost of textbooks has spiraled out of control in recent years. The high frequency at which
new editions of popular texts appear puts a tremendous burden on students and faculty alike, as well
as the natural environment. Against this background we set out to write a quality textbook with
materials such as examples and exercises that age well with time and that would therefore not
require frequent new editions. Our vision resonates well with the publisher’s business model which
includes free digital access, reduced paper prints, and easy customization by instructors if additional
material is desired.
Over time the core content of this course has developed into a well-defined body of material that is
substantial for a one-semester course. The authors believe that the students in this course are best
served by a focus on the core material and not by an exposure to a plethora of peripheral topics.
Therefore in writing this book we have sought to present material that comprises fully a central body
of knowledge that is defined according to convention, realistic expectation with respect to course
duration and students’ maturity level, and our professional judgment and experience. We believe
that certain topics, among them Poisson and geometric distributions and the normal approximation
to the binomial distribution (particularly with a continuity correction), are distracting in nature.
Other topics, such as nonparametric methods, while important, do not belong in a first course in
statistics. As a result we envision a smaller and less intimidating textbook that trades some extended
and unnecessary topics for a better focused presentation of the central material.
Textbooks for this course cover a wide range in terms of simplicity and complexity. Some popular
textbooks emphasize the simplicity of individual concepts to the point of lacking the coherence of an
overall network of concepts. Other textbooks include overly detailed conceptual and computational
discussions and as a result repel students from reading them. The authors believe that a successful
book must strike a balance between the two extremes, however difficult it may be. As a consequence
the overarching guiding principle of our writing is to seek simplicity but to preserve the coherence of
the whole body of information communicated, both conceptually and computationally. We seek to
remind ourselves (and others) that we teach ideas, not just step-by-step algorithms, but ideas that
can be implemented by straightforward algorithms.
In our experience most students come to an introductory course in statistics with a calculator that
they are familiar with and with which their proficiency is more than adequate for the course material.
If the instructor chooses to use technological aids, either calculators or statistical software such as
Minitab or SPSS, for more than mere arithmetical computations but as a significant component of
the course then effective instruction for their use will require more extensive written instruction than
a mere paragraph or two in the text. Given the plethora of such aids available, to discuss a few of
them would not provide sufficiently wide or detailed coverage and to discuss many would digress
unnecessarily from the conceptual focus of the book. The overarching philosophy of this textbook is
to present the core material of an introductory course in statistics for non-majors in a complete yet
streamlined way. Much room has been intentionally left for instructors to apply their own
instructional styles as they deem appropriate for their classes and educational goals. We believe that
the whole matter of what technological aids to use, and to what extent, is precisely the type of
material best left to the instructor’s discretion.
All figures with the exception of Figure 1.1 "The Grand Picture of Statistics", Figure 2.1 "Stem and
Leaf Diagram", Figure 2.2 "Ordered Stem and Leaf Diagram", Figure 2.13 "The Box Plot", Figure 10.4
"Linear Correlation Coefficient", Figure 10.5 "The Simple Linear Model Concept", and the
unnumbered figure in Note 2.50 "Example 16" of Chapter 2 "Descriptive Statistics" were generated
using MATLAB, copyright 2010.
Chapter 1
Introduction
In this chapter we will introduce some basic terminology and lay the groundwork for the course. We
will explain in general terms what statistics and probability are and the problems that these two
areas of study are designed to solve.
1.1 Basic Definitions and Concepts
LEARNING OBJECTIVE
1. To learn the basic definitions used in statistics and some of its key concepts.
We begin with a simple example. There are millions of passenger automobiles in the United States.
What is their average value? It is obviously impractical to attempt to solve this problem directly by
assessing the value of every single car in the country, adding up all those numbers, and then dividing
by however many numbers there are. Instead, the best we can do would be to estimate the average.
One natural way to do so would be to randomly select some of the cars, say 200 of them, ascertain
the value of each of those cars, and find the average of those 200 numbers. The set of all those
millions of vehicles is called the population of interest, and the number attached to each one, its
value, is a measurement. The average value is a parameter: a number that describes a characteristic
of the population, in this case monetary worth. The set of 200 cars selected from the population is
called a sample, and the 200 numbers, the monetary values of the cars we selected, are the sample
data. The average of the data is called a statistic: a number calculated from the sample data. This
example illustrates the meaning of the following definitions.
Definition
A population is any specific collection of objects of interest. A sample is any subset or subcollection of
the population, including the case that the sample consists of the whole population, in which case it is
termed a census.
Definition
A measurement is a number or attribute computed for each member of a population or of a sample.
The measurements of sample elements are collectively called the sample data.
Definition
A parameter is a number that summarizes some aspect of the population as a whole. A statistic is a
number computed from the sample data.
Continuing with our example, if the average value of the cars in our sample was $8,357, then it seems
reasonable to conclude that the average value of all cars is about $8,357. In reasoning this way we
have drawn an inference about the population based on information obtained from the sample. In
general, statistics is a study of data: describing properties of the data, which is called descriptive
statistics, and drawing conclusions about a population of interest from information extracted from a
sample, which is called inferential statistics. Computing the single number $8,357 to summarize the
data was an operation of descriptive statistics; using it to make a statement about the population was
an operation of inferential statistics.
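The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of car values below is invented (uniformly distributed between $500 and $30,000), and in a real study the population mean would not be available for comparison.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 car values in dollars (made-up numbers).
population = [random.uniform(500, 30000) for _ in range(100_000)]

# Parameter: a number describing the whole population (unknown in practice).
population_mean = statistics.mean(population)

# Sample: 200 cars selected at random, without replacement.
sample = random.sample(population, 200)

# Statistic: a number computed from the sample data (descriptive statistics).
sample_mean = statistics.mean(sample)

# Inference: the statistic serves as an estimate of the parameter.
print(f"sample mean (statistic):     {sample_mean:,.0f}")
print(f"population mean (parameter): {population_mean:,.0f}")
```

The two printed numbers will generally be close but not equal, which is exactly the situation described above: the sample mean is known precisely, while the population mean is only estimated.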
Definition
Statistics is a collection of methods for collecting, displaying, analyzing, and drawing conclusions from
data.
Definition
Descriptive statistics is the branch of statistics that involves organizing, displaying, and describing
data.
Definition
Inferential statistics is the branch of statistics that involves drawing conclusions about a population
based on information contained in a sample taken from that population.
The measurement made on each element of a sample need not be numerical. In the case of
automobiles, what is noted about each car could be its color, its make, its body type, and so on. Such
data are categorical or qualitative, as opposed to numerical or quantitative data such as value or age.
This is a general distinction.
Definition
Qualitative data are measurements for which there is no natural numerical scale, but which consist of
attributes, labels, or other nonnumerical characteristics.
Definition
Quantitative data are numerical measurements that arise from a natural numerical scale.
Qualitative data can generate numerical sample statistics. In the automobile example, for instance,
we might be interested in the proportion of all cars that are less than six years old. In our same
sample of 200 cars we could note for each car whether it is less than six years old or not, which is a
qualitative measurement. If 172 cars in the sample are less than six years old, which is 0.86 or 86%,
then we would estimate the parameter of interest, the population proportion, to be about the same as
the sample statistic, the sample proportion, that is, about 0.86.
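The arithmetic behind the sample proportion can be written out as a short Python sketch. The list of individual yes/no outcomes is an illustrative stand-in; only the counts (172 out of 200) come from the example.

```python
# Qualitative measurement recorded for each of the 200 sampled cars:
# True if the car is less than six years old, False otherwise.
# The ordering of outcomes is illustrative; only the counts come from the text.
sample_data = [True] * 172 + [False] * 28

n = len(sample_data)
count_under_six = sum(sample_data)  # each True counts as 1

# The sample proportion is a numerical statistic built from qualitative data.
sample_proportion = count_under_six / n

print(sample_proportion)  # 0.86
```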
The relationship between a population of interest and a sample drawn from that population is
perhaps the most important concept in statistics, since everything else rests on it. This relationship is
illustrated graphically in Figure 1.1 "The Grand Picture of Statistics". The circles in the large box
represent elements of the population. In the figure there was room for only a small number of them
but in actual situations, like our automobile example, they could very well number in the millions.
The solid black circles represent the elements of the population that are selected at random and that
together form the sample. For each element of the sample there is a measurement of interest,
denoted by a lower case x (which we have indexed as \(x_1, \ldots, x_n\) to tell them apart); these measurements
collectively form the sample data set. From the data we may calculate various statistics. To anticipate
the notation that will be used later, we might compute the sample mean \(\bar{x}\) and the sample
proportion \(\hat{p}\), and take them as approximations to the population mean \(\mu\) (this is the lower case
Greek letter mu, the traditional symbol for this parameter) and the population proportion \(p\),
respectively. The other symbols in the figure stand for other parameters and statistics that we will
encounter.
Figure 1.1 The Grand Picture of Statistics
KEY TAKEAWAYS
• Statistics is a study of data: describing properties of data (descriptive statistics) and drawing conclusions about a population based on information in a sample (inferential statistics).
• The distinction between a population together with its parameters and a sample together with its statistics is a fundamental concept in inferential statistics.
• Information in a sample is used to make inferences about the population from which the sample was drawn.
EXERCISES
1. Explain what is meant by the term population.
2. Explain what is meant by the term sample.
3. Explain how a sample differs from a population.
4. Explain what is meant by the term sample data.
5. Explain what a parameter is.
6. Explain what a statistic is.
7. Give an example of a population and two different characteristics that may be of interest.
8. Describe the difference between descriptive statistics and inferential statistics. Illustrate with an example.
9. Identify each of the following data sets as either a population or a sample:
a. The grade point averages (GPAs) of all students at a college.
b. The GPAs of a randomly selected group of students on a college campus.
c. The ages of the nine Supreme Court Justices of the United States on January 1, 1842.
d. The gender of every second customer who enters a movie theater.
e. The lengths of Atlantic croakers caught on a fishing trip to the beach.
10. Identify the following measures as either quantitative or qualitative:
a. The 30 high-temperature readings of the last 30 days.
b. The scores of 40 students on an English test.
c. The blood types of 120 teachers in a middle school.
d. The last four digits of social security numbers of all students in a class.
e. The numbers on the jerseys of 53 football players on a team.
11. Identify the following measures as either quantitative or qualitative:
a. The genders of the first 40 newborns in a hospital one year.
b. The natural hair color of 20 randomly selected fashion models.
c. The ages of 20 randomly selected fashion models.
d. The fuel economy in miles per gallon of 20 new cars purchased last month.
e. The political affiliation of 500 randomly selected voters.
12. A researcher wishes to estimate the average amount spent per person by visitors to a theme park. He takes a
random sample of forty visitors and obtains an average of $28 per person.
a. What is the population of interest?
b. What is the parameter of interest?
c. Based on this sample, do we know the average amount spent per person by visitors to the park?
Explain fully.
13. A researcher wishes to estimate the average weight of newborns in South America in the last five years. He
takes a random sample of 235 newborns and obtains an average of 3.27 kilograms.
a. What is the population of interest?
b. What is the parameter of interest?
c. Based on this sample, do we know the average weight of newborns in South America? Explain
fully.
14. A researcher wishes to estimate the proportion of all adults who own a cell phone. He takes a random
sample of 1,572 adults; 1,298 of them own a cell phone, hence 1298/1572 ≈ 0.83 or about 83% own a cell
phone.
a. What is the population of interest?
b. What is the parameter of interest?
c. What is the statistic involved?
d. Based on this sample, do we know the proportion of all adults who own a cell phone? Explain
fully.
15. A sociologist wishes to estimate the proportion of all adults in a certain region who have never married. In a
random sample of 1,320 adults, 145 have never married, hence 145/1320 ≈ 0.11 or about 11% have never
married.
a. What is the population of interest?
b. What is the parameter of interest?
c. What is the statistic involved?
d. Based on this sample, do we know the proportion of all adults who have never married? Explain
fully.
16. a. What must be true of a sample if it is to give a reliable estimate of the value of a particular
population parameter?
b. What must be true of a sample if it is to give certain knowledge of the value of a particular
population parameter?
ANSWERS
1. A population is the total collection of objects that are of interest in a statistical study.
3. A sample, being a subset, is typically smaller than the population. In a statistical study, all elements of a
sample are available for observation, which is not typically the case for a population.
5. A parameter is a value describing a characteristic of a population. In a statistical study the value of a
parameter is typically unknown.
7. All currently registered students at a particular college form a population. Two population characteristics of
interest could be the average GPA and the proportion of students over 23 years.
9. a. Population.
b. Sample.
c. Population.
d. Sample.
e. Sample.
11. a. Qualitative.
b. Qualitative.
c. Quantitative.
d. Quantitative.
e. Qualitative.
13. a. All newborn babies in South America in the last five years.
b. The average birth weight of all newborn babies in South America in the last five years.
c. No, not exactly, but we know the approximate value of the average.
15. a. All adults in the region.
b. The proportion of the adults in the region who have never married.
c. The proportion computed from the sample, 0.11.
d. No, not exactly, but we know the approximate value of the proportion.
1.2 Overview
LEARNING OBJECTIVE
1. To obtain an overview of the material in the text.
The example we have given in the first section seems fairly simple, but there are some significant
problems that it illustrates. We have supposed that the 200 cars of the sample had an average value
of $8,357 (a number that is precisely known), and concluded that the population has an average of
about the same amount, although its precise value is still unknown. What would happen if someone
were to take another sample of exactly the same size from exactly the same population? Would he get
the same sample average as we did, $8,357? Almost surely not. In fact, if the investigator who took
the second sample were to report precisely the same value, we would immediately become suspicious
of his result. The sample average is an example of what is called a random variable: a number that
varies from trial to trial of an experiment (in this case, from sample to sample), and does so in a way
that cannot be predicted precisely. Random variables will be a central object of study for us,
beginning in Chapter 4 "Discrete Random Variables".
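The sample-to-sample variation of the sample average can be seen in a small simulation. This is a sketch under assumptions: the population of car values is invented (uniform between $500 and $30,000), and the helper name `sample_mean` is our own.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of car values in dollars (made-up numbers).
population = [random.uniform(500, 30000) for _ in range(100_000)]

def sample_mean(size):
    """Draw one random sample of the given size and return its average."""
    return statistics.mean(random.sample(population, size))

# Five investigators each take a sample of 200 cars from the same population.
means = [sample_mean(200) for _ in range(5)]

for i, m in enumerate(means, start=1):
    print(f"sample {i}: average value = {m:,.0f}")
```

Each run of the loop body plays the role of one trial of the experiment; the five averages differ from one another even though the population never changes.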
Another issue that arises is that different samples have different levels of reliability. We have
supposed that our sample of size 200 had an average of $8,357. If a sample of size 1,000 yielded an
average value of $7,832, then we would naturally regard this latter number as likely to be a better
estimate of the average value of all cars. How can this be expressed? An important idea that we will
develop in Chapter 7 "Estimation" is that of the confidence interval: from the data we will construct
an interval of values so that the process has a certain chance, say a 95% chance, of generating an
interval that contains the actual population average. Thus instead of reporting a single estimate,
$8,357, for the population mean, we would say that we are 95% certain that the true average is
within $100 of our sample mean, that is, between $8,257 and $8,457, the number $100 having been
computed from the sample data just like the sample mean $8,357 was. This will automatically
indicate the reliability of the sample, since to obtain the same chance of containing the unknown
parameter a large sample will typically produce a shorter interval than a small one will. But unless we perform a census, we can never be completely sure of the true average value of the population; the
best that we can do is to make statements of probability, an important concept that we will begin to
study formally in Chapter 3 "Basic Concepts of Probability".
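The claim that a larger sample typically produces a shorter interval can be illustrated numerically. The sketch below is not the method of this text, which is developed in Chapter 7; it assumes the common large-sample approximation "sample mean plus or minus 1.96 standard errors" for a 95% interval, and it uses an invented population of car values.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population of car values in dollars (made-up numbers).
population = [random.uniform(500, 30000) for _ in range(100_000)]

def approx_95_interval(sample):
    """Return (low, high) using the assumed rule: mean +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

small = approx_95_interval(random.sample(population, 200))
large = approx_95_interval(random.sample(population, 1000))

print(f"n = 200:  {small[0]:,.0f} to {small[1]:,.0f}")
print(f"n = 1000: {large[0]:,.0f} to {large[1]:,.0f}")
```

The margin is computed from the sample data alone, just as the number $100 in the discussion above was, and it shrinks roughly in proportion to the square root of the sample size.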
Sampling may be done not only to estimate a population parameter, but to test a claim that is made
about that parameter. Suppose a food package asserts that the amount of sugar in one serving of the
product is 14 grams. A consumer group might suspect that it is more. How would they test the
competing claims about the amount of sugar, 14 grams versus more than 14 grams? They might take
a random sample of perhaps 20 food packages, measure the amount of sugar in one serving of each
one, and average those amounts. They are not interested in the true amount of sugar in one serving
in itself; their interest is simply whether the claim about the true amount is accurate. Stated another
way, they are sampling not in order to estimate the average amount of sugar in one serving, but to
see whether that amount, whatever it may be, is larger than 14 grams. Again because one can have
certain knowledge only by taking a census, ideas of probability enter into the analysis. We will
examine tests of hypotheses beginning in Chapter 8 "Testing Hypotheses".
Several times in this introduction we have used the term “random sample.” Generally the value of our data is only as good as the sample that produced it. For example, suppose we wish to estimate
the proportion of all students at a large university who are females, which we denote by p. If we
select 50 students at random and 27 of them are female, then a natural estimate is \(p \approx \hat{p} = 27/50 = 0.54\) or
54%. How much confidence we can place in this estimate depends not only on the size of the sample,
but on its quality, whether or not it is truly random, or at least truly representative of the whole
population. If all 50 students in our sample were drawn from a College of Nursing, then the
proportion of female students in the sample is likely higher than that of the entire campus. If all 50
students were selected from a College of Engineering Sciences, then the proportion of students in the
entire student body who are females could be underestimated. In either case, the estimate would be
distorted or biased. In statistical practice an unbiased sampling scheme is important but in most
cases not easy to produce. For this introductory course we will assume that all samples are either
random or at least representative.
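The effect of a biased sampling scheme can be sketched with a simulated campus. Everything numerical here is an assumption made for illustration: the three colleges, their sizes, and the proportion of female students in each are invented.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical campus: (college, is_female) pairs. All proportions are made up.
students = (
    [("Nursing", random.random() < 0.85) for _ in range(2_000)]
    + [("Engineering", random.random() < 0.20) for _ in range(4_000)]
    + [("Other", random.random() < 0.55) for _ in range(14_000)]
)

def proportion_female(group):
    """Fraction of the group whose is_female flag is True."""
    return sum(is_female for _, is_female in group) / len(group)

# Population proportion p (unknown in a real study).
p = proportion_female(students)

# Three sampling schemes, each producing 50 students.
random_sample = random.sample(students, 50)
nursing_only = random.sample([s for s in students if s[0] == "Nursing"], 50)
engineering_only = random.sample([s for s in students if s[0] == "Engineering"], 50)

print(f"population proportion:   {p:.2f}")
print(f"random sample:           {proportion_female(random_sample):.2f}")
print(f"Nursing-only sample:     {proportion_female(nursing_only):.2f}")
print(f"Engineering-only sample: {proportion_female(engineering_only):.2f}")
```

All three samples have the same size, so the difference in the quality of the estimates comes entirely from how the students were selected.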
KEY TAKEAWAY
• Statistics computed from samples vary randomly from sample to sample. Conclusions made about
population parameters are statements of probability.
1.3 Presentation of Data
LEARNING OBJECTIVE
1. To learn two ways that data will be presented in the text.
In this book we will use two formats for presenting data sets. The first is a data list, which is an
explicit listing of all the individual measurements, either as a display with space between the
individual measurements, or in set notation with individual measurements separated by commas.
EXAMPLE 1
The data obtained by measuring the age of 21 randomly selected students enrolled in freshman courses at
a university could be presented as the data list
18 18 19 19 19 18 22 20 18 18 17
19 18 24 18 20 18 21 20 17 19
or in set notation as
{18,18,19,19,19,18,22,20,18,18,17,19,18,24,18,20,18,21,20,17,19}
A data set can also be presented by means of a data frequency table, a table in which
each distinct value x is listed in the first row and its frequency f, which is the number of times the
value x appears in the data set, is listed below it in the second row.
EXAMPLE 2
The data set of the previous example is represented by the data frequency table
x 17 18 19 20 21 22 24
f  2  8  5  3  1  1  1
The data frequency table is especially convenient when data sets are large and the number of distinct values is not too large.
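The two formats can be converted into one another mechanically. The following Python sketch builds the data frequency table of Example 2 from the data list of Example 1 and then rebuilds a sorted data list from the table; the variable names are our own.

```python
from collections import Counter

# Data list from Example 1: ages of 21 students.
ages = [18, 18, 19, 19, 19, 18, 22, 20, 18, 18, 17,
        19, 18, 24, 18, 20, 18, 21, 20, 17, 19]

# Frequency of each distinct value x.
frequency = Counter(ages)
values = sorted(frequency)

print("x", *values)                            # x 17 18 19 20 21 22 24
print("f", *(frequency[x] for x in values))    # f 2 8 5 3 1 1 1

# Going the other way: repeat each value x exactly f times.
rebuilt = [x for x in values for _ in range(frequency[x])]
```

The rebuilt list contains the same measurements as the original but in sorted order, since a frequency table does not record the order in which the data were collected.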
KEY TAKEAWAY
• Data sets can be presented either by listing all the elements or by giving a table of values and frequencies.
EXERCISES
1. List all the measurements for the data set represented by the following data frequency table.
x 31 32 33 34 35
f  1  5  6  4  2
2. List all the measurements for the data set represented by the following data frequency table.
x 97 98 99 100 101 102 103 105
f  7  5  2   4   2   2   1   1
3. Construct the data frequency table for the following data set.
22 25 22 27 24 23
26 24 22 24 26
4. Construct the data frequency table for the following data set.
{1,5,2,3,5,1,4,4,4,3,2,5,1,3,2,
1,1,1,2}
ANSWERS
1. {31,32,32,32,32,32,33,33,33,33,33,33,34,34,34,34,35,35}.
3.
x 22 23 24 25 26 27
f  3  1  3  1  2  1
Chapter 2
Descriptive Statistics
As described in Chapter 1 "Introduction", statistics naturally divides into two branches, descriptive
statistics and inferential statistics. Our main interest is in inferential statistics, as shown in Figure 1.1
"The Grand Picture of Statistics" in Chapter 1 "Introduction". Nevertheless, the starting point for
dealing with a collection of data is to organize, display, and summarize it effectively. These are the
objectives of descriptive statistics, the topic of this chapter.
2.1 Three Popular Data Displays
LEARNING OBJECTIVE
1. To learn to interpret the meaning of three graphical representations of sets of data: stem and leaf
diagrams, frequency histograms, and relative frequency histograms.
A well-known adage is that “a picture is worth a thousand words.” This saying proves true when it
comes to presenting statistical information in a data set. There are many effective ways to present
data graphically. The three graphical tools that are introduced in this section are among the most
commonly used and are relevant to the subsequent presentation of the material in this book.
Stem and Leaf Diagrams
Suppose 30 students in a statistics class took a test and made the following scores:
86 80 25 77 73 76 100 90 69 93
90 83 70 73 73 70 90 83 71 95
40 58 68 69 100 78 87 97 92 74
How did the class do on the test? A quick glance at the set of 30 numbers does not immediately give a
clear answer. However the data set may be reorganized and rewritten to make relevant information more
visible. One way to do so is to construct a stem and leaf diagram as shown in Figure 2.1 "Stem and Leaf Diagram". The numbers in the tens
place, from 2 through 9, and additionally the number 10, are the “stems,” and are arranged in numerical
order from top to bottom to the left of a vertical line. The number in the units place in each measurement
is a “leaf,” and is placed in a row to the right of the corresponding stem, the number in the tens place of
that measurement. Thus the three leaves 9, 8, and 9 in the row headed with the stem 6 correspond to the
three exam scores in the 60s, 69 (in the first row of data), 68 (in the third row), and 69 (also in the third
row). The display is made even more useful for some purposes by rearranging the leaves in numerical
order, as shown in Figure 2.2 "Ordered Stem and Leaf Diagram". Either way, with the data reorganized certain information of interest becomes
apparent immediately. There are two perfect scores; three students made scores under 60; most students
scored in the 70s, 80s and 90s; and the overall average is probably in the high 70s or low 80s.
Figure 2.1 Stem and Leaf Diagram
Figure 2.2 Ordered Stem and Leaf Diagram
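The construction described above can be sketched in Python as a plain-text display. This is an illustration, not the way the figures were produced; the function name `stem_and_leaf` and the `ordered` flag are our own.

```python
from collections import defaultdict

# The 30 exam scores, in the order they were listed.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]

def stem_and_leaf(data, ordered=False):
    """Map each stem (tens place and above) to its list of leaves (units digits)."""
    leaves = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = {}
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = leaves.get(stem, [])
        rows[stem] = sorted(row) if ordered else list(row)
    return rows

for stem, row in stem_and_leaf(scores, ordered=True).items():
    print(f"{stem:>2} | {' '.join(map(str, row))}")
```

Calling `stem_and_leaf(scores)` without `ordered=True` keeps the leaves in the order the scores were listed, so the row for stem 6 reads 9 8 9, as in the discussion above.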
In this example the scores have a natural stem (the tens place) and leaf (the ones place). One could spread
the diagram out by splitting each tens place number into lower and upper categories. For example, all the
scores in the 80s may be represented on two separate stems, lower 80s and upper 80s:
8 0 3 3
8 6 7
The definitions of stems and leaves are flexible in practice. The general purpose of a stem and leaf
diagram is to provide a quick display of how the data are distributed across the range of their values; some
improvisation could be necessary to obtain a diagram that best meets that goal.
Note that all of the original data can be recovered from the stem and leaf diagram. This will not be true in
the next two types of graphical displays.
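The construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `stem_and_leaf` and the short list of scores are hypothetical, and the sketch assumes non-negative whole numbers with the units digit as the leaf and everything to its left as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values, ordered=True):
    """Return the rows of a stem and leaf diagram as strings.

    The stem is the value with its units digit removed (value // 10) and
    the leaf is the units digit (value % 10). Assumes non-negative integers.
    """
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)

    rows = []
    # List every stem from the smallest to the largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = sorted(leaves[stem]) if ordered else leaves[stem]
        rows.append(f"{stem:>2} | " + " ".join(str(leaf) for leaf in row))
    return rows


# Hypothetical scores, chosen only to illustrate the layout.
scores = [86, 80, 69, 93, 83, 100, 68, 87, 83, 69, 100]
for line in stem_and_leaf(scores):
    print(line)
```

Passing `ordered=False` keeps the leaves in the order in which the data were recorded, which corresponds to the unordered diagram; the default sorts each row, as in the ordered diagram.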
Frequency Histograms
The stem and leaf diagram is not practical for large data sets, so we need a different, purely graphical way
to represent data. A frequency histogram is such a device. We will illustrate it using the same data set
from the previous subsection. For the 30 scores on the exam, it is natural to group the scores on the
standard ten-point scale, and count the number of scores in each group. Thus there are two 100s, seven
scores in the 90s, five in the 80s, and so on. We then construct the diagram shown in Figure 2.3 by drawing for each
group, or class, a vertical bar whose length is the number of observations in that group. In our example,
the bar labeled 100 is 2 units long, the bar labeled 90 is 7 units long, and so on. While the individual data
values are lost, we know the number in each class. This number is called the frequency of the class,
hence the name frequency histogram.
Figure 2.3 Frequency Histogram
The same procedure can be applied to any collection of numerical data. Observations are grouped into
several classes and the frequency (the number of observations) of each class is noted. These classes are
arranged and indicated in order on the horizontal axis (called the x-axis), and for each group a vertical
bar, whose length is the number of observations in that group, is drawn. The resulting display is a
frequency histogram for the data. The similarity between Figure 2.1 and Figure 2.3 is apparent, particularly if you imagine turning the
stem and leaf diagram on its side by rotating it a quarter turn counterclockwise.
In general, the definition of the classes in the frequency histogram is flexible. The general purpose of a
frequency histogram is very much the same as that of a stem and leaf diagram, to provide a graphical
display that gives a sense of data distribution across the range of values that appear. We will not discuss
the process of constructing a histogram from data since in actual practice it is done automatically with
statistical software or even handheld calculators.
Relative Frequency Histograms
In our example of the exam scores in a statistics class, five students scored in the 80s. The number 5 is
the frequency of the group labeled “80s.” Since there are 30 students in the entire statistics class, the
proportion who scored in the 80s is 5/30. The number 5/30, which could also be expressed as $0.1\overline{6} \approx 0.1667$, or
as 16.67%, is the relative frequency of the group labeled “80s.” Every group (the 70s, the 80s, and so
on) has a relative frequency. We can thus construct a diagram by drawing for each group, or class, a
vertical bar whose length is the relative frequency of that group. For example, the bar for the 80s will have
length 5/30 unit, not 5 units. The diagram is a relative frequency histogram for the data, and is
shown in Figure 2.4. It is exactly the same as the frequency histogram except that the vertical axis in the relative
frequency histogram is not frequency but relative frequency.
Figure 2.4 Relative Frequency Histogram
The same procedure can be applied to any collection of numerical data. Classes are selected, the relative
frequency of each class is noted, the classes are arranged and indicated in order on the horizontal axis,
and for each class a vertical bar, whose length is the relative frequency of the class, is drawn. The resulting
display is a relative frequency histogram for the data. A key point is that now if each vertical bar has width
1 unit, then the total area of all the bars is 1 or 100%.
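As a rough illustration of the procedure, the following Python sketch converts class frequencies into relative frequencies for the exam example. The counts are the ones stated or implied in the text (two 100s, seven scores in the 90s, five in the 80s, a third of the 30 scores in the 70s, three in the 60s, three under 60). Lumping the three lowest scores into a single class labeled "below 60", and the variable names, are assumptions made here for brevity.

```python
from fractions import Fraction

# Class frequencies for the 30 exam scores. Grouping the three scores
# under 60 into one class is an assumption made for this sketch.
frequency = {
    "below 60": 3,
    "60s": 3,
    "70s": 10,
    "80s": 5,
    "90s": 7,
    "100": 2,
}

n = sum(frequency.values())  # total number of observations

# Relative frequency of a class = frequency of the class / n.
relative_frequency = {label: Fraction(count, n) for label, count in frequency.items()}

for label, rel in relative_frequency.items():
    print(f"{label:>8}: {frequency[label]:2d}  {float(rel):.4f}")

# With bars of width 1, the total area of the bars is the sum of the
# relative frequencies, which is exactly 1.
total_area = sum(relative_frequency.values())
print("total area:", total_area)
```

Exact fractions are used instead of floating-point numbers so that the total comes out to exactly 1 rather than a value that only rounds to 1.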
Although the histograms in Figure 2.3 and Figure 2.4 have the same appearance, the relative frequency histogram is more
important for us, and it will be relative frequency histograms that will be used repeatedly to
represent data in this text. To see why this is so, reflect on what it is that you are actually seeing in
the diagrams that quickly and effectively communicates information to you about the data. It is
the relative sizes of the bars. The bar labeled “70s” in either figure takes up 1/3 of the total area of all
the bars, and although we may not think of this consciously, we perceive the proportion 1/3 in the
figures, indicating that a third of the grades were in the 70s. The relative frequency histogram is
important because the labeling on the vertical axis reflects what is important visually: the relative
sizes of the bars.
When the size n of a sample is small, only a few classes can be used in constructing a relative
frequency histogram. Such a histogram might look something like the one in panel (a) of Figure 2.5. If the
sample size n were increased, then more classes could be used in constructing a relative frequency
histogram and the vertical bars of the resulting histogram would be finer, as indicated in panel (b)
of Figure 2.5. For a very large sample the relative frequency histogram would look very fine, like the one in panel (c)
of Figure 2.5. If the sample size were to increase indefinitely then the corresponding relative frequency
histogram would be so fine that it would look like a smooth curve, such as the one in panel (d) of Figure 2.5.
Figure 2.5 Sample Size and Relative Frequency Histograms
It is common in statistics to represent a population or a very large data set by a smooth curve. It is
good to keep in mind that such a curve is actually just a very fine relative frequency histogram in
which the exceedingly narrow vertical bars have disappeared. Because the area of each such vertical
bar is the proportion of the data that lies in the interval of numbers over which that bar stands, this
means that for any two numbers a and b, the proportion of the data that lies between the two
numbers a and b is the area under the curve that is above the interval (a, b) in the horizontal axis.
This is the area shown in Figure 2.6. In particular the total area under the curve is 1, or 100%.
Figure 2.6 A Very Fine Relative Frequency Histogram
KEY TAKEAWAYS
• Graphical representations of large data sets provide a quick overview of the nature of the data.
• A population or a very large data set may be represented by a smooth curve. This curve is a very fine
relative frequency histogram in which the exceedingly narrow vertical bars have been omitted.
• When a curve derived from a relative frequency histogram is used to describe a data set, the proportion of
data with values between two numbers a and b is the area under the curve between a and b, as illustrated
in Figure 2.6 "A Very Fine Relative Frequency Histogram".
2.2 Measures of Central Location
LEARNING OBJECTIVES
1. To learn the concept of the “center” of a data set.
2. To learn the meaning of each of three measures of the center of a data set (the mean, the median, and
the mode) and how to compute each one.
This section could be titled “three kinds of averages of a data set.” Any kind of “average” is meant to
be an answer to the question “Where do the data center?” It is thus a measure of the central location
of the data set. We will see that the nature of the data set, as indicated by a relative frequency
histogram, will determine what constitutes a good answer. Different shapes of the histogram call for
different measures of central location.
The Mean
The first measure of central location is the usual “average” that is familiar to everyone. In the formula in
the following definition we introduce the standard summation notation Σ, where Σ is the capital Greek
letter sigma. In general, the notation Σ followed by a second mathematical symbol means to add up all
the values that the second symbol can take in the context of the problem. Here is an example to illustrate
this.
In the definition we follow the convention of using lowercase n to denote the number of
measurements in a sample, which is called the sample size.
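For reference, the standard formula for the sample mean in this notation can be written as follows. It is restated here rather than quoted from the text, and the three-element data set {1, 3, 4} used to illustrate the summation notation is hypothetical.

```latex
\[
  \bar{x} = \frac{\Sigma x}{n}
\]
% Hypothetical data set {1, 3, 4}, so n = 3:
\[
  \Sigma x = 1 + 3 + 4 = 8, \qquad
  \Sigma x^2 = 1^2 + 3^2 + 4^2 = 26, \qquad
  \bar{x} = \frac{8}{3} \approx 2.67
\]
```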
In the examples above the data sets were described as samples. Therefore the means were sample means,
denoted by x̄. If the data come from a census, so that there is a measurement for every element of the
population, then the mean is calculated by exactly the same process of summing all the measurements
and dividing by how many of them there are, but it is now the population mean and is denoted by μ, the
lower case Greek letter mu.
The mean of two numbers is the number that is halfway between them. For example, the average of the
numbers 5 and 17 is (5 + 17)/2 = 11, which is 6 units above 5 and 6 units below 17. In this sense the
average 11 is the “center” of the data set {5,17}. For larger data sets the mean can similarly be regarded as
the “center” of the data.
The Median
To see why another concept of average is needed, consider the following situation. Suppose we are
interested in the average yearly income of employees at a large corporation. We take a random sample of
seven employees, obtaining the sample data (rounded to the nearest hundred dollars, and expressed in
thousands of dollars).
24.8 22.8 24.6 192.5 25.2 18.5 23.7
The mean (rounded to one decimal place) is x̄ = 47.4, but the statement “the average income of employees
at this corporation is $47,400” is surely misleading. It is approximately twice what six of the seven
employees in the sample make and is nowhere near what any of them makes. It is easy to see what went
wrong: the presence of the one executive in the sample, whose salary is so large compared to everyone
else’s, caused the numerator in the formula for the sample mean to be far too large, pulling the mean far
to the right of where we think that the average “ought” to be, namely around $24,000 or $25,000. The
number 192.5 in our data set is called an outlier, a number that is far removed from most or all of the
remaining measurements. Many times an outlier is the result of some sort of error, but not always, as is
the case here. We would get a better measure of the “center” of the data if we were to arrange the data in
numerical order,
18.5 22.8 23.7 24.6 24.8 25.2 192.5
then select the middle number in the list, in this case 24.6. The result is called the median of the data set,
and has the property that roughly half of the measurements are larger than it is, and roughly half are
smaller. In this sense it locates the center of the data. If there are an even number of measurements in the
data set, then there will be two middle elements when all are lined up in order, so we take the mean of the
middle two as the median. Thus we have the following definition.
Definition
The sample median x̃ of a set of sample data for which there are an odd number of measurements is
the middle measurement when the data are arranged in numerical order. The sample median x̃ of a
set of sample data for which there are an even number of measurements is the mean of the two middle
measurements when the data are arranged in numerical order.
The population median is defined in a similar way, but we will not have occasion to refer to it again
in this text.
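The definition translates directly into code. The following Python sketch applies it to the income data above; the function names `sample_median` and `sample_mean` are hypothetical, and the final line, which drops the outlier to show the even-count rule, is an illustration added here and not a computation from the text.

```python
def sample_median(data):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def sample_mean(data):
    """Sum of the measurements divided by how many there are."""
    return sum(data) / len(data)


# Incomes from the text, in thousands of dollars.
incomes = [24.8, 22.8, 24.6, 192.5, 25.2, 18.5, 23.7]

print(round(sample_mean(incomes), 1))  # mean, pulled upward by the outlier 192.5
print(sample_median(incomes))          # middle value of the seven sorted incomes

# Dropping the outlier leaves six values, so the even-count rule applies.
print(sample_median([x for x in incomes if x != 192.5]))
```

Sorting a copy with `sorted` leaves the original list untouched, which matters if the data are needed afterwards in their recorded order.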
The median is a value that divides the observations in a data set so that 50% of the data are on its left
and the other 50% on its right. In accordance with Figure 2.6, therefore, in the curve that represents the
distribution of the data, a vertical line drawn at the median divides the area in two, area 0.5 (50% of
the total area 1) to the left and area 0.5 (50% of the total area 1) to the right, as shown in Figure 2.7. In our
income example the median, $24,600, clearly gave a much better measure of the middle of the data
set than did the mean $47,400. This is typical for situations in which the distribution is skewed.
(Skewness and symmetry of distributions are discussed at the end of this subsection.)
Figure 2.7 The Median
The relationship between the mean and the median for several common shapes of distributions is shown
in Figure 2.8. The distributions in panels (a) and (b) are said to be symmetric because of the symmetry that they
exhibit. The distributions in the remaining two panels are said to be skewed. In each distribution we have
drawn a vertical line that divides the area under the curve in half, which in accordance with Figure 2.7 is located at
the median. The following facts are true in general:
a. When the distribution is symmetric, as in panels (a) and (b) of Figure 2.8, the mean and the median are
equal.
b. When the distribution is as shown in panel (c) of Figure 2.8, it is said to be skewed right. The mean has
been pulled to the right of the median by the long “right tail” of the distribution, the few relatively large
data values.
c. When the distribution is as shown in panel (d) of Figure 2.8, it is said to be skewed left. The mean has been
pulled to the left of the median by the long “left tail” of the distribution, the few relatively small data
values.
Figure 2.8 Skewness of Relative Frequency Histograms
The Mode
Perhaps you have heard a statement like “The average number of automobiles owned by households
in the United States is 1.37,” and have been amused at the thought of a fraction of an automobile
sitting in a driveway. In such a context the following measure for central location might make more
sense.
Definition
The sample mode of a set of sample data is the most frequently occurring value.
The population mode is defined in a similar way, but we will not have occasion to refer to it again in
this text.
On a relative frequency histogram, the highest point of the histogram corresponds to the mode of the
data set. Figure 2.9 illustrates the mode.
Figure 2.9 Mode
For any data set there is always exactly one mean and exactly one median. This need not be true of the
mode; several different values could occur with the highest frequency, as we will see. It could even happen
that every value occurs with the same frequency, in which case the concept of the mode does not make
much sense.
EXAMPLE 8
Find the mode of the following data set.
−1 0 2 0
Solution:
The value 0 is most frequently observed and therefore the mode is 0.
EXAMPLE 9
Compute the sample mode for the data of .
Solution:
The two most frequently observed values in the data set are 1 and 2. Therefore the mode is a set of two
values: {1, 2}.
The mode is a measure of central location since most real-life data sets have more observations near the
center of the data range and fewer observations on the lower and upper ends. The value with the highest
frequency is often in the middle of the data range.
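Because a data set can have more than one mode, a routine that finds the mode should return every value that attains the highest frequency. A minimal Python sketch follows; the function name `sample_modes` is hypothetical, the first call uses the data of Example 8, and the second list is made-up data chosen to have two modes.

```python
from collections import Counter


def sample_modes(data):
    """Return every value that occurs with the highest frequency, sorted."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(sample_modes([-1, 0, 2, 0]))    # data of Example 8
print(sample_modes([1, 1, 2, 2, 3]))  # hypothetical data with two modes
```

When every value occurs equally often the function returns all of them, which is the situation in which the concept of the mode does not make much sense.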
KEY TAKEAWAY
The mean, the median, and the mode each answer the question “Where is the center of the data set?”
The nature of the data set, as indicated by a relative frequency histogram, determines which one gives the
best answer.
LARGE DATA SET EXERCISES
28. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Compute the mean and median of the 1,000 SAT scores.
b. Compute the mean and median of the 1,000 GPAs.
29. Large Data Set 1 lists the SAT scores of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all students at a high school, in which the SAT score of every
student was measured. Compute the population mean μ.
b. Regard the first 25 observations as a random sample drawn from this population. Compute the sample
mean x̄ and compare it to μ.
c. Regard the next 25 observations as a random sample drawn from this population. Compute the sample
mean x̄ and compare it to μ.
30. Large Data Set 1 lists the GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all freshmen at a small college at the end of their first academic
year of college study, in which the GPA of every such person was measured. Compute the population
mean μ.
b. Regard the first 25 observations as a random sample drawn from this population. Compute the sample
mean x̄ and compare it to μ.
c. Regard the next 25 observations as a random sample drawn from this population. Compute the sample
mean x̄ and compare it to μ.
31. Large Data Sets 7, 7A, and 7B list the survival times in days of 140 laboratory mice with thymic leukemia from
onset to death.
http://www.flatworldknowledge.com/sites/all/files/data7.xls
http://www.flatworldknowledge.com/sites/all/files/data7A.xls
http://www.flatworldknowledge.com/sites/all/files/data7B.xls
a. Compute the mean and median survival time for all mice, without regard to gender.
b. Compute the mean and median survival time for the 65 male mice (separately recorded in Large Data Set
7A).
c. Compute the mean and median survival time for the 75 female mice (separately recorded in Large Data Set
7B).
2.3 Measures of Variability
LEARNING OBJECTIVES
1. To learn the concept of the variability of a data set.
2. To learn how to compute three measures of the variability of a data set: the range, the variance, and the
standard deviation.
Look at the two data sets in Table 2.1 "Two Data Sets" and the graphical representation of each,
called a dot plot, in Figure 2.10 "Dot Plots of Data Sets".
Table 2.1 Two Data Sets
Data Set I: 40 38 42 40 39 39 43 40 39 40
Data Set II: 46 37 40 33 42 36 40 47 34 45
Figure 2.10 Dot Plots of Data Sets
The two sets of ten measurements each center at the same value: they both have mean, median, and
mode 40. Nevertheless a glance at the figure shows that they are markedly different. In Data Set I the
measurements vary only slightly from the center, while for Data Set II the measurements vary
greatly. Just as we have attached numbers to a data set to locate its center, we now wish to associate
to each data set numbers that measure quantitatively how the data either scatter away from the
center or cluster close to it. These new quantities are called measures of variability, and we will
discuss three of them.
The Range
The first measure of variability that we discuss is the simplest.
Definition
The range of a data set is the number R defined by the formula
$R = x_{\max} - x_{\min}$
where $x_{\max}$ is the largest measurement in the data set and $x_{\min}$ is the smallest.
EXAMPLE 10
Find the range of each data set in Table 2.1 "Two Data Sets".
Solution:
For Data Set I the maximum is 43 and the minimum is 38, so the range is R = 43 − 38 = 5.
For Data Set II the maximum is 47 and the minimum is 33, so the range is R = 47 − 33 = 14.
The range is a measure of variability because it indicates the size of the interval over which the data
points are distributed. A smaller range indicates less variability (less dispersion) among the data,
whereas a larger range indicates the opposite.
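The computation in Example 10 can be checked with a short Python sketch. The data are those of Table 2.1; the names `data_set_1`, `data_set_2`, and `data_range` are hypothetical.

```python
data_set_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]


def data_range(data):
    """Range R = largest measurement minus smallest measurement."""
    return max(data) - min(data)


print(data_range(data_set_1))
print(data_range(data_set_2))
```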
The Variance and the Standard Deviation
The other two measures of variability that we will consider are more elaborate and also depend on
whether the data set is just a sample drawn from a much larger population or is the whole population
itself (that is, a census).
Although the first formula in each case looks less complicated than the second, the latter is easier to
use in hand computations, and is called a shortcut formula.
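For reference, the standard formulas for the sample variance and the sample standard deviation can be written as follows. They are restated here in this chapter's notation rather than quoted from the text; in each line the first expression is the defining formula and the second is the shortcut formula.

```latex
\[
  s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1}
      = \frac{\Sigma x^2 - \frac{1}{n}\left(\Sigma x\right)^2}{n - 1}
\]
\[
  s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}}
    = \sqrt{\frac{\Sigma x^2 - \frac{1}{n}\left(\Sigma x\right)^2}{n - 1}}
\]
```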
The student is encouraged to compute the ten deviations for Data Set I and verify that their squares
add up to 20, so that the sample variance and standard deviation of Data Set I are the much smaller
numbers $s^2 = 20/9 = 2.\overline{2}$ and $s = \sqrt{20/9} \approx 1.49$.
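The verification suggested above can also be carried out with a short Python sketch. It assumes the standard sample variance formulas (squared deviations summed and divided by n − 1, plus the usual shortcut form), and all function names are hypothetical. The figures it prints for Data Set II are computed by the sketch, not quoted from the text.

```python
import math

data_set_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]


def sum_of_squared_deviations(data):
    """Sum of the squared deviations of each measurement from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def sample_variance(data):
    """Defining formula: squared deviations summed, divided by n - 1."""
    return sum_of_squared_deviations(data) / (len(data) - 1)


def sample_variance_shortcut(data):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


for name, data in [("I", data_set_1), ("II", data_set_2)]:
    s2 = sample_variance(data)
    print(name, sum_of_squared_deviations(data), round(s2, 2), round(math.sqrt(s2), 2))
```

Both variance functions give the same result up to floating-point rounding; the shortcut form avoids computing each deviation separately, which is why it is convenient for hand computation.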
The sample variance has different units from the data. For example, if the units in the data set were
inches, the new units would be inches squared, or square inches. It is thus primarily of theoretical
importance and will not be considered further in this text, except in passing.
If the data set comprises the whole population, then the population standard deviation,
denoted σ (the lower case Greek letter sigma), and its square, the population variance σ², are
defined as follows.
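For reference, the standard definitions in this notation are given below. They are restated here rather than quoted from the text, and N is assumed to denote the number of measurements in the population.

```latex
\[
  \sigma^2 = \frac{\Sigma (x - \mu)^2}{N}, \qquad
  \sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{N}}
\]
```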
Note that the denominator in the fraction is the full number of observations, not that number
reduced by one, as is the case with the sample standard deviation. Since most data sets are samples,
we will always work with the sample standard deviation and variance.
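The effect of the denominator can be seen numerically. The sketch below applies the `stdev` and `pstdev` functions of Python's standard `statistics` module to Data Set I from Table 2.1; treating the same ten values first as a sample and then as a whole population is an assumption made purely for comparison.

```python
import statistics

data_set_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]

# Treat the ten values as a sample: divide by n - 1.
s = statistics.stdev(data_set_1)

# Treat the same ten values as a whole population: divide by n.
sigma = statistics.pstdev(data_set_1)

print(round(s, 2), round(sigma, 2))
```

Dividing by the smaller number n − 1 always makes the sample standard deviation slightly larger than the population figure computed from the same values.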
Finally, in many real-life situations the most important statistical issues have to do with comparing
the means and standard deviations of two data sets. Figure 2.11 "Difference between Two Data
Sets" illustrates how a difference in one or both of the sample mean and the sample standard
deviation is reflected in the appearance of the data set as shown by the curves derived from the
relative frequency histograms built using the data.
Figure 2.11 Difference between Two Data Sets
KEY TAKEAWAY
The range, the standard deviation, and the variance each give a quantitative answer to the question “How
variable are the data?”
LARGE DATA SET EXERCISES
19. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Compute the range and sample standard deviation of the 1,000 SAT scores.
b. Compute the range and sample standard deviation of the 1,000 GPAs.
20. Large Data Set 1 lists the SAT scores of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all students at a high school, in which the SAT score of every
student was measured. Compute the population range and population standard deviation σ.
b. Regard the first 25 observations as a random sample drawn from this population. Compute the sample range
and sample standard deviation s and compare them to the population range and σ.
c. Regard the next 25 observations as a random sample drawn from this population. Compute the sample range
and sample standard deviation s and compare them to the population range and σ.
21. Large Data Set 1 lists the GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all freshmen at a small college at the end of their first academic
year of college study, in which the GPA of every such person was measured. Compute the population range
and population standard deviation σ.
b. Regard the first 25 observations as a random sample drawn from this population. Compute the sample range
and sample standard deviation s and compare them to the population range and σ.
c. Regard the next 25 observations as a random sample drawn from this population. Compute the sample range
and sample standard deviation s and compare them to the population range and σ.
22. Large Data Sets 7, 7A, and 7B list the survival times in days of 140 laboratory mice with thymic leukemia from
onset to death.
http://www.flatworldknowledge.com/sites/all/files/data7.xls
http://www.flatworldknowledge.com/sites/all/files/data7A.xls
http://www.flatworldknowledge.com/sites/all/files/data7B.xls
a. Compute the range and sample standard deviation of survival time for all mice, without regard to gender.
b. Compute the range and sample standard deviation of survival time for the 65 male mice (separately recorded
in Large Data Set 7A).
c. Compute the range and sample standard deviation of survival time for the 75 female mice (separately recorded in Large Data Set 7B). Do you see a difference in the results for male and female mice? Does it appear to be significant?
2.4 Relative Position of Data
LEARNING OBJECTIVES
1. To learn the concept of the relative position of an element of a data set.
2. To learn the meaning of each of two measures, the percentile rank and the z-score, of the relative position of a measurement and how to compute each one.
3. To learn the meaning of the three quartiles associated to a data set and how to compute them.
4. To learn the meaning of the five-number summary of a data set, how to construct the box plot associated to it, and how to interpret the box plot.
When you take an exam, what is often as important as your actual score on the exam is the way your
score compares to other students’ performance. If you made a 70 but the average score (whether the
mean, median, or mode) was 85, you did relatively poorly. If you made a 70 but the average score was only 55 then you did relatively well. In general, the significance of one observed value in a data
set strongly depends on how that value compares to the other observed values in a data set.
Therefore we wish to attach to each observed value a number that measures its relative position.
Percentiles and Quartiles
Anyone who has taken a national standardized test is familiar with the idea of being given both a score on
the exam and a “percentile ranking” of that score. You may be told that your score was 625 and that it is
the 85th percentile. The first number tells how you actually did on the exam; the second says that 85% of
the scores on the exam were less than or equal to your score, 625.
Definition
Given an observed value x in a data set, x is the Pth percentile of the data if the percentage of the data
that are less than or equal to x is P. The number P is the percentile rank of x.
EXAMPLE 13
What percentile is the value 1.39 in the data set of ten GPAs considered in Note 2.12 "Example 3" in Section 2.2 "Measures of Central Location"? What percentile is the value 3.33?
Solution:
The data written in increasing order are
1.39 1.76 1.90 2.12 2.53 2.71 3.00 3.33 3.71 4.00
The only data value that is less than or equal to 1.39 is 1.39 itself. Since 1 is 1∕10 = .10 or 10% of 10, the value 1.39 is the 10th percentile. Eight data values are less than or equal to 3.33. Since 8 is 8∕10 = .80 or 80% of 10, the value 3.33 is the 80th percentile.
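The percentile-rank computation in this example can be sketched in a few lines of Python. This is only an illustration of the definition; the function name percentile_rank is an assumption introduced here and does not appear in the text.

```python
def percentile_rank(data, x):
    """Percentage of the observations in data that are less than or equal to x."""
    count = sum(1 for value in data if value <= x)
    return 100 * count / len(data)


# The ten GPAs from the example, in increasing order.
gpas = [1.39, 1.76, 1.90, 2.12, 2.53, 2.71, 3.00, 3.33, 3.71, 4.00]

print(percentile_rank(gpas, 1.39))  # 10.0
print(percentile_rank(gpas, 3.33))  # 80.0
```

The data need not be sorted for this count to work; sorting only makes the hand computation easier.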
The Pth percentile cuts the data set in two so that approximately P% of the data lie below it
and (100−P)% of the data lie above it. In particular, the three percentiles that cut the data into fourths,
as shown in Figure 2.12 "Data Division by Quartiles", are called the quartiles. The following simple
computational definition of the three quartiles works well in practice.
Figure 2.12 Data Division by Quartiles
Definition
For any data set:
1. The second quartile Q2 of the data set is its median.
2. Define two subsets:
1. the lower set: all observations that are strictly less than Q2;
2. the upper set: all observations that are strictly greater than Q2.
3. The first quartile Q1 of the data set is the median of the lower set.
4. The third quartile Q3 of the data set is the median of the upper set.
EXAMPLE 14
Find the quartiles of the data set of GPAs of Note 2.12 "Example 3" in Section 2.2 "Measures of Central Location".
Solution:
As in the previous example we first list the data in numerical order:
1.39 1.76 1.90 2.12 2.53 2.71 3.00 3.33 3.71 4.00
This data set has n = 10 observations. Since 10 is an even number, the median is the mean of the two middle observations: x̃ = (2.53 + 2.71)/2 = 2.62. Thus the second quartile is Q2 = 2.62. The lower and upper subsets are
Lower: L = {1.39, 1.76, 1.90, 2.12, 2.53}
Upper: U = {2.71, 3.00, 3.33, 3.71, 4.00}
Each has an odd number of elements, so the median of each is its middle observation. Thus the first quartile is Q1 = 1.90, the median of L, and the third quartile is Q3 = 3.33, the median of U.
EXAMPLE 15
Adjoin the observation 3.88 to the data set of the previous example and find the quartiles of the new set of data.
Solution:
As in the previous example we first list the data in numerical order:
1.39 1.76 1.90 2.12 2.53 2.71 3.00 3.33 3.71 3.88 4.00
This data set has 11 observations. The second quartile is its median, the middle value 2.71. Thus Q2 = 2.71. The lower and upper subsets are now
Lower: L = {1.39, 1.76, 1.90, 2.12, 2.53}
Upper: U = {3.00, 3.33, 3.71, 3.88, 4.00}
The lower set L has median the middle value 1.90, so Q1 = 1.90. The upper set has median the middle value 3.71, so Q3 = 3.71.
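The quartile definition used in the last two examples can be sketched in Python as follows. The function name quartiles is an assumption made for illustration. Note that library routines such as statistics.quantiles use different conventions and may return slightly different values, and that this sketch assumes the lower and upper sets are both nonempty.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the lower-set / upper-set definition."""
    q2 = median(data)
    lower = [x for x in data if x < q2]  # observations strictly less than Q2
    upper = [x for x in data if x > q2]  # observations strictly greater than Q2
    return median(lower), q2, median(upper)


gpas = [1.39, 1.76, 1.90, 2.12, 2.53, 2.71, 3.00, 3.33, 3.71, 4.00]
gpas_plus = sorted(gpas + [3.88])  # the data set with 3.88 adjoined

print(quartiles(gpas))       # Q1 = 1.90, Q2 close to 2.62, Q3 = 3.33
print(quartiles(gpas_plus))  # Q1 = 1.90, Q2 = 2.71, Q3 = 3.71
```

For the ten-value data set the median is computed in floating point, so it may print with a small rounding error rather than as exactly 2.62.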
In addition to the three quartiles, the two extreme values, the minimum x min and the maximum x max are
also useful in describing the entire data set. Together these five numbers are called the five-
number summary of the data set:
{ x min, Q1, Q2, Q3, x max}
The five-number summary is used to construct a box plot as in Figure 2.13 "The Box Plot". Each of the
five numbers is represented by a vertical line segment, a box is formed using the line segments
at Q1 and Q3 as its two vertical sides, and two horizontal line segments are extended from the vertical
segments marking Q1 and Q3 to the adjacent extreme values. (The two horizontal line segments are
referred to as “whiskers,” and the diagram is sometimes called a “box and whisker plot.”) We caution the
reader that there are other types of box plots that differ somewhat from the ones we are constructing,
although all are based on the three quartiles.
Figure 2.13 The Box Plot
Note that the distance from Q1 to Q3 is the length of the interval over which the middle half of the
data range. Thus it has the following special name.
Definition
The interquartile range (IQR) is the quantity
IQR=Q3−Q1
EXAMPLE 16
Construct a box plot and find the IQR for the data in Note 2.44 "Example 14".
Solution:
From our work in Note 2.44 "Example 14" we know that the five-number summary is
x min = 1.39 Q1 = 1.90 Q2 = 2.62 Q3 = 3.33 x max = 4.00
The box plot is
[Figure: box plot of the GPA data]
The interquartile range is IQR = 3.33 − 1.90 = 1.43.
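As a rough check on this example, the five-number summary and the IQR can be computed with a short Python sketch. The function name five_number_summary is an assumption introduced here, and the quartiles follow the lower-set / upper-set definition given earlier in this section.

```python
from statistics import median


def five_number_summary(data):
    """Return (x_min, Q1, Q2, Q3, x_max) for a list of numbers."""
    q2 = median(data)
    q1 = median([x for x in data if x < q2])  # median of the lower set
    q3 = median([x for x in data if x > q2])  # median of the upper set
    return min(data), q1, q2, q3, max(data)


gpas = [1.39, 1.76, 1.90, 2.12, 2.53, 2.71, 3.00, 3.33, 3.71, 4.00]
x_min, q1, q2, q3, x_max = five_number_summary(gpas)
iqr = q3 - q1  # interquartile range

print(x_min, q1, round(q2, 2), q3, x_max)
print(round(iqr, 2))  # 1.43
```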
z-scores
Another way to locate a particular observation x in a data set is to compute its distance from the mean in
units of standard deviation.
Definition
The z-score of an observation x is the number z given by the computational formula
z = (x − x̄)/s or z = (x − µ)/σ
according to whether the data set is a sample or is the entire population.
The formulas in the definition allow us to compute the z-score when x is known. If the z-score is
known then x can be recovered using the corresponding inverse formulas
x = x̄ + sz or x = µ + σz
The z-score indicates how many standard deviations an individual observation x is from the center of
the data set, its mean. If z is negative then x is below average. If z is 0 then x is equal to the average.
If z is positive then x is above average. See Figure 2.14.
Figure 2.14 x-Scale versus z-Score
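A minimal Python sketch of the z-score follows. It assumes the usual definition, z = (x − mean)/(standard deviation), which is the one consistent with the inverse formulas above. The function name z_score and the numbers in the usage lines (a score of 70, a mean of 85, a standard deviation of 10) are hypothetical values chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical exam: mean 85, standard deviation 10.
print(z_score(70, 85, 10))  # -1.5, so 70 is below average
print(z_score(85, 85, 10))  # 0.0, so 85 is exactly average
print(z_score(95, 85, 10))  # 1.0, so 95 is above average
```

The sign of the result matches the interpretation in the text: negative below the mean, zero at the mean, positive above it.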
EXAMPLE 18
Suppose the mean and standard deviation of the GPAs of all currently registered students at a college are μ = 2.70 and σ = 0.50. The z-scores of the GPAs of two students, Antonio and Beatrice, are z = −0.62 and z = 1.28, respectively. What are their GPAs?
Solution:
Using the second formula right after the definition of z-scores we compute the GPAs as
Antonio: x = µ + zσ = 2.70 + (−0.62)(0.50) = 2.39
Beatrice: x = µ + zσ = 2.70 + (1.28)(0.50) = 3.34
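The inverse computation in this example can be checked with a short Python sketch. The function name x_from_z is an assumption introduced for illustration.

```python
def x_from_z(z, mean, sd):
    """Recover an observation from its z-score: x = mean + z * sd."""
    return mean + z * sd


mu, sigma = 2.70, 0.50  # population mean and standard deviation of the GPAs

antonio = x_from_z(-0.62, mu, sigma)
beatrice = x_from_z(1.28, mu, sigma)

print(round(antonio, 2), round(beatrice, 2))  # 2.39 3.34
```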
KEY TAKEAWAYS
• The percentile rank and z-score of a measurement indicate its relative position with regard to the other measurements in a data set.
• The three quartiles divide a data set into fourths.
• The five-number summary and its associated box plot summarize the location and distribution of the data.
35. Emilia and Ferdinand took the same freshman chemistry course, Emilia in the fall, Ferdinand in the spring. Emilia made an 83 on the common final exam that she took, on which the mean was 76 and the standard deviation 8. Ferdinand made a 79 on the common final exam that he took, which was more difficult, since the mean was 65 and the standard deviation 12. The one who has a higher z-score did relatively better. Was it Emilia or Ferdinand?
36. Refer to the previous exercise. On the final exam in the same course the following semester, the mean is 68 and the standard deviation is 9. What grade on the exam matches Emilia’s performance? Ferdinand’s?
37. Rosencrantz and Guildenstern are on a weight-reducing diet. Rosencrantz, who weighs 178 lb, belongs to an age and body-type group for which the mean weight is 145 lb and the standard deviation is 15 lb. Guildenstern, who weighs 204 lb, belongs to an age and body-type group for which the mean weight is 165 lb and the standard deviation is 20 lb. Assuming z-scores are good measures for comparison in this context, who is more overweight for his age and body type?
LARGE DATA SET EXERCISES
38. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Compute the three quartiles and the interquartile range of the 1,000 SAT scores.
b. Compute the three quartiles and the interquartile range of the 1,000 GPAs.
39. Large Data Set 10 records the scores of 72 students on a statistics exam.
http://www.flatworldknowledge.com/sites/all/files/data10.xls
a. Compute the five-number summary of the data.
b. Describe in words the performance of the class on the exam in the light of the result in part (a).
40. Large Data Sets 3 and 3A list the heights of 174 customers entering a shoe store.
http://www.flatworldknowledge.com/sites/all/files/data3.xls
http://www.flatworldknowledge.com/sites/all/files/data3A.xls
a. Compute the five-number summary of the heights, without regard to gender.
b. Compute the five-number summary of the heights of the men in the sample.
c. Compute the five-number summary of the heights of the women in the sample.
41. Large Data Sets 7, 7A, and 7B list the survival times in days of 140 laboratory mice with thymic leukemia from onset to death.
http://www.flatworldknowledge.com/sites/all/files/data7.xls
http://www.flatworldknowledge.com/sites/all/files/data7A.xls
http://www.flatworldknowledge.com/sites/all/files/data7B.xls
a. Compute the three quartiles and the interquartile range of the survival times for all mice, without regard to gender.
b. Compute the three quartiles and the interquartile range of the survival times for the 65 male mice (separately recorded in Large Data Set 7A).
c. Compute the three quartiles and the interquartile range of the survival times for the 75 female mice (separately recorded in Large Data Set 7B).
2.5 The Empirical Rule and Chebyshev’s Theorem
LEARNING OBJECTIVES
1. To learn what the value of the standard deviation of a data set implies about how the data scatter away from the mean as described by the Empirical Rule and Chebyshev’s Theorem.
2. To use the Empirical Rule and Chebyshev’s Theorem to draw conclusions about a data set.
You probably have a good intuitive grasp of what the average of a data set says about that data set. In
this section we begin to learn what the standard deviation has to tell us about the nature of the data
set.
The Empirical Rule
We start by examining a specific set of data. Table 2.2 "Heights of Men" shows the heights in inches of 100
randomly selected adult men. A relative frequency histogram for the data is shown in Figure 2.15 "Heights
of Adult Men". The mean and standard deviation of the data are, rounded to two decimal places, x̄ = 69.92
and s = 1.70. If we go through the data and count the number of observations that are within one standard
deviation of the mean, that is, that are between 69.92 − 1.70 = 68.22 and 69.92 + 1.70 = 71.62 inches, there are 69 of
them. If we count the number of observations that are within two standard deviations of the mean, that is,
that are between 69.92 − 2(1.70) = 66.52 and 69.92 + 2(1.70) = 73.32 inches, there are 95 of them. All of the
measurements are within three standard deviations of the mean, that is,
between 69.92 − 3(1.70) = 64.82 and 69.92 + 3(1.70) = 75.02 inches. These tallies are not coincidences, but are in agreement with the following result that has been found to be widely applicable.
Table 2.2 Heights of Men
68.7 72.3 71.3 72.5 70.6 68.2 70.1 68.4 68.6 70.6
73.7 70.5 71.0 70.9 69.3 69.4 69.7 69.1 71.5 68.6
70.9 70.0 70.4 68.9 69.4 69.4 69.2 70.7 70.5 69.9
69.8 69.8 68.6 69.5 71.6 66.2 72.4 70.7 67.7 69.1
68.8 69.3 68.9 74.8 68.0 71.2 68.3 70.2 71.9 70.4
71.9 72.2 70.0 68.7 67.9 71.1 69.0 70.8 67.3 71.8
70.3 68.8 67.2 73.0 70.4 67.8 70.0 69.5 70.1 72.0
72.2 67.6 67.0 70.3 71.2 65.6 68.1 70.8 71.4 70.2
70.1 67.5 71.3 71.5 71.0 69.1 69.5 71.1 66.8 71.8
69.6 72.7 72.8 69.6 65.9 68.0 69.7 68.7 69.8 69.7
Figure 2.15 Heights of Adult Men
The Empirical Rule
If a data set has an approximately bell-shaped relative frequency histogram, then (see Figure 2.16 "The
Empirical Rule")
1. approximately 68% of the data lie within one standard deviation of the mean, that is, in the interval with
endpoints x̄ ± s for samples and with endpoints µ ± σ for populations;
2. approximately 95% of the data lie within two standard deviations of the mean, that is, in the interval with
endpoints x̄ ± 2s for samples and with endpoints µ ± 2σ for populations; and
3. approximately 99.7% of the data lie within three standard deviations of the mean, that is, in the interval
with endpoints x̄ ± 3s for samples and with endpoints µ ± 3σ for populations.
Figure 2.16 The Empirical Rule
Two key points in regard to the Empirical Rule are that the data distribution must be approximately bell-
shaped and that the percentages are only approximately true. The Empirical Rule does not apply to data
sets with severely asymmetric distributions, and the actual percentage of observations in any of the
intervals specified by the rule could be either greater or less than those given in the rule. We see this with
the example of the heights of the men: the Empirical Rule suggested 68 observations between 68.22 and
71.62 inches but we counted 69.
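The kind of tally described for the heights data can be sketched in Python as follows. The function name proportion_within and the small sample list are hypothetical; the sample is not the heights table, only a short illustrative data set.

```python
from statistics import mean, stdev


def proportion_within(data, k):
    """Proportion of observations within k sample standard deviations of the mean."""
    center = mean(data)
    spread = stdev(data)  # sample standard deviation s
    low, high = center - k * spread, center + k * spread
    return sum(1 for x in data if low <= x <= high) / len(data)


# Hypothetical, roughly bell-shaped sample (mean 70, s about 2.29).
sample = [66, 68, 69, 70, 70, 70, 71, 72, 74]

for k in (1, 2, 3):
    print(k, proportion_within(sample, k))
```

For a sample this small the proportions only loosely resemble 68%, 95%, and 99.7%, which is consistent with the caution that the Empirical Rule percentages are approximate.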
Figure 2.17 Distribution of Heights
EXAMPLE 20
Scores on IQ tests have a bell-shaped distribution with mean μ = 100 and standard deviation σ = 10. Discuss what the Empirical Rule implies concerning individuals with IQ scores of 110, 120, and 130.
Solution:
A sketch of the IQ distribution is given in Figure 2.18 "Distribution of IQ Scores". The Empirical Rule states that
1. approximately 68% of the IQ scores in the population lie between 90 and 110,
2. approximately 95% of the IQ scores in the population lie between 80 and 120, and
3. approximately 99.7% of the IQ scores in the population lie between 70 and 130.
Figure 2.18 Distribution of IQ Scores
Since 68% of the IQ scores lie within the interval from 90 to 110, it must be the case that 32% lie outside that interval. By symmetry approximately half of that 32%, or 16% of all IQ scores, will lie above 110. If 16% lie above 110, then 84% lie below. We conclude that the IQ score 110 is the 84th percentile.
The same analysis applies to the score 120. Since approximately 95% of all IQ scores lie within the interval from 80 to 120, only 5% lie outside it, and half of them, or 2.5% of all scores, are above 120. The IQ score 120 is thus higher than 97.5% of all IQ scores, and is quite a high score.
By a similar argument, only 15/100 of 1% of all adults, or about one or two in every thousand, would have an IQ score above 130. This fact makes the score 130 extremely high.
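The symmetry argument in this solution amounts to simple arithmetic, sketched here in Python. The names EMPIRICAL_RULE and upper_tail_percent are assumptions introduced for illustration, and the results are approximations, as the Empirical Rule itself is.

```python
# Approximate percentage of a bell-shaped data set within k standard deviations.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def upper_tail_percent(k):
    """Approximate % of the data more than k standard deviations above the mean."""
    outside = 100.0 - EMPIRICAL_RULE[k]  # percentage outside the interval
    return outside / 2                   # by symmetry, half lies in the upper tail


mu, sigma = 100, 10  # IQ scores from the example
for k in (1, 2, 3):
    score = mu + k * sigma
    above = upper_tail_percent(k)
    print(score, round(above, 2), round(100 - above, 2))
```

Each printed line gives the score, the approximate percentage of scores above it, and the approximate percentage at or below it.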
Chebyshev’s Theorem
The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is
stated in terms of approximations. A result that applies to every data set is known as Chebyshev’s
Theorem.
Chebyshev’s Theorem
For any numerical data set,
1. at least 3/4 of the data lie within two standard deviations of the mean, that is, in the interval with
endpoints x̄ ± 2s for samples and with endpoints µ ± 2σ for populations;
2. at least 8/9 of the data lie within three standard deviations of the mean, that is, in the interval with
endpoints x̄ ± 3s for samples and with endpoints µ ± 3σ for populations;
3. at least 1 − 1/k² of the data lie within k standard deviations of the mean, that is, in the interval with
endpoints x̄ ± ks for samples and with endpoints µ ± kσ for populations, where k is any positive whole
number that is greater than 1.
Figure 2.19 "Chebyshev’s Theorem" gives a visual illustration of Chebyshev’s Theorem.
Figure 2.19 Chebyshev’s Theorem
It is important to pay careful attention to the words “at least” at the beginning of each of the three parts.
The theorem gives the minimum proportion of the data which must lie within a given number of standard
deviations of the mean; the true proportions found within the indicated regions could be greater than
what the theorem guarantees.
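The bound 1 − 1/k² in Chebyshev’s Theorem is easy to tabulate. The Python sketch below uses exact fractions so that the values 3/4 and 8/9 appear as stated in the theorem; the function name chebyshev_bound is an assumption introduced for illustration.

```python
from fractions import Fraction


def chebyshev_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem is stated for whole numbers k greater than 1")
    return 1 - Fraction(1, k * k)


for k in (2, 3, 4):
    print(k, chebyshev_bound(k))  # prints 3/4, 8/9, 15/16 for k = 2, 3, 4
```

These are guaranteed minimums; an actual data set may have a larger proportion in each interval.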
EXAMPLE 22
The number of vehicles passing through a busy intersection between 8:00 a.m. and 10:00 a.m. was observed and recorded on every weekday morning of the last year. The data set contains n = 251 numbers. The sample mean is x̄ = 725 and the sample standard deviation is s = 25. Identify which of the following statements must be true.
1. On approximately 95% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was between 675 and 775.
2. On at least 75% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was between 675 and 775.
3. On at least 189 weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was between 675 and 775.
4. On at most 25% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was either less than 675 or greater than 775.
5. On at most 12.5% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was less than 675.
6. On at most 25% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was less than 675.
Solution:
1. Since it is not stated that the relative frequency histogram of the data is bell-shaped, the Empirical Rule does not apply. Statement (1) is based on the Empirical Rule and therefore it might not be correct.
2. Statement (2) is a direct application of part (1) of Chebyshev’s Theorem because (x̄ − 2s, x̄ + 2s) = (675, 775). It must be correct.
3. Statement (3) says the same thing as statement (2) because 75% of 251 is 188.25, so the minimum whole number of observations in this interval is 189. Thus statement (3) is definitely correct.
4. Statement (4) says the same thing as statement (2) but in different words, and therefore is definitely correct.
5. Statement (4), which is definitely correct, states that at most 25% of the time either fewer than 675 or more than 775 vehicles passed through the intersection. Statement (5) says that half of that 25%
corresponds to days of light traffic. This would be correct if the relative frequency histogram of the data were known to be symmetric. But this is not stated; perhaps all of the observations outside the interval (675, 775) are less than 675. Thus statement (5) might not be correct.
6. Statement (4) is definitely correct and statement (4) implies statement (6): even if every measurement that is outside the interval (675, 775) is less than 675 (which is conceivable, since symmetry is not known to hold), even so at most 25% of all observations are less than 675. Thus statement (6) must definitely be correct.
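The count in statement (3) can be reproduced with a short Python sketch: the guaranteed proportion 1 − 1/k² is multiplied by the number of observations and rounded up to the next whole number. The function name chebyshev_min_count is an assumption introduced for illustration; exact fractions are used to avoid floating-point rounding before the ceiling is taken.

```python
import math
from fractions import Fraction


def chebyshev_min_count(n, k):
    """Smallest whole number of the n observations guaranteed within k standard deviations."""
    bound = 1 - Fraction(1, k * k)  # minimum proportion from Chebyshev's Theorem
    return math.ceil(bound * n)


n, x_bar, s = 251, 725, 25
k = 2
interval = (x_bar - k * s, x_bar + k * s)

print(interval)                   # (675, 775)
print(chebyshev_min_count(n, k))  # 189
```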
KEY TAKEAWAYS
• The Empirical Rule is an approximation that applies only to data sets with a bell-shaped relative frequency histogram. It estimates the proportion of the measurements that lie within one, two, and three standard deviations of the mean.
• Chebyshev’s Theorem is a fact that applies to all possible data sets. It describes the minimum proportion of the measurements that must lie within one, two, or more standard deviations of the mean.
EXERCISES
BASIC
1. State the Empirical Rule.
2. Describe the conditions under which the Empirical Rule may be applied.
3. State Chebyshev’s Theorem.
4. Describe the conditions under which Chebyshev’s Theorem may be applied.
5. A sample data set with a bell-shaped distribution has mean x̄ = 6 and standard deviation s = 2. Find the approximate proportion of observations in the data set that lie:
a. between 4 and 8;
b. between 2 and 10;
c. between 0 and 12.
6. A population data set with a bell-shaped distribution has mean μ = 6 and standard deviation σ = 2. Find the approximate proportion of observations in the data set that lie:
a. between 4 and 8;
b. between 2 and 10;
c. between 0 and 12.
7. A population data set with a bell-shaped distribution has mean μ = 2 and standard deviation σ = 1.1. Find the approximate proportion of observations in the data set that lie:
a. above 2;
b. above 3.1;
c. between 2 and 3.1.
8. A sample data set with a bell-shaped distribution has mean x̄ = 2 and standard deviation s = 1.1. Find the approximate proportion of observations in the data set that lie:
a. below −0.2;
b. below 3.1;
c. between −1.3 and 0.9.
9. A population data set with a bell-shaped distribution and size N = 500 has mean μ = 2 and standard deviation σ = 1.1. Find the approximate number of observations in the data set that lie:
a. above 2;
b. above 3.1;
c. between 2 and 3.1.
10. A sample data set with a bell-shaped distribution and size n = 128 has mean x̄ = 2 and standard deviation s = 1.1. Find the approximate number of observations in the data set that lie:
a. below −0.2;
b. below 3.1;
c. between −1.3 and 0.9.
11. A sample data set has mean x̄ = 6 and standard deviation s = 2. Find the minimum proportion of observations in the data set that must lie:
a. between 2 and 10;
b. between 0 and 12;
c. between 4 and 8.
12. A population data set has mean μ = 2 and standard deviation σ = 1.1. Find the minimum proportion of observations in the data set that must lie:
a. between −0.2 and 4.2;
b. between −1.3 and 5.3.
13. A population data set of size N = 500 has mean μ = 5.2 and standard deviation σ = 1.1. Find the minimum number of observations in the data set that must lie:
a. between 3 and 7.4;
b. between 1.9 and 8.5.
14. A sample data set of size n = 128 has mean x̄ = 2 and standard deviation s = 2. Find the minimum number of observations in the data set that must lie:
a. between −2 and 6 (including −2 and 6);
b. between −4 and 8 (including −4 and 8).
15. A sample data set of size n = 30 has mean x̄ = 6 and standard deviation s = 2.
a. What is the maximum proportion of observations in the data set that can lie outside the interval (2, 10)?
b. What can be said about the proportion of observations in the data set that are below 2?
c. What can be said about the proportion of observations in the data set that are above 10?
d. What can be said about the number of observations in the data set that are above 10?
16. A population data set has mean μ = 2 and standard deviation σ = 1.1.
a. What is the maximum proportion of observations in the data set that can lie outside the interval (−1.3, 5.3)?
b. What can be said about the proportion of observations in the data set that are below −1.3?
c. What can be said about the proportion of observations in the data set that are above 5.3?
APPLICATIONS
17. Scores on a final exam taken by 1,200 students have a bell-shaped distribution with mean 72 and standard deviation 9.
a. What is the median score on the exam?
b. About how many students scored between 63 and 81?
c. About how many students scored between 72 and 90?
d. About how many students scored below 54?
18. Lengths of fish caught by a commercial fishing boat have a bell-shaped distribution with mean 23 inches and standard deviation 1.5 inches.
a. About what proportion of all fish caught are between 20 inches and 26 inches long?
b. About what proportion of all fish caught are between 20 inches and 23 inches long?
c. About how long is the longest fish caught (only a small fraction of a percent are longer)?
19. Hockey pucks used in professional hockey games must weigh between 5.5 and 6 ounces. If the weight of pucks manufactured by a particular process is bell-shaped, has mean 5.75 ounces and standard deviation 0.125 ounce, what proportion of the pucks will be usable in professional games?
20. Hockey pucks used in professional hockey games must weigh between 5.5 and 6 ounces. If the weight of pucks manufactured by a particular process is bell-shaped and has mean 5.75 ounces, how large can the standard deviation be if 99.7% of the pucks are to be usable in professional games?
21. Speeds of vehicles on a section of highway have a bell-shaped distribution with mean 60 mph and standard deviation 2.5 mph.
a. If the speed limit is 55 mph, about what proportion of vehicles are speeding?
b. What is the median speed for vehicles on this highway?
c. What is the percentile rank of the speed 65 mph?
d. What speed corresponds to the 16th percentile?
22. Suppose that, as in the previous exercise, speeds of vehicles on a section of highway have mean 60 mph and standard deviation 2.5 mph, but now the distribution of speeds is unknown.
a. If the speed limit is 55 mph, at least what proportion of vehicles must be speeding?
b. What can be said about the proportion of vehicles going 65 mph or faster?
23. An instructor announces to the class that the scores on a recent exam had a bell-shaped distribution with mean 75 and standard deviation 5.
a. What is the median score?
b. Approximately what proportion of students in the class scored between 70 and 80?
c. Approximately what proportion of students in the class scored above 85?
d. What is the percentile rank of the score 85?
24. The GPAs of all currently registered students at a large university have a bell-shaped distribution with mean 2.7 and standard deviation 0.6. Students with a GPA below 1.5 are placed on academic probation. Approximately what percentage of currently registered students at the university are on academic probation?
25. Thirty-six students took an exam on which the average was 80 and the standard deviation was 6. A rumor says that five students had scores 61 or below. Can the rumor be true? Why or why not?
Chapter 3
Basic Concepts of Probability
Suppose a polling organization questions 1,200 voters in order to estimate the proportion of all
voters who favor a particular bond issue. We would expect the proportion of the 1,200 voters in the
survey who are in favor to be close to the proportion of all voters who are in favor, but this need not
be true. There is a degree of randomness associated with the survey result. If the survey result is
highly likely to be close to the true proportion, then we have confidence in the survey result. If it is
not particularly likely to be close to the population proportion, then we would perhaps not take the
survey result too seriously. The likelihood that the survey proportion is close to the population
proportion determines our confidence in the survey result. For that reason, we would like to be able
to compute that likelihood. The task of computing it belongs to the realm of probability, which we
study in this chapter.
3.1 Sample Spaces, Events, and Their Probabilities
LEARNING OBJECTIVES
1. To learn the concept of the sample space associated with a random experiment.
2. To learn the concept of an event associated with a random experiment.
3. To learn the concept of the probability of an event.
Sample Spaces and Events
Rolling an ordinary six-sided die is a familiar example of a random experiment, an action for which all
possible outcomes can be listed, but for which the actual outcome on any given trial of the experiment
cannot be predicted with certainty. In such a situation we wish to assign to each outcome, such as rolling a
two, a number, called the probability of the outcome, that indicates how likely it is that the outcome will
occur. Similarly, we would like to assign a probability to any event, or collection of outcomes, such as
rolling an even number, which indicates how likely it is that the event will occur if the experiment is
performed. This section provides a framework for discussing probability problems, using the terms just
mentioned.
Definition
A random experiment is a mechanism that produces a definite outcome that cannot be predicted
with certainty. The sample space associated with a random experiment is the set of all possible
outcomes. An event is a subset of the sample space.
Definition
An event E is said to occur on a particular trial of the experiment if the outcome observed is an element
of the set E.
EXAMPLE 1
Construct a sample space for the experiment that consists of tossing a single coin.
Solution:
The outcomes could be labeled h for heads and t for tails. Then the sample space is the set S = {h, t}.
EXAMPLE 2
Construct a sample space for the experiment that consists of rolling a single die. Find the events that
correspond to the phrases “an even number is rolled” and “a number greater than two is rolled.”
Solution:
The outcomes could be labeled according to the number of dots on the top face of the die. Then the
sample space is the set S = {1,2,3,4,5,6}.
The outcomes that are even are 2, 4, and 6, so the event that corresponds to the phrase “an even number
is rolled” is the set {2,4,6}, which it is natural to denote by the letter E. We write E = {2,4,6}.
Similarly the event that corresponds to the phrase “a number greater than two is rolled” is the
set T = {3,4,5,6}, which we have denoted T.
A graphical representation of a sample space and events is a Venn diagram, as shown in Figure
3.1 "Venn Diagrams for Two Sample Spaces" for Note 3.6 "Example 1" and Note 3.7 "Example 2".
In general the sample space S is represented by a rectangle, outcomes by points within the
rectangle, and events by ovals that enclose the outcomes that compose them.
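As a small illustration, the sample space and events of Example 2 can be modeled with Python sets. This is only a sketch: the helper name `occurs` is an assumption introduced here and is not part of the text.

```python
# Sample space and events for one roll of a die (Example 2), modeled as sets.
S = {1, 2, 3, 4, 5, 6}
E = {x for x in S if x % 2 == 0}   # "an even number is rolled"
T = {x for x in S if x > 2}        # "a number greater than two is rolled"

def occurs(event, outcome):
    """An event occurs on a trial if the observed outcome is an element of it."""
    return outcome in event

print(sorted(E), sorted(T))
print(occurs(E, 4), occurs(T, 2))
```

Representing events as subsets of S mirrors the definition directly: checking whether an event occurred is a membership test.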
Figure 3.1 Venn Diagrams for Two Sample Spaces
EXAMPLE 3
A random experiment consists of tossing two coins.
a. Construct a sample space for the situation that the coins are indistinguishable, such as two brand
new pennies.
b. Construct a sample space for the situation that the coins are distinguishable, such as one a penny and the
other a nickel.
Solution:
a. After the coins are tossed one sees either two heads, which could be labeled 2h, two tails, which
could be labeled 2t, or coins that differ, which could be labeled d. Thus a sample space is S = {2h, 2t, d}.
b. Since we can tell the coins apart, there are now two ways for the coins to differ: the penny heads and the
nickel tails, or the penny tails and the nickel heads. We can label each outcome as a pair of letters, the first
of which indicates how the penny landed and the second of which indicates how the nickel landed. A
sample space is then S′ = {hh, ht, th, tt}.
A device that can be helpful in identifying all possible outcomes of a random experiment, particularly one
that can be viewed as proceeding in stages, is what is called a tree diagram. It is described in the
following example.
EXAMPLE 4
Construct a sample space that describes all three-child families according to the genders of the
children with respect to birth order.
Solution:
Two of the outcomes are “two boys then a girl,” which we might denote bbg, and “a girl then two
boys,” which we would denote gbb. Clearly there are many outcomes, and when we try to list all of
them it could be difficult to be sure that we have found them all unless we proceed systematically.
The tree diagram shown in Figure 3.2 "Tree Diagram For Three-Child Families" gives a systematic
approach.
Figure 3.2 Tree Diagram For Three-Child Families
The diagram was constructed as follows. There are two possibilities for the first child, boy or girl, so
we draw two line segments coming out of a starting point, one ending in a b for “boy” and the other
ending in a g for “girl.” For each of these two possibilities for the first child there are two possibilities
for the second child, “boy” or “girl,” so from each of the b and g we draw two line segments, one
segment ending in a b and one in a g. For each of the four ending points now in the diagram there are
two possibilities for the third child, so we repeat the process once more.
The line segments are called branches of the tree. The right ending point of each branch is called
a node. The nodes on the extreme right are the final nodes; to each one there corresponds an
outcome, as shown in the figure.
From the tree it is easy to read off the eight outcomes of the experiment, so the sample space is,
reading from the top to the bottom of the final nodes in the tree,
S = {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}
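The stage-by-stage construction of the tree can be imitated in code. The sketch below uses `itertools.product` from the Python standard library to take every combination of "b" or "g" over three births; the variable name `outcomes` is an assumption made for illustration.

```python
from itertools import product

# Each stage of the tree offers two branches, "b" (boy) or "g" (girl).
# Three stages correspond to three children in birth order.
outcomes = ["".join(path) for path in product("bg", repeat=3)]

print(outcomes)
print(len(outcomes))  # 2 * 2 * 2 final nodes
```

Because `product` varies the last position fastest, the list comes out in the same top-to-bottom order as the final nodes of the tree.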
Probability
Definition
The probability of an outcome e in a sample space S is a number p between 0 and 1 that measures
the likelihood that e will occur on a single trial of the corresponding random experiment. The value p =
0 corresponds to the outcome e being impossible and the value p = 1 corresponds to the outcome e being
certain.
Definition
The probability of an event A is the sum of the probabilities of the individual outcomes of which it is
composed. It is denoted P(A).
The following formula expresses the content of the definition of the probability of an event:
If an event E is E = {e_1, e_2, …, e_k}, then
P(E) = P(e_1) + P(e_2) + ⋯ + P(e_k)
Figure 3.3 "Sample Spaces and Probability" graphically illustrates the definitions.
Figure 3.3 Sample Spaces and Probability
Since the whole sample space S is an event that is certain to occur, the sum of the probabilities of all
the outcomes must be the number 1.
In ordinary language probabilities are frequently expressed as percentages. For example, we would
say that there is a 70% chance of rain tomorrow, meaning that the probability of rain is 0.70. We will
use this practice here, but in all the computational formulas that follow we will use the form 0.70 and
not 70%.
EXAMPLE 5
A coin is called “balanced” or “fair” if each side is equally likely to land up. Assign a probability to each
outcome in the sample space for the experiment that consists of tossing a single fair coin.
Solution:
With the outcomes labeled h for heads and t for tails, the sample space is the set S = {h, t}. Since the
outcomes have the same probabilities, which must add up to 1, each outcome is assigned probability 1/2.
EXAMPLE 6
A die is called “balanced” or “fair” if each side is equally likely to land on top. Assign a probability to each
outcome in the sample space for the experiment that consists of tossing a single fair die. Find the
probabilities of the events E: “an even number is rolled” and T: “a number greater than two is rolled.”
Solution:
With outcomes labeled according to the number of dots on the top face of the die, the sample space is the
set S = {1,2,3,4,5,6}. Since there are six equally likely outcomes, which must add up to 1, each is assigned
probability 1/6.
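The two event probabilities asked for in Example 6 follow from the definition of the probability of an event as a sum over its outcomes. The sketch below carries out that sum with exact fractions; the function name `prob` and the dictionary `p` are assumptions introduced for illustration.

```python
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]
# Fair die: every outcome is assigned the same probability 1/6.
p = {outcome: Fraction(1, len(S)) for outcome in S}

def prob(event):
    """P(event) = sum of the probabilities of the outcomes in the event."""
    return sum((p[o] for o in event), Fraction(0))

E = {2, 4, 6}       # "an even number is rolled"
T = {3, 4, 5, 6}    # "a number greater than two is rolled"

P_E = prob(E)       # 1/6 + 1/6 + 1/6
P_T = prob(T)       # 1/6 + 1/6 + 1/6 + 1/6
print(P_E, P_T, prob(S))
```

Using `Fraction` keeps the arithmetic exact, so the probabilities of all outcomes add up to exactly 1.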
EXAMPLE 7
Two fair coins are tossed. Find the probability that the coins match, i.e., either both land heads or
both land tails.
Solution:
In Note 3.8 "Example 3" we constructed the sample space S = {2h, 2t, d} for the situation in which the
coins are identical and the sample space S′ = {hh, ht, th, tt} for the situation in which the two coins can be
told apart.
The theory of probability does not tell us how to assign probabilities to the outcomes, only what to do
with them once they are assigned. Specifically, using sample space S, matching coins is the
event M = {2h, 2t}, which has probability P(2h) + P(2t). Using sample space S′, matching coins is the
event M′ = {hh, tt}, which has probability P(hh) + P(tt). In the physical world it should make no difference
whether the coins are identical or not, and so we would like to assign probabilities to the outcomes
so that the numbers P(M) and P(M′) are the same and best match what we observe when actual
physical experiments are performed with coins that seem to be fair. Actual experience suggests that
the outcomes in S′ are equally likely, so we assign to each probability 1/4, and then
P(M′) = P(hh) + P(tt) = 1/4 + 1/4 = 1/2
Similarly, from experience appropriate choices for the outcomes in S are:
P(2h) = 1/4    P(2t) = 1/4    P(d) = 1/2
which give the same final answer
P(M) = P(2h) + P(2t) = 1/4 + 1/4 = 1/2
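The agreement between the two sample spaces can be checked mechanically. The sketch below stores the probability assignments from the example in two dictionaries and sums over the matching outcomes in each; the names `p_S`, `p_S_prime`, and `prob` are assumptions made for illustration.

```python
from fractions import Fraction

# Indistinguishable coins: S = {2h, 2t, d}; the outcomes are not equally likely.
p_S = {"2h": Fraction(1, 4), "2t": Fraction(1, 4), "d": Fraction(1, 2)}
# Distinguishable coins: S' = {hh, ht, th, tt}; the outcomes are equally likely.
p_S_prime = {o: Fraction(1, 4) for o in ("hh", "ht", "th", "tt")}

def prob(event, p):
    """Sum the assigned probabilities of the outcomes that make up the event."""
    return sum((p[o] for o in event), Fraction(0))

P_M = prob({"2h", "2t"}, p_S)              # matching coins in S
P_M_prime = prob({"hh", "tt"}, p_S_prime)  # matching coins in S'
print(P_M, P_M_prime)
```

Both assignments add up to 1 over their own sample space, which is a quick sanity check on any proposed assignment.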
The previous three examples illustrate how probabilities can be computed simply by counting
when the sample space consists of a finite number of equally likely outcomes. In some situations
the individual outcomes of any sample space that represents the experiment are unavoidably
unequally likely, in which case probabilities cannot be computed merely by counting, but the
computational formula given in the definition of the probability of an event must be used.
EXAMPLE 8
The breakdown of the student body in a local high school according to race and ethnicity is 51%
white, 27% black, 11% Hispanic, 6% Asian, and 5% for all others. A student is randomly selected from
Now the sample space is S = {wm, bm, hm, am, om, wf, bf, hf, af, of}. The information given in the example can be
summarized in the following table, called a two-way contingency table:
Gender    Race/Ethnicity
          White   Black   Hispanic   Asian   Others
Male      0.25    0.12    0.06       0.03    0.01
Female    0.26    0.15    0.05       0.03    0.04
a. Since B = {bm, bf}, P(B) = P(bm) + P(bf) = 0.12 + 0.15 = 0.27.
b. Since MF = {bf, hf, af, of},
P(MF) = P(bf) + P(hf) + P(af) + P(of) = 0.15 + 0.05 + 0.03 + 0.04 = 0.27
c. Since FN = {wf, hf, af, of},
P(FN) = P(wf) + P(hf) + P(af) + P(of) = 0.26 + 0.05 + 0.03 + 0.04 = 0.38
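When outcomes are not equally likely, as in this example, the same summation still applies, only with unequal terms. The sketch below stores the ten cell probabilities of the contingency table under the two-letter labels used in the sample space and sums them for the three events exactly as they are listed in the solution. The names `p` and `prob` are assumptions made for illustration.

```python
from fractions import Fraction

# Cell probabilities from the contingency table, keyed by the two-letter
# outcome labels (first letter: race/ethnicity, second letter: gender).
p = {
    "wm": "0.25", "bm": "0.12", "hm": "0.06", "am": "0.03", "om": "0.01",
    "wf": "0.26", "bf": "0.15", "hf": "0.05", "af": "0.03", "of": "0.04",
}
p = {label: Fraction(value) for label, value in p.items()}

def prob(event):
    """Sum the probabilities of the outcomes that make up the event."""
    return sum((p[o] for o in event), Fraction(0))

B = {"bm", "bf"}
MF = {"bf", "hf", "af", "of"}
FN = {"wf", "hf", "af", "of"}

for name, event in (("B", B), ("MF", MF), ("FN", FN)):
    print(name, float(prob(event)))
```

Parsing the decimals as exact fractions avoids floating-point rounding when the cells are added.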
KEY TAKEAWAYS
• The sample space of a random experiment is the collection of all possible outcomes.
• An event associated with a random experiment is a subset of the sample space.
• The probability of any outcome is a number between 0 and 1. The probabilities of all the outcomes add up
to 1.
• The probability of any event A is the sum of the probabilities of the outcomes in A.
EXERCISES
BASIC
1. A box contains 10 white and 10 black marbles. Construct a sample space for the experiment of randomly
drawing out, with replacement, two marbles in succession and noting the color each time. (To draw “with
replacement” means that the first marble is put back before the second marble is drawn.)
2. A box contains 16 white and 16 black marbles. Construct a sample space for the experiment of randomly
drawing out, with replacement, three marbles in succession and noting the color each time. (To draw “with
replacement” means that each marble is put back before the next marble is drawn.)
3. A box contains 8 red, 8 yellow, and 8 green marbles. Construct a sample space for the experiment of
randomly drawing out, with replacement, two marbles in succession and noting the color each time.
4. A box contains 6 red, 6 yellow, and 6 green marbles. Construct a sample space for the experiment of
randomly drawing out, with replacement, three marbles in succession and noting the color each time.
5. In the situation of Exercise 1, list the outcomes that comprise each of the following events.
a. At least one marble of each color is drawn.
b. No white marble is drawn.
6. In the situation of Exercise 2, list the outcomes that comprise each of the following events.
a. At least one marble of each color is drawn.
b. No white marble is drawn.
c. More black than white marbles are drawn.
7. In the situation of Exercise 3, list the outcomes that comprise each of the following events.
a. No yellow marble is drawn.
b. The two marbles drawn have the same color.
c. At least one marble of each color is drawn.
8. In the situation of Exercise 4, list the outcomes that comprise each of the following events.
a. No yellow marble is drawn.
b. The three marbles drawn have the same color.
c. At least one marble of each color is drawn.
9. Assuming that each outcome is equally likely, find the probability of each event in Exercise 5.
10. Assuming that each outcome is equally likely, find the probability of each event in Exercise 6.
11. Assuming that each outcome is equally likely, find the probability of each event in Exercise 7.
12. Assuming that each outcome is equally likely, find the probability of each event in Exercise 8.
13. A sample space is S = {a,b,c,d,e}. Identify two events as U = {a,b,d} and V = {b,c,d}. Suppose P(a) and P(b) are each 0.2
and P(c) and P(d) are each 0.1.
a. Determine what P(e) must be.
b. Find P(U).
c. Find P(V).
14. A sample space is S = {u,v,w,x}. Identify two events as A = {v,w} and B = {u,w,x}. Suppose P(u) = 0.22, P(w) = 0.36, and P(x) = 0.27.
a. Determine what P(v) must be.
b. Find P(A).
c. Find P(B).
APPLICATIONS
17. The sample space that describes all three-child families according to the genders of the children with respect
to birth order was constructed in Note 3.9 "Example 4". Identify the outcomes that comprise each of the
following events in the experiment of selecting a three-child family at random.
a. At least one child is a girl.
b. At most one child is a girl.
c. All of the children are girls.
d. Exactly two of the children are girls.
e. The first born is a girl.
18. The sample space that describes three tosses of a coin is the same as the one constructed in Note 3.9
"Example 4" with “boy” replaced by “heads” and “girl” replaced by “tails.” Identify the outcomes that
comprise each of the following events in the experiment of tossing a coin three times.
a. The coin lands heads more often than tails.
b. The coin lands heads the same number of times as it lands tails.
c. The coin lands heads at least twice.
d. The coin lands heads on the last toss.
19. Assuming that the outcomes are equally likely, find the probability of each event in Exercise 17.
20. Assuming that the outcomes are equally likely, find the probability of each event in Exercise 18.
ADDITIONAL EXERCISES
21. The following two-way contingency table gives the breakdown of the population in a particular locale
according to age and tobacco usage:
Age         Tobacco Use
            Smoker   Non-smoker
Under 30    0.05     0.20
Over 30     0.20     0.55
A person is selected at random. Find the probability of each of the following events.
a. The person is a smoker.
b. The person is under 30.
c. The person is a smoker who is under 30.
22. The following two-way contingency table gives the breakdown of the population in a particular locale
according to party affiliation (A, B, C, or None) and opinion on a bond issue:
Affiliation   Opinion
              Favors   Opposes   Undecided
A             0.12     0.09      0.07
B             0.16     0.12      0.14
C             0.04     0.03      0.06
None          0.08     0.06      0.03
A person is selected at random. Find the probability of each of the following events.
a. The person is affiliated with party B.
b. The person is affiliated with some party.
c. The person is in favor of the bond issue.
d. The person has no party affiliation and is undecided about the bond issue.
23. The following two-way contingency table gives the breakdown of the population of married or previously
married women beyond child-bearing age in a particular locale according to age at first marriage and number
of children:
Age            Number of Children
               0      1 or 2   3 or More
Under 20       0.02   0.14     0.08
20–29          0.07   0.37     0.11
30 and above   0.10   0.10     0.01
A woman is selected at random. Find the probability of each of the following events.
a. The woman was in her twenties at her first marriage.
b. The woman was 20 or older at her first marriage.
c. The woman had no children.
d. The woman was in her twenties at her first marriage and had at least three children.
e.
24. The following two-way contingency table gives the breakdown of the population of adults in a particular
locale according to highest level of education and whether or not the individual regularly takes dietary
supplements:
Education                Use of Supplements
                         Takes   Does Not Take
No High School Diploma   0.04    0.06
High School Diploma      0.06    0.44
Undergraduate Degree     0.09    0.28
Graduate Degree          0.01    0.02
An adult is selected at random. Find the probability of each of the following events.
a. The person has a high school diploma and takes dietary supplements regularly.
b. The person has an undergraduate degree and takes dietary supplements regularly.
c. The person takes dietary supplements regularly.
d. The person does not take dietary supplements regularly.
LARGE DATA SET EXERCISES
25. Large Data Sets 4 and 4A record the results of 500 tosses of a coin. Find the relative frequency of each
outcome 1, 2, 3, 4, 5, and 6. Does the coin appear to be “balanced” or “fair”?
http://www.flatworldknowledge.com/sites/all/files/data4.xls
http://www.flatworldknowledge.com/sites/all/files/data4A.xls
26. Large Data Sets 6, 6A, and 6B record results of a random survey of 200 voters in each of two regions, in which
they were asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other
candidate.
a. Find the probability that a randomly selected voter among these 400 prefers Candidate A.
b. Find the probability that a randomly selected voter among the 200 who live in Region 1 prefers
Candidate A (separately recorded in Large Data Set 6A).
c. Find the probability that a randomly selected voter among the 200 who live in Region 2 prefers
Candidate A (separately recorded in Large Data Set 6B).
http://www.flatworldknowledge.com/sites/all/files/data6.xls
http://www.flatworldknowledge.com/sites/all/files/data6A.xls
http://www.flatworldknowledge.com/sites/all/files/data6B.xls
3.2 Complements, Intersections, and Unions
LEARNING OBJECTIVES
1. To learn how some events are naturally expressible in terms of other events.
2. To learn how to use special formulas for the probability of an event that is expressed in terms of one or
more other events.
Some events can be naturally expressed in terms of other, sometimes simpler, events.
Complements
Definition
The complement of an event A in a sample space S, denoted A^c, is the collection of all outcomes
in S that are not elements of the set A. It corresponds to negating any description in words of the event A.
EXAMPLE 10
Two events connected with the experiment of rolling a single die are E: “the number rolled is even” and T:
“the number rolled is greater than two.” Find the complement of each.
Solution:
In the sample space S = {1,2,3,4,5,6} the corresponding sets of outcomes are E = {2,4,6} and T = {3,4,5,6}. The
complements are E^c = {1,3,5} and T^c = {1,2}.
In words the complements are described by “the number rolled is not even” and “the number rolled is not
greater than two.” Of course easier descriptions would be “the number rolled is odd” and “the number
rolled is less than three.”
If there is a 60% chance of rain tomorrow, what is the probability of fair weather? The obvious
answer, 40%, is an instance of the following general rule.
Probability Rule for Complements
P(A^c) = 1 − P(A)
This formula is particularly useful when finding the probability of an event directly is difficult.
EXAMPLE 11
Find the probability that at least one heads will appear in five tosses of a fair coin.
Solution:
Identify outcomes by lists of five hs and ts, such as tthtt and hhttt. Although it is tedious to list them all, it
is not difficult to count them. Think of using a tree diagram to do so. There are two choices for the
first toss. For each of these there are two choices for the second toss, hence 2×2 = 4 outcomes for two
tosses. For each of these four outcomes, there are two possibilities for the third toss,
hence 4×2 = 8 outcomes for three tosses. Similarly, there are 8×2 = 16 outcomes for four tosses and
finally 16×2 = 32 outcomes for five tosses.
Let O denote the event “at least one heads.” There are many ways to obtain at least one heads, but only
one way to fail to do so: all tails. Thus although it is difficult to list all the outcomes that form O, it is easy
to write O^c = {ttttt}. Since there are 32 equally likely outcomes, each has probability 1/32, so P(O^c) = 1/32,
hence P(O) = 1 − 1/32 ≈ 0.97 or about a 97% chance.
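A computer has no difficulty listing all 32 outcomes, so the complement rule can be checked against a direct count. The sketch below enumerates the outcomes with `itertools.product` and computes P(O) both ways; the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# All sequences of five tosses, each toss being "h" or "t".
outcomes = ["".join(seq) for seq in product("ht", repeat=5)]

O = {o for o in outcomes if "h" in o}   # "at least one heads"
O_c = set(outcomes) - O                 # the complement: no heads at all

# Direct count over equally likely outcomes.
P_O_direct = Fraction(len(O), len(outcomes))
# Probability Rule for Complements: P(O) = 1 - P(O^c).
P_O_rule = 1 - Fraction(len(O_c), len(outcomes))

print(len(outcomes), O_c, P_O_direct, float(P_O_rule))
```

The direct count has to examine 31 outcomes, while the complement contains a single one, which is the reason the rule is convenient when working by hand.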
Intersection of Events
Definition
The intersection of events A and B, denoted A ∩ B, is the collection of all outcomes that are elements
of both of the sets A and B. It corresponds to combining descriptions of the two events using the word
“and.”
To say that the event A ∩ B occurred means that on a particular trial of the experiment
both A and B occurred. A visual representation of the intersection of events A and B in a sample
space S is given in Figure 3.4 "The Intersection of Events A and B". The intersection corresponds to the
shaded lens-shaped region that lies within both ovals.
Figure 3.4 The Intersection of Events A and B
Definition
Events A and B are mutually exclusive if they have no elements in common.
For A and B to have no outcomes in common means precisely that it is impossible for both A and B to
occur on a single trial of the random experiment. This gives the following rule.
Probability Rule for Mutually Exclusive Events
Events A and B are mutually exclusive if and only if
P(A ∩ B) = 0
Any event A and its complement A^c are mutually exclusive, but A and B can be mutually exclusive without
being complements.
EXAMPLE 14
In the experiment of rolling a single die, find three choices for an event A so that the events A and E: “the
number rolled is even” are mutually exclusive.
Solution:
Since E = {2,4,6} and we want A to have no elements in common with E, any event that does not contain any
even number will do. Three choices are {1,3,5} (the complement E^c, the odds), {1,3}, and {5}.
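With events stored as sets, the definition of mutually exclusive events translates into a disjointness test. The sketch below applies Python's built-in `set.isdisjoint` to the three choices found in Example 14; the helper name `mutually_exclusive` is an assumption made for illustration.

```python
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}   # "the number rolled is even"

def mutually_exclusive(a, b):
    """Two events are mutually exclusive if they share no outcomes."""
    return a.isdisjoint(b)

choices = [{1, 3, 5}, {1, 3}, {5}]
results = [mutually_exclusive(A, E) for A in choices]
print(results)

# Only the first choice is also the complement of E.
is_complement = [A == S - E for A in choices]
print(is_complement)
```

The second list illustrates the remark before the example: events can be mutually exclusive without being complements.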
Union of Events
Definition
The union of events A and B, denoted A ∪ B, is the collection of all outcomes that are elements of one
or the other of the sets A and B, or of both of them. It corresponds to combining descriptions of the two
events using the word “or.”
To say that the event A ∪ B occurred means that on a particular trial of the experiment
either A or B occurred (or both did). A visual representation of the union of events A and B in a sample
space S is given in Figure 3.5 "The Union of Events A and B". The union corresponds to the shaded region.
Figure 3.5 The Union of Events A and B
EXAMPLE 15
In the experiment of rolling a single die, find the union of the events E: “the number rolled is even” and T:
“the number rolled is greater than two.”
Solution:
Since the outcomes that are in either E = {2,4,6} or T = {3,4,5,6} (or both) are 2, 3, 4, 5, and 6, E ∪ T = {2,3,4,5,6}. Note
that an outcome such as 4 that is in both sets is still listed only once (although strictly speaking it is not
incorrect to list it twice).
In words the union is described by “the number rolled is even or is greater than two.” Every number
between one and six except the number one is either even or is greater than two, corresponding
to E ∪ T given above.
EXAMPLE 16
A two-child family is selected at random. Let B denote the event that at least one child is a boy,
let D denote the event that the genders of the two children differ, and let M denote the event that the
genders of the two children match. Find B ∪ D and B ∪ M.
Solution:
A sample space for this experiment is S = {bb, bg, gb, gg}, where the first letter denotes the gender of the
firstborn child and the second letter denotes the gender of the second child. The events B, D,
and M are
B = {bb, bg, gb}    D = {bg, gb}    M = {bb, gg}
Each outcome in D is already in B, so the collection of outcomes that are in at least one or the other of the
sets B and D is just the set B itself: B ∪ D = {bb, bg, gb} = B.
Every outcome in the whole sample space S is in at least one or the other of the sets B and M, so B ∪
M = {bb, bg, gb, gg} = S.
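The two unions in Example 16 can be reproduced with Python's set union operator `|`. The sketch below builds each event from its verbal description rather than typing the sets by hand; the variable names are assumptions made for illustration.

```python
from itertools import product

# Sample space for two-child families; the first letter is the firstborn.
S = {"".join(pair) for pair in product("bg", repeat=2)}

B = {o for o in S if "b" in o}       # at least one child is a boy
D = {o for o in S if o[0] != o[1]}   # the genders differ
M = {o for o in S if o[0] == o[1]}   # the genders match

print(sorted(B | D), (B | D) == B)   # D is contained in B
print(sorted(B | M), (B | M) == S)   # together B and M cover S
```

Because a set never holds duplicates, an outcome that lies in both events appears only once in the union.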
The following Additive Rule of Probability is a useful formula for calculating the probability
of A ∪ B.
Additive Rule of Probability
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
The next example, in which we compute the probability of a union both by counting and by using the
formula, shows why the last term in the formula is needed.
EXAMPLE 18
A tutoring service specializes in preparing adults for high school equivalence tests. Among all the students
seeking help from the service, 63% need help in mathematics, 34% need help in English, and 27% need
help in both mathematics and English. What is the percentage of students who need help in either
mathematics or English?
Solution:
Imagine selecting a student at random, that is, in such a way that every student has the same chance of
being selected. Let M denote the event “the student needs help in mathematics” and let E denote the
event “the student needs help in English.” The information given is that P(M) = 0.63, P(E) = 0.34,
and P(M ∩ E) = 0.27. The Additive Rule of Probability gives
P(M ∪ E) = P(M) + P(E) − P(M ∩ E) = 0.63 + 0.34 − 0.27 = 0.70
Note how the naïve reasoning that if 63% need help in mathematics and 34% need help in English
then 63 plus 34 or 97% need help in one or the other gives a number that is too large. The percentage
that need help in both subjects must be subtracted off, else the people needing help in both are
counted twice, once for needing help in mathematics and once again for needing help in English. The
simple sum of the probabilities would work if the events in question were mutually exclusive, for
then P(A ∩ B) is zero, and makes no difference.
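The double counting can be seen concretely with the die events E: “the number rolled is even” and T: “the number rolled is greater than two” used earlier in this section. The sketch below computes the probability of their union by counting and by the Additive Rule of Probability, then repeats the arithmetic of Example 18 with exact fractions. The function and variable names are assumptions made for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}
T = {3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

by_counting = prob(E | T)
by_formula = prob(E) + prob(T) - prob(E & T)
naive_sum = prob(E) + prob(T)   # counts the shared outcomes 4 and 6 twice

print(by_counting, by_formula, naive_sum)

# Example 18: mathematics, English, and both.
p_math, p_english, p_both = Fraction("0.63"), Fraction("0.34"), Fraction("0.27")
p_either = p_math + p_english - p_both
print(float(p_either))
```

For the die the naive sum exceeds 1, which cannot be a probability, so the need to subtract P(E ∩ T) is visible at once.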
KEY TAKEAWAY
• The probability of an event that is a complement or union of events of known probability can be computed
using formulas.
     R     S     T
M    0.09  0.25  0.19
N    0.31  0.16  0.00
a. P(R), P(S), P(R ∩ S).
b. P(M), P(N), P(M ∩ N).
c. P(R ∪ S).
d. P(R^c).
e. Determine whether or not the events N and S are mutually exclusive; the events N and T.
APPLICATIONS
11. Make a statement in ordinary English that describes the complement of each event (do not simply insert the
word “not”).
a. In the roll of a die: “five or more.”
b. In a roll of a die: “an even number.”
c. In two tosses of a coin: “at least one heads.”
d. In the random selection of a college student: “Not a freshman.”
12. Make a statement in ordinary English that describes the complement of each event (do not simply insert the
word “not”).
a. In the roll of a die: “two or less.”
b. In the roll of a die: “one, three, or four.”
c. In two tosses of a coin: “at most one heads.”
d. In the random selection of a college student: “Neither a freshman nor a senior.”
13. The sample space that describes all three-child families according to the genders of the children with respect
to birth order is
S = {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}.
For each of the following events in the experiment of selecting a three-child family at random, state the
complement of the event in the simplest possible terms, then find the outcomes that comprise the event and
its complement.
a. At least one child is a girl.
b. At most one child is a girl.
c. All of the children are girls.
d. Exactly two of the children are girls.
e. The first born is a girl.
14. The sample space that describes the two-way classification of citizens according to gender and opinion on
a political issue is
S = {mf, ma, mn, ff, fa, fn},
where the first letter denotes gender (m: male, f: female) and the second opinion (f: for, a: against, n:
neutral). For each of the following events in the experiment of selecting a citizen at random, state the
complement of the event in the simplest possible terms, then find the outcomes that comprise the event and
its complement.
a. The person is male.
b. The person is not in favor.
c. The person is either male or in favor.
d. The person is female and neutral.
15. A tourist who speaks English and German but no other language visits a region of Slovenia. If 35% of the
residents speak English, 15% speak German, and 3% speak both English and German, what is the probability
that the tourist will be able to talk with a randomly encountered resident of the region?
16. In a certain country 43% of all automobiles have airbags, 27% have anti-lock brakes, and 13% have both.
What is the probability that a randomly selected vehicle will have both airbags and anti-lock brakes?
17. A manufacturer examines its records over the last year on a component part received from outside
suppliers. The breakdown on source (supplier A, supplier B) and quality (H: high, U: usable, D: defective) is
shown in the two-way contingency table.
     H       U       D
A    0.6937  0.0049  0.0014
B    0.2982  0.0009  0.0009
The record of a part is selected at random. Find the probability of each of the following events.
a. The part was defective.
b. The part was either of high quality or was at least usable, in two ways: (i) by adding numbers in the table, and (ii)
using the answer to (a) and the Probability Rule for Complements.
c. The part was defective and came from supplier B.
d. The part was defective or came from supplier B, in two ways: (i) by finding the cells in the table that correspond to
this event and adding their probabilities, and (ii) using the Additive Rule of Probability.
18. Individuals with a particular medical condition were classified according to the presence (T) or absence (N) of a
potential toxin in their blood and the onset of the condition (E: early, M: midrange, L: late). The breakdown
according to this classification is shown in the two-way contingency table.
     E      M      L
T    0.012  0.124  0.013
N    0.170  0.638  0.043
One of these individuals is selected at random. Find the probability of each of the following events.
a. The person experienced early onset of the condition.
b. The onset of the condition was either midrange or late, in two ways: (i) by adding numbers in the
table, and (ii) using the answer to (a) and the Probability Rule for Complements.
c. The toxin is present in the person’s blood.
d. The person experienced early onset of the condition and the toxin is present in the person’s
blood.
e. The person experienced early onset of the condition or the toxin is present in the person’s blood,
in two ways: (i) by finding the cells in the table that correspond to this event and adding their
probabilities, and (ii) using the Additive Rule of Probability.
19. The breakdown of the students enrolled in a university course by class (F: freshman, So: sophomore, J:
junior, Se: senior) and academic major (S: science, mathematics, or engineering, L: liberal arts, O: other) is
shown in the two-way classification table.
Major   Class
        F     So    J     Se
S       92    42    20    13
L       368   167   80    53
O       460   209   100   67
A student enrolled in the course is selected at random. Adjoin the row and column totals to the table and
use the expanded table to find the probability of each of the following events.
a. The student is a freshman.
b. The student is a liberal arts major.
c. The student is a freshman liberal arts major.
d. The student is either a freshman or a liberal arts major.
e. The student is not a liberal arts major.
20. The table relates the response to a fund-raising appeal by a college to its alumni to the number of years
since graduation.
Response   Years Since Graduation
           0–5    6–20   21–35  Over 35
Positive   120    440    210    90
None       1380   3560   3290   910
An alumnus is selected at random. Adjoin the row and column totals to the table and use the expanded
table to find the probability of each of the following events.
a. The alumnus responded.
b. The alumnus did not respond.
c. The alumnus graduated at least 21 years ago.
d. The alumnus graduated at least 21 years ago and responded.
ADDITIONAL EXERCISES
21. The sample space for tossing three coins is
S = {hhh, hht, hth, htt, thh, tht, tth, ttt}
a. List the outcomes that correspond to the statement “All the coins are heads.”
b. List the outcomes that correspond to the statement “Not all the coins are heads.”
c. List the outcomes that correspond to the statement “All the coins are not heads.”
3.3 Conditional Probability and Independent Events
LEARNING OBJECTIVES
1. To learn the concept of a conditional probability and how to compute it.
2. To learn the concept of independence of events, and how to apply it.
Conditional Probability
Suppose a fair die has been rolled and you are asked to give the probability that it was a five. There are six
equally likely outcomes, so your answer is 1/6. But suppose that before you give your answer you are given
the extra information that the number rolled was odd. Since there are only three odd numbers that are
possible, one of which is five, you would certainly revise your estimate of the likelihood that a five was
rolled from 1/6 to 1/3. In general, the revised probability that an event A has occurred, taking into
account the additional information that another event B has definitely occurred on this trial of the
experiment, is called the conditional probability of A given B and is denoted by P(A | B). The reasoning
employed in this example can be generalized to yield the computational formula in the following
definition.
Definition
The conditional probability of A given B, denoted P(A | B), is the probability that event A has occurred
in a trial of a random experiment for which it is known that event B has definitely occurred. It may be
computed by means of the following formula:
Rule for Conditional Probability
P(A | B) = P(A ∩ B) / P(B)
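The rule translates directly into code when the sample space is finite and its outcomes are equally likely. The following is a minimal sketch under that assumption; the helper names prob and conditional_prob are illustrative and do not come from the text. It reproduces the die reasoning above by representing events as sets of outcomes.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def conditional_prob(a, b, sample_space):
    """Rule for Conditional Probability: P(A | B) = P(A ∩ B) / P(B)."""
    p_b = prob(b, sample_space)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return prob(a & b, sample_space) / p_b


S = frozenset(range(1, 7))   # one roll of a fair die
five = frozenset({5})        # "a five is rolled"
odd = frozenset({1, 3, 5})   # "an odd number is rolled"

print(conditional_prob(five, odd, S))  # 1/3
print(conditional_prob(odd, five, S))  # 1
```

The two calls show that P(A | B) and P(B | A) are in general different numbers: knowing the roll is odd raises the chance of a five to 1/3, while knowing the roll is a five makes "odd" certain.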
EXAMPLE 20
A fair die is rolled.
a. Find the probability that the number rolled is a five, given that it is odd.
b. Find the probability that the number rolled is odd, given that it is a five.
Solution:
The sample space for this experiment is the set S = {1,2,3,4,5,6} consisting of six equally likely outcomes.
Let F denote the event “a five is rolled” and let O denote the event “an odd number is rolled,” so that
F = {5} and O = {1,3,5}
Just as we did not need the computational formula in this example, we do not need it when the
information is presented in a two-way classification table, as in the next example.
EXAMPLE 21
In a sample of 902 individuals under 40 who were or had previously been married, each person was
classified according to gender and age at first marriage. The results are summarized in the following
two-way classification table, where the meaning of the labels is:
• M: male
• F: female
• E: a teenager when first married
• W: in one’s twenties when first married
• H: in one’s thirties when first married
        E     W     H     Total
M       43    293   114   450
F       82    299   71    452
Total   125   592   185   902
The numbers in the first row mean that 43 people in the sample were men who were first married in their
teens, 293 were men who were first married in their twenties, 114 men who were first married in their
thirties, and a total of 450 people in the sample were men. Similarly for the numbers in the second row.
The numbers in the last row mean that, irrespective of gender, 125 people in the sample were married in
their teens, 592 in their twenties, 185 in their thirties, and that there were 902 people in the sample in all.
Suppose that the proportions in the sample accurately reflect those in the population of all individuals in
the population who are under 40 and who are or have previously been married. Suppose such a person is
selected at random.
a. Find the probability that the individual selected was a teenager at first marriage.
b. Find the probability that the individual selected was a teenager at first marriage, given that the
person is male.
Solution:
It is natural to let E also denote the event that the person selected was a teenager at first marriage and to
let M denote the event that the person selected is male.
a. According to the table the proportion of individuals in the sample who were in their teens at their
first marriage is 125/902. This is the relative frequency of such people in the population,
hence P(E) = 125/902 ≈ 0.139 or about 14%.
b. Since it is known that the person selected is male, all the females may be removed from
consideration, so that only the row in the table corresponding to men in the sample applies:
        E     W     H     Total
M       43    293   114   450
The proportion of males in the sample who were in their teens at their first marriage is 43/450. This is the
relative frequency of such people in the population of males, hence P(E | M) = 43/450 ≈ 0.096 or about 10%.
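Restricting attention to one row of the table and applying the computational formula give the same number. The following is a minimal sketch that checks this for the marriage data; the dictionary layout and variable names are illustrative assumptions, while the counts are those of the table.

```python
from fractions import Fraction

# Counts from the two-way classification table (rows: gender, columns: age group).
counts = {
    "M": {"E": 43, "W": 293, "H": 114},
    "F": {"E": 82, "W": 299, "H": 71},
}

total = sum(sum(row.values()) for row in counts.values())  # 902 people in all
teen_total = sum(row["E"] for row in counts.values())      # 125 married as teenagers
male_total = sum(counts["M"].values())                     # 450 men

# Part (a): unconditional probability of E.
p_e = Fraction(teen_total, total)

# Part (b), by discarding the female row and using only the male row.
p_e_given_m = Fraction(counts["M"]["E"], male_total)

# Part (b) again, by the Rule for Conditional Probability.
p_m = Fraction(male_total, total)
p_e_and_m = Fraction(counts["M"]["E"], total)
p_e_given_m_formula = p_e_and_m / p_m

print(round(float(p_e), 3))          # 0.139
print(round(float(p_e_given_m), 3))  # 0.096
print(p_e_given_m == p_e_given_m_formula)  # True
```

The common factor of 902 cancels in the formula, which is why discarding the rows that cannot occur is a legitimate shortcut.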
In the next example, the computational formula in the definition must be used.
EXAMPLE 22
Suppose that in an adult population the proportion of people who are both overweight and suffer
hypertension is 0.09; the proportion of people who are not overweight but suffer hypertension is
0.11; the proportion of people who are overweight but do not suffer hypertension is 0.02; and the
proportion of people who are neither overweight nor suffer hypertension is 0.78. An adult is
randomly selected from this population.
a. Find the probability that the person selected suffers hypertension given that he is overweight.
b. Find the probability that the selected person suffers hypertension given that he is not overweight.
c. Compare the two probabilities just found to give an answer to the question as to whether overweight
people tend to suffer from hypertension.
Solution:
Let H denote the event “the person selected suffers hypertension.” Let O denote the event “the
person selected is overweight.” The probability information given in the problem may be organized
into the following contingency table:
       O      O^c
H      0.09   0.11
H^c    0.02   0.78
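Once the contingency table is in hand, parts (a) and (b) are direct applications of the Rule for Conditional Probability. The following is a minimal sketch of that calculation, not the worked solution from the text; the dictionary keys "Oc" and "Hc" are illustrative labels for the complements, and the column sums P(O) and P(O^c) are obtained by adding the cells of each column.

```python
# Joint probabilities from the contingency table, keyed by (row, column).
joint = {
    ("H", "O"): 0.09, ("H", "Oc"): 0.11,
    ("Hc", "O"): 0.02, ("Hc", "Oc"): 0.78,
}

# Column totals: probability of being overweight and of not being overweight.
p_o = joint[("H", "O")] + joint[("Hc", "O")]
p_oc = joint[("H", "Oc")] + joint[("Hc", "Oc")]

# Rule for Conditional Probability applied to each column.
p_h_given_o = joint[("H", "O")] / p_o     # P(H | O)
p_h_given_oc = joint[("H", "Oc")] / p_oc  # P(H | O^c)

print(round(p_h_given_o, 4))   # 0.8182
print(round(p_h_given_oc, 4))  # 0.1236
```

Under these figures the conditional probability of hypertension is far larger among overweight people than among those who are not, which is the comparison part (c) asks for.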
Independent Events
Although typically we expect the conditional probability P(A | B) to be different from the
probability P(A) of A, it does not have to be different from P(A). When P(A | B) = P(A), the occurrence
of B has no effect on the likelihood of A. Whether or not the event A has occurred is independent of
the event B.
Using algebra it can be shown that the equality P(A | B) = P(A) holds if and only if the equality
P(A ∩ B) = P(A)⋅P(B) holds, which in turn is true if and only if P(B | A) = P(B). This is the basis for the following
definition.
Definition
Events A and B are independent if
P(A ∩ B) = P(A)⋅P(B)
If A and B are not independent then they are dependent.
The formula in the definition has two practical but exactly opposite uses:
1. In a situation in which we can compute all three probabilities P(A), P(B), and P(A ∩ B), it is used to check
whether or not the events A and B are independent:
o If P(A ∩ B) = P(A)⋅P(B), then A and B are independent.
o If P(A ∩ B) ≠ P(A)⋅P(B), then A and B are not independent.
2. In a situation in which each of P(A) and P(B) can be computed and it is known that A and B are
independent, then we can compute P(A ∩ B) by multiplying together P(A) and P(B): P(A ∩ B) = P(A)⋅P(B).
EXAMPLE 23
A single fair die is rolled. Let A = {3} and B = {1,3,5}. Are A and B independent?
Solution:
In this example we can compute all three probabilities P(A) = 1/6, P(B) = 1/2, and P(A ∩ B) = P({3}) = 1/6. Since the
product P(A)⋅P(B) = (1/6)(1/2) = 1/12 is not the same number as P(A ∩ B) = 1/6, the events A and B are not
independent.
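The first use of the formula, checking independence, can be sketched in code for a finite sample space with equally likely outcomes. The helper names prob and are_independent are illustrative assumptions. The events C and D are a contrasting pair added here for illustration only; they do not appear in the text.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def are_independent(a, b, sample_space):
    """Check the defining equality P(A ∩ B) = P(A)·P(B) exactly."""
    return prob(a & b, sample_space) == prob(a, sample_space) * prob(b, sample_space)


S = frozenset(range(1, 7))  # one roll of a fair die
A = frozenset({3})
B = frozenset({1, 3, 5})

# Illustrative pair (not from the text): "even" and "one or two".
C = frozenset({2, 4, 6})
D = frozenset({1, 2})

print(are_independent(A, B, S))  # False: 1/6 is not equal to 1/12
print(are_independent(C, D, S))  # True: 1/6 equals (1/2)(1/3)
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.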
EXAMPLE 24
The two-way classification of married or previously married adults under 40 according to gender and age
at first marriage in Note 3.48 "Example 21" produced the table
        E     W     H     Total
M       43    293   114   450
F       82    299   71    452
Total   125   592   185   902
Determine whether or not the events F: “female” and E: “was a teenager at first marriage” are
independent.
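The check called for here uses only three numbers read from the table. The following is a minimal sketch of the comparison, not the worked solution from the text; the variable names are illustrative assumptions.

```python
from fractions import Fraction

total = 902                       # everyone in the sample
p_f = Fraction(452, total)        # P(F): row total for females
p_e = Fraction(125, total)        # P(E): column total for teenage first marriage
p_f_and_e = Fraction(82, total)   # P(F ∩ E): the cell in row F, column E

# Independence holds exactly when P(F ∩ E) equals P(F)·P(E).
independent = p_f_and_e == p_f * p_e

print(round(float(p_f_and_e), 4))  # 0.0909
print(round(float(p_f * p_e), 4))  # 0.0694
print(independent)                 # False
```

Because the two numbers differ, the table's proportions indicate that F and E are not independent.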
EXAMPLE 25
Many diagnostic tests for detecting diseases do not test for the disease directly but for a chemical or
biological product of the disease, hence are not perfectly reliable. The sensitivity of a test is the
probability that the test will be positive when administered to a person who has the disease. The
higher the sensitivity, the greater the detection rate and the lower the false negative rate.
Suppose the sensitivity of a diagnostic procedure to test whether a person has a particular disease is
92%. A person who actually has the disease is tested for it using this procedure by two independent
laboratories.
a. What is the probability that both test results will be positive?
b. What is the probability that at least one of the two test results will be positive?
Solution:
a. Let A1 denote the event “the test by the first laboratory is positive” and let A2 denote the event
“the test by the second laboratory is positive.” Since A1 and A2 are independent,
P(A1 ∩ A2) = P(A1)⋅P(A2) = 0.92 × 0.92 = 0.8464
b. Using the Additive Rule for Probability and the probability just computed,
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2) = 0.92 + 0.92 − 0.8464 = 0.9936
EXAMPLE 26
The specificity of a diagnostic test for a disease is the probability that the test will be negative when
administered to a person who does not have the disease. The higher the specificity, the lower the false
positive rate.
Suppose the specificity of a diagnostic procedure to test whether a person has a particular disease is 89%.
a. A person who does not have the disease is tested for it using this procedure. What is the probability
that the test result will be positive?
b. A person who does not have the disease is tested for it by two independent laboratories using this
procedure. What is the probability that both test results will be positive?
Solution:
a. Let B denote the event “the test result is positive.” The complement of B is that the test result is
negative, and has probability the specificity of the test, 0.89. Thus
P(B) = 1 − P(B^c) = 1 − 0.89 = 0.11.
b. Let B1 denote the event “the test by the first laboratory is positive” and let B2 denote the event
“the test by the second laboratory is positive.” Since B1 and B2 are independent, by part (a) of the example
P(B1 ∩ B2) = P(B1)⋅P(B2) = 0.11 × 0.11 = 0.0121.
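The second use of the independence formula, multiplying probabilities of events known to be independent, is what drives both diagnostic-test examples. The following is a minimal sketch that recomputes their answers; the variable names are illustrative assumptions. It also cross-checks the "at least one positive" result through the complement "both tests negative", a route the examples do not take.

```python
sensitivity = 0.92  # P(positive | person has the disease)
specificity = 0.89  # P(negative | person does not have the disease)

# Example 25: two independent laboratories test a person who has the disease.
p_both_positive = sensitivity * sensitivity
p_at_least_one = sensitivity + sensitivity - p_both_positive  # Additive Rule
p_at_least_one_alt = 1 - (1 - sensitivity) ** 2               # via the complement

# Example 26: tests on a person who does not have the disease.
p_false_positive = 1 - specificity            # Probability Rule for Complements
p_two_false_positives = p_false_positive ** 2  # independent laboratories

print(round(p_both_positive, 4))        # 0.8464
print(round(p_at_least_one, 4))         # 0.9936
print(round(p_at_least_one_alt, 4))     # 0.9936
print(round(p_false_positive, 4))       # 0.11
print(round(p_two_false_positives, 4))  # 0.0121
```

Results are rounded to four places because the inputs are decimal fractions that binary floating point cannot represent exactly.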
The concept of independence applies to any number of events. For example, three events A, B,
and C are independent if P(A ∩ B ∩ C) = P(A)⋅P(B)⋅P(C). Note carefully that, as is the case with just two
events, this is not a formula that is always valid, but holds precisely when the events in question
are independent.
Probabilities on Tree Diagrams
Some probability problems are made much simpler when approached using a tree diagram. The next
example illustrates how to place probabilities on a tree diagram and use it to solve a problem.
EXAMPLE 28
A jar contains 10 marbles, 7 black and 3 white. Two marbles are drawn without replacement, which means
that the first one is not put back before the second one is drawn.
a. What is the probability that both marbles are black?
b. What is the probability that exactly one marble is black?
c. What is the probability that at least one marble is black?
Solution:
A tree diagram for the situation of drawing one marble after the other without replacement is shown
in Figure 3.6 "Tree Diagram for Drawing Two Marbles". The circle and rectangle will be explained later, and
should be ignored for now.
Figure 3.6 Tree Diagram for Drawing Two Marbles
The numbers on the two leftmost branches are the probabilities of getting either a black marble, 7
out of 10, or a white marble, 3 out of 10, on the first draw. The number on each remaining branch is
the probability of the event corresponding to the node on the right end of the branch occurring,
given that the event corresponding to the node on the left end of the branch has occurred. Thus for
the top branch, connecting the two Bs, it is P(B2 | B1), where B1 denotes the event “the first marble
drawn is black” and B2 denotes the event “the second marble drawn is black.” Since after drawing a
black marble out there are 9 marbles left, of which 6 are black, this probability is 6/9.
The number to the right of each final node is computed as shown, using the principle that if the
formula in the Conditional Rule for Probability is multiplied by P(B), then the result is
P(B ∩ A) = P(B)⋅P(A | B)
a. The event “both marbles are black” is B1 ∩ B2 and corresponds to the top right node in the tree, which
has been circled. Thus as indicated there, it is 0.47.
b. The event “exactly one marble is black” corresponds to the two nodes of the tree enclosed by the
rectangle. The events that correspond to these two nodes are mutually exclusive: black followed by white
is incompatible with white followed by black. Thus in accordance with the Additive Rule for Probability we
merely add the two probabilities next to these nodes, since what would be subtracted from the sum is
zero. Thus the probability of drawing exactly one black marble in two tries is 0.23 + 0.23 = 0.46. (This adds
the rounded node values; the unrounded sum is 21/90 + 21/90 = 42/90 ≈ 0.47.)
c. The event “at least one marble is black” corresponds to the three nodes of the tree enclosed by
either the circle or the rectangle. The events that correspond to these nodes are mutually
exclusive, so as in part (b) we merely add the probabilities next to these nodes. Thus the
probability of drawing at least one black marble in two tries is 0.47 + 0.23 + 0.23 = 0.93.
Of course, this answer could have been found more easily using the Probability Law for
Complements, simply subtracting the probability of the complementary event, “two white
marbles are drawn,” from 1 to obtain 1 − 0.07 = 0.93.
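The tree for this example can be built programmatically by multiplying along each path and adding over final nodes. The following is a minimal sketch under the assumption of two draws without replacement from a jar of black and white marbles; the function name two_draw_tree and the string labels "BB", "BW", "WB", "WW" are illustrative, not from the text. Exact fractions are used, so the results are the unrounded versions of the two-decimal figures quoted in the example.

```python
from fractions import Fraction


def two_draw_tree(black, white):
    """Return the probability of each final node for two draws without replacement."""
    total = black + white
    start = {"B": black, "W": white}
    leaves = {}
    for first, n_first in start.items():
        p_first = Fraction(n_first, total)  # first-level branch
        remaining = dict(start)
        remaining[first] -= 1               # the first marble is not put back
        for second, n_second in remaining.items():
            # Second-level branch: conditional probability given the first draw.
            p_second_given_first = Fraction(n_second, total - 1)
            # Product along the path from the start to this final node.
            leaves[first + second] = p_first * p_second_given_first
    return leaves


leaves = two_draw_tree(black=7, white=3)

p_both_black = leaves["BB"]                  # part (a)
p_exactly_one = leaves["BW"] + leaves["WB"]  # part (b): add mutually exclusive nodes
p_at_least_one = 1 - leaves["WW"]            # part (c): via the complement

print(p_both_black, p_exactly_one, p_at_least_one)  # 7/15 7/15 14/15
```

In exact terms the answer to part (b) is 7/15, about 0.467, so the figure 0.46 obtained by adding the rounded node values 0.23 and 0.23 is slightly low.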
As this example shows, finding the probability for each branch is fairly straightforward, since we
compute it knowing everything that has happened in the sequence of steps so far. Two principles that
are true in general emerge from this example:
Probabilities on Tree Diagrams
1. The probability of the event corresponding to any node on a tree is the product of the numbers on the
unique path of branches that leads to that node from the start.
2. If an event corresponds to several final nodes, then its probability is obtained by adding the numbers next
to those nodes.
KEY TAKEAWAYS
• A conditional probability is the probability that an event has occurred, taking into account additional
information about the result of the experiment.
• A conditional probability can always be computed using the formula in the definition. Sometimes it can be
computed by discarding part of the sample space.
• Two events A and B are independent if the probability P(A ∩ B) of their intersection A ∩ B is equal to the
product P(A)⋅P(B) of their individual probabilities.
a. The probability that the card drawn is red.
b. The probability that the card is red, given that it is not green.
c. The probability that the card is red, given that it is neither red nor yellow.
d. The probability that the card is red, given that it is not a four.
10. A special deck of 16 cards has 4 that are blue, 4 yellow, 4 green, and 4 red. The four cards of each color
are numbered from one to four. A single card is drawn at random. Find the following probabilities.
a. The probability that the card drawn is a two or a four.
b. The probability that the card is a two or a four, given that it is not a one.
c. The probability that the card is a two or a four, given that it is either a two or a three.
d. The probability that the card is a two or a four, given that it is red or green.
11. A random experiment gave rise to the two-way contingency table shown. Use it to compute the probabilities
indicated.
     R     S
A    0.12  0.18
B    0.28  0.42
a. P(A), P(R), P(A ∩ R).
b. Based on the answer to (a), determine whether or not the events A and R are independent.
c. Based on the answer to (b), determine whether or not P(A | R) can be predicted without any computation. If so,
make the prediction. In any case, compute P(A | R) using the Rule for Conditional Probability.
12. A random experiment gave rise to the two-way contingency table shown. Use it to compute the
probabilities indicated.
     R     S
A    0.13  0.07
B    0.61  0.19
a. P(A), P(R), P(A ∩ R).
b. Based on the answer to (a), determine whether or not the events A and R are independent.
c. Based on the answer to (b), determine whether or not P(A | R) can be predicted without any
computation. If so, make the prediction. In any case, compute P(A | R) using the Rule for Conditional
Probability.
13. Suppose for events A and B in a random experiment P(A) = 0.70 and P(B) = 0.30. Compute the indicated
probability, or explain why there is not enough information to do so.
a. P(A ∩ B).
b. P(A ∩ B), with the extra information that A and B are independent.
c. P(A ∩ B), with the extra information that A and B are mutually exclusive.
14. Suppose for events A and B connected to some random experiment, P(A) = 0.50 and P(B) = 0.50. Compute the
indicated probability, or explain why there is not enough information to do so.
a. P(A ∩ B).
b. P(A ∩ B), with the extra information that A and B are independent.
c. P(A ∩ B), with the extra information that A and B are mutually exclusive.
15. Suppose for events A, B, and C connected to some random experiment, A, B, and C are independent
and P(A) = 0.88, P(B) = 0.65, and P(C) = 0.44. Compute the indicated probability, or explain why there is not enough
information to do so.
a. P(A ∩ B ∩ C)
b. P(A^c ∩ B^c ∩ C^c)
16. Suppose for events A, B, and C connected to some random experiment, A, B, and C are independent
and P(A) = 0.95, P(B) = 0.73, and P(C) = 0.62. Compute the indicated probability, or explain why there is not enough
information to do so.
a. P(A ∩ B ∩ C)
b. P(A^c ∩ B^c ∩ C^c)
APPLICATIONS
17. The sample space that describes all three-child families according to the genders of the children with respect
to birth order is
S = {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}
In the experiment of selecting a three-child family at random, compute each of the following probabilities,
assuming all outcomes are equally likely.
a. The probability that the family has at least two boys.
b. The probability that the family has at least two boys, given that not all of the children are girls.
c. The probability that at least one child is a boy.
d. The probability that at least one child is a boy, given that the first born is a girl.
18. The following two-way contingency table gives the breakdown of the population in a particular locale
according to age and number of vehicular moving violations in the past three years:
Age        Violations
           0      1      2+
Under 21   0.04   0.06   0.02
21–40      0.25   0.16   0.01
41–60      0.23   0.10   0.02
60+        0.08   0.03   0.00
A person is selected at random. Find the following probabilities.
a. The person is under 21.
b. The person has had at least two violations in the past three years.
c. The person has had at least two violations in the past three years, given that he is under 21.
d. The person is under 21, given that he has had at least two violations in the past three years.
e. Determine whether the events “the person is under 21” and “the person has had at least two
violations in the past three years” are independent or not.
19. The following two-way contingency table gives the breakdown of the population in a particular locale
according to party affiliation (A, B, C, or None) and opinion on a bond issue:
Affiliation   Opinion
              Favors   Opposes   Undecided
A             0.12     0.09      0.07
B             0.16     0.12      0.14
C             0.04     0.03      0.06
None          0.08     0.06      0.03
A person is selected at random. Find each of the following probabilities.
a. The person is in favor of the bond issue.
b. The person is in favor of the bond issue, given that he is affiliated with party A.
c. The person is in favor of the bond issue, given that he is affiliated with party B.
20. The following two-way contingency table gives the breakdown of the population of patrons at a grocery
store according to the number of items purchased and whether or not the patron made an impulse
purchase at the checkout counter:
Number of Items   Impulse Purchase
                  Made   Not Made
Few               0.01   0.19
Many              0.04   0.76
A patron is selected at random. Find each of the following probabilities.
a. The patron made an impulse purchase.
b. The patron made an impulse purchase, given that the total number of items purchased was
many.
c. Determine whether or not the events “few purchases” and “made an impulse purchase at the
checkout counter” are independent.
21. The following two-way contingency table gives the breakdown of the population of adults in a particular
locale according to employment type and level of life insurance:
Employment Type   Level of Insurance
                  Low    Medium   High
Unskilled         0.07   0.19     0.00
Semi-skilled      0.04   0.28     0.08
Skilled           0.03   0.18     0.05
Professional      0.01   0.05     0.02
An adult is selected at random. Find each of the following probabilities.
a. The person has a high level of life insurance.
b. The person has a high level of life insurance, given that he does not have a professional position.
c. The person has a high level of life insurance, given that he has a professional position.
d. Determine whether or not the events “has a high level of life insurance” and “has a professional
position” are independent.
24. A man has two lights in his well house to keep the pipes from freezing in winter. He checks the lights
daily. Each light has probability 0.002 of burning out before it is checked the next day (independently of
the other light).
a. If the lights are wired in parallel one will continue to shine even if the other burns out. In this
situation, compute the probability that at least one light will continue to shine for the full 24
hours. Note the greatly increased reliability of the system of two bulbs over that of a single bulb.
b. If the lights are wired in series neither one will continue to shine even if only one of them burns
out. In this situation, compute the probability that at least one light will continue to shine for the
full 24 hours. Note the slightly decreased reliability of the system of two bulbs over that of a
single bulb.
25. An accountant has observed that 5% of all copies of a particular two-part form have an error in Part I, and 2% have an error in Part II. If the errors occur independently, find the probability that a randomly selected form will be error-free.
26. A box contains 20 screws which are identical in size, but 12 of which are zinc coated and 8 of which are not. Two screws are selected at random, without replacement.
a. Find the probability that both are zinc coated.
b. Find the probability that at least one is zinc coated.
ADDITIONAL EXERCISES
27. Events A and B are mutually exclusive. Find P(A|B).
28. The city council of a particular city is composed of five members of party A, four members of party B, and three independents. Two council members are randomly selected to form an investigative committee.
a. Find the probability that both are from party A.
b. Find the probability that at least one is an independent.
c. Find the probability that the two have different party affiliations (that is, not both A, not both B, and not both independent).
29. A basketball player makes 60% of the free throws that he attempts, except that if he has just tried and missed a free throw then his chances of making a second one go down to only 30%. Suppose he has just been awarded two free throws.
a. Find the probability that he makes both.
b. Find the probability that he makes at least one. (A tree diagram could help.)
30. An economist wishes to ascertain the proportion p of the population of individual taxpayers who have purposely submitted fraudulent information on an income tax return. To truly guarantee anonymity of the taxpayers in a random survey, taxpayers questioned are given the following instructions.
1. Flip a coin.
2. If the coin lands heads, answer “Yes” to the question “Have you ever submitted fraudulent information on a tax return?” even if you have not.
3. If the coin lands tails, give a truthful “Yes” or “No” answer to the question “Have you ever submitted fraudulent information on a tax return?”
The questioner is not told how the coin landed, so he does not know if a “Yes” answer is the truth or is given only because of the coin toss.
a. Using the Probability Rule for Complements and the independence of the coin toss and the taxpayers’ status fill in the empty cells in the two-way contingency table shown. Assume that the coin is fair. Each cell except the two in the bottom row will contain the unknown proportion (or probability) p.

               Coin
Status         H       T       Probability
Fraud          ____    ____    p
No fraud       ____    ____    ____
Probability    ____    ____    1

b. The only information that the economist sees are the entries in the following table:
Chapter 4
Discrete Random Variables
It is often the case that a number is naturally associated to the outcome of a random experiment: the number of boys in a three-child family, the number of defective light bulbs in a case of 100 bulbs, the
length of time until the next customer arrives at the drive-through window at a bank. Such a number
varies from trial to trial of the corresponding experiment, and does so in a way that cannot be
predicted with certainty; hence, it is called a random variable. In this chapter and the next we study
such variables.
4.1 Random Variables
LEARNING OBJECTIVES
1. To learn the concept of a random variable.
2. To learn the distinction between discrete and continuous random variables.
Definition
A random variable is a numerical quantity that is generated by a random experiment.
We will denote random variables by capital letters, such as X or Z, and the actual values that they can
take by lowercase letters, such as x and z.
Table 4.1 "Four Random Variables" gives four examples of random variables. In the second example, the
three dots indicate that every counting number is a possible value for X. Although it is highly unlikely, for
example, that it would take 50 tosses of the coin to observe heads for the first time, nevertheless it is
conceivable, hence the number 50 is a possible value. The set of possible values is infinite, but is still at
least countable, in the sense that all possible values can be listed one after another. In the last two
examples, by way of contrast, the possible values cannot be individually listed, but take up a whole
interval of numbers. In the fourth example, since the light bulb could conceivably continue to shine
indefinitely, there is no natural greatest value for its lifetime, so we simply place the symbol ∞ for infinity
as the right endpoint of the interval of possible values.
Table 4.1 Four Random Variables

Experiment                                     Number X                                       Possible Values of X
Roll two fair dice                             Sum of the number of dots on the top faces     2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Flip a fair coin repeatedly                    Number of tosses until the coin lands heads    1, 2, 3, 4, …
Measure the voltage at an electrical outlet    Voltage measured                               118 ≤ x ≤ 122
Operate a light bulb until it burns out        Time until the bulb burns out                  0 ≤ x < ∞

Definition
A random variable is called discrete if it has either a finite or a countable number of possible values. A
random variable is called continuous if its possible values contain a whole interval of numbers.
The examples in the table are typical: discrete random variables usually arise from a counting
process, whereas continuous random variables usually arise from a measurement.
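As a rough illustration of the distinction, the first two experiments in Table 4.1 can be explored with a short Python sketch. The function names dice_sum_values and tosses_until_heads are illustrative choices, not part of the text, and the fixed seed is only there to make a run repeatable.

```python
import random
from itertools import product


def dice_sum_values():
    """Return the sorted list of possible sums when two fair dice are rolled."""
    return sorted({a + b for a, b in product(range(1, 7), repeat=2)})


def tosses_until_heads(rng):
    """Simulate flipping a fair coin until it lands heads; return the toss count."""
    tosses = 1
    while rng.random() >= 0.5:  # a draw below 0.5 is treated as heads
        tosses += 1
    return tosses


# A finite list of possible values: a discrete random variable.
print(dice_sum_values())

# A countably infinite list 1, 2, 3, ...: still discrete, because any
# positive whole number of tosses is conceivable.
rng = random.Random(0)  # fixed seed, an arbitrary choice
print([tosses_until_heads(rng) for _ in range(10)])
```

A continuous random variable such as the measured voltage has no analogous list of values, which is why it is described by an interval instead.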
KEY TAKEAWAYS
• A random variable is a number generated by a random experiment.
• A random variable is called discrete if its possible values form a finite or countable set.
• A random variable is called continuous if its possible values contain a whole interval of numbers.
EXERCISES
BASIC
1. Classify each random variable as either discrete or continuous.
a. The number of arrivals at an emergency room between midnight and 6:00 a.m.
b. The weight of a box of cereal labeled “18 ounces.”
c. The duration of the next outgoing telephone call from a business office.
d. The number of kernels of popcorn in a 1-pound container.
e. The number of applicants for a job.
2. Classify each random variable as either discrete or continuous.
a. The time between customers entering a checkout lane at a retail store.
b. The weight of refuse on a truck arriving at a landfill.
c. The number of passengers in a passenger vehicle on a highway at rush hour.
d. The number of clerical errors on a medical chart.
e. The number of accident-free days in one month at a factory.
3. Classify each random variable as either discrete or continuous.
a. The number of boys in a randomly selected three-child family.
b. The temperature of a cup of coffee served at a restaurant.
c. The number of no-shows for every 100 reservations made with a commercial airline.
d. The number of vehicles owned by a randomly selected household.
e. The average amount spent on electricity each July by a randomly selected household in a certain state.
4. Classify each random variable as either discrete or continuous.
a. The number of patrons arriving at a restaurant between 5:00 p.m. and 6:00 p.m.
b. The number of new cases of influenza in a particular county in a coming month.
c. The air pressure of a tire on an automobile.
d. The amount of rain recorded at an airport one day.
e. The number of students who actually register for classes at a university next semester.
5. Identify the set of possible values for each random variable. (Make a reasonable estimate based on experience, where necessary.)
a. The number of heads in two tosses of a coin.
b. The average weight of newborn babies born in a particular county one month.
c. The amount of liquid in a 12-ounce can of soft drink.
d. The number of games in the next World Series (best of up to seven games).
e. The number of coins that match when three coins are tossed at once.
6. Identify the set of possible values for each random variable. (Make a reasonable estimate based on experience, where necessary.)
a. The number of hearts in a five-card hand drawn from a deck of 52 cards that contains 13 hearts in all.
b. The number of pitches made by a starting pitcher in a major league baseball game.
c. The number of breakdowns of city buses in a large city in one week.
d. The distance a rental car rented on a daily rate is driven each day.
e. The amount of rainfall at an airport next month.
ANSWERS
1.
a. discrete
b. continuous
c. continuous
d. discrete
e. discrete
3.
a. discrete
b. continuous
c. discrete
d. discrete
e. continuous
5.
a. {0, 1, 2}
b. an interval (a, b) (answers vary)
c. an interval (a, b) (answers vary)
d. {4, 5, 6, 7}
e. {2, 3}
4.2 Probability Distributions for Discrete Random Variables
LEARNING OBJECTIVES
1. To learn the concept of the probability distribution of a discrete random variable.
2. To learn the concepts of the mean, variance, and standard deviation of a discrete random variable, and how to compute them.
Probability Distributions
Associated to each possible value x of a discrete random variable X is the probability P(x) that X will take
the value x in one trial of the experiment.
Definition
The probability distribution of a discrete random variable X is a list of each possible value
of X together with the probability that X takes that value in one trial of the experiment.
The probabilities in the probability distribution of a random variable X must satisfy the following two
conditions:
1. Each probability P(x) must be between 0 and 1: 0 ≤ P(x) ≤ 1.
2. The sum of all the probabilities is 1: ΣP(x) = 1.
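The two conditions can be checked mechanically. The sketch below builds the distribution of the sum of two fair dice (the first experiment in Table 4.1) by counting the 36 equally likely outcomes, then tests both conditions. The helper names two_dice_distribution and is_valid_distribution are hypothetical, and exact fractions are used so that rounding error cannot affect the check.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Map each possible sum x of two fair dice to its probability P(x)."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {x: Fraction(c, 36) for x, c in sorted(counts.items())}


def is_valid_distribution(dist):
    """Check that 0 <= P(x) <= 1 for every x and that the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    return in_range and sum(dist.values()) == 1


dist = two_dice_distribution()
for x, p in dist.items():
    print(x, p)
print(is_valid_distribution(dist))
```

A list of values and probabilities that fails either test cannot be the probability distribution of any random variable.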
Figure 4.2 Probability Distribution for Tossing Two Fair Dice
The Mean and Standard Deviation of a Discrete Random Variable
Definition
The mean (also called the expected value) of a discrete random variable X is the number
μ = E(X) = Σ x P(x)
The mean of a random variable may be interpreted as the average of the values assumed by the random
variable in repeated trials of the experiment.
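To make the "average in repeated trials" interpretation concrete, here is a minimal sketch for the sum of two fair dice. The function name mean, the seed, and the number of simulated trials are illustrative assumptions.

```python
import random
from fractions import Fraction
from itertools import product


def mean(dist):
    """Return mu, the sum of x * P(x) over all possible values x."""
    return sum(x * p for x, p in dist.items())


# Distribution of the sum of two fair dice, built by counting outcomes.
dist = {}
for a, b in product(range(1, 7), repeat=2):
    dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)

mu = mean(dist)
print(mu)

# Average of many simulated rolls; it should land close to mu.
rng = random.Random(0)  # fixed seed, an arbitrary choice
trials = 100_000
total = sum(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))
print(total / trials)
```

The first number is computed from the distribution alone; the second comes from simulated data and varies slightly with the seed.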
The concept of expected value is also basic to the insurance industry, as the following simplified
example illustrates.
EXAMPLE 5
A life insurance company will sell a $200,000 one-year term life insurance policy to an individual in a particular risk group for a premium of $195. Find the expected value to the company of a single policy if a person in this risk group has a 99.97% chance of surviving one year.
Solution:
Let X denote the net gain to the company from the sale of one such policy. There are two possibilities: the insured person lives the whole year or the insured person dies before the year is up. Applying the “income minus outgo” principle, in the former case the value of X is 195 − 0; in the latter case it is 195 − 200,000 = −199,805. Since the probability in the first case is 0.9997 and in the second case is 1 − 0.9997 = 0.0003, the probability distribution for X is:

x       195       −199,805
P(x)    0.9997    0.0003
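The two cases just described determine the distribution, so the expected value follows directly from the definition of the mean. The sketch below works in exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

premium = 195
payout = 200_000
p_survive = Fraction(9997, 10_000)  # 99.97% chance of surviving the year

# Net gain X to the company and its probability in each case.
distribution = {
    premium - 0: p_survive,           # insured person lives: X = 195
    premium - payout: 1 - p_survive,  # insured person dies: X = -199,805
}

# E(X) is the sum of x * P(x) over both cases.
expected_value = sum(x * p for x, p in distribution.items())
print(float(expected_value))
```

Under these figures the company averages a gain of $135 per policy sold, even though any single policy either earns $195 or loses $199,805.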
Compute each of the following quantities.
a. a.
b. P(0).
c. P(X > 0).
d. P(X ≥ 0).
e. P(X ≤ −2).
f. The mean μ of X.
g. The variance σ² of X.
h. The standard deviation σ of X.
Solution:
a. Since all probabilities must add up to 1, a = 1 − (0.2 + 0.5 + 0.1) = 0.2.
b. Directly from the table, P(0) = 0.5.
c. From the table, P(X > 0) = P(1) + P(4) = 0.2 + 0.1 = 0.3.
d. From the table, P(X ≥ 0) = P(0) + P(1) + P(4) = 0.5 + 0.2 + 0.1 = 0.8.
e. Since none of the numbers listed as possible values for X is less than or equal to −2, the event X ≤ −2 is impossible, so P(X ≤ −2) = 0.
f. Using the formula in the definition of μ,
μ = Σ x P(x) = (−1)⋅0.2 + 0⋅0.5 + 1⋅0.2 + 4⋅0.1 = 0.4
KEY TAKEAWAYS
• The probability distribution of a discrete random variable X is a listing of each possible value x taken by X along with the probability P(x) that X takes that value in one trial of the experiment.
• The mean μ of a discrete random variable X is a number that indicates the average value of X over numerous trials of the experiment. It is computed using the formula μ = Σ x P(x).
• The variance σ² and standard deviation σ of a discrete random variable X are numbers that indicate the variability of X over numerous trials of the experiment. They may be computed using the formula σ² = [Σ x² P(x)] − μ², taking the square root to obtain σ.
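As a sketch, the formulas in these takeaways can be applied to the distribution from the worked solution earlier in this section, whose values −1, 0, 1, 4 carry probabilities 0.2, 0.5, 0.2, 0.1 (read off from that solution). The function names are illustrative, and variance_by_definition uses the standard definition σ² = Σ (x − μ)² P(x) as a cross-check on the shortcut formula.

```python
import math


def mean(dist):
    """mu = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Shortcut formula: sigma^2 = [sum of x^2 * P(x)] - mu^2."""
    mu = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2


def variance_by_definition(dist):
    """Standard definition: sigma^2 = sum of (x - mu)^2 * P(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


dist = {-1: 0.2, 0: 0.5, 1: 0.2, 4: 0.1}

mu = mean(dist)
var = variance(dist)
sigma = math.sqrt(var)
print(round(mu, 4), round(var, 4), round(sigma, 4))
print(math.isclose(var, variance_by_definition(dist)))
```

The two variance computations agree up to floating-point rounding, which is why math.isclose is used for the comparison rather than ==.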
10. Let X denote the number of times a fair coin lands heads in three tosses. Construct the probability distribution of X.
11. Five thousand lottery tickets are sold for $1 each. One ticket will win $1,000, two tickets will win $500 each, and ten tickets will win $100 each. Let X denote the net gain from the purchase of a randomly selected ticket.
a. Construct the probability distribution of X.
b. Compute the expected value E(X) of X. Interpret its meaning.
c. Compute the standard deviation σ of X.
12. Seven thousand lottery tickets are sold for $5 each. One ticket will win $2,000, two tickets will win $750 each, and five tickets will win $100 each. Let X denote the net gain from the purchase of a randomly selected ticket.
a. Construct the probability distribution of X.
b. Compute the expected value E(X) of X. Interpret its meaning.
c. Compute the standard deviation σ of X.
13. An insurance company will sell a $90,000 one-year term life insurance policy to an individual in a particular risk group for a premium of $478. Find the expected value to the company of a single policy if a person in this risk group has a 99.62% chance of surviving one year.
14. An insurance company will sell a $10,000 one-year term life insurance policy to an individual in a particular risk group for a premium of $368. Find the expected value to the company of a single policy if a person in this risk group has a 97.25% chance of surviving one year.
15. An insurance company estimates that the probability that an individual in a particular risk group will survive one year is 0.9825. Such a person wishes to buy a $150,000 one-year term life insurance policy. Let C denote how much the insurance company charges such a person for such a policy.
a. Construct the probability distribution of X. (Two entries in the table will contain C.)
b. Compute the expected value E(X) of X.
c. Determine the value C must have in order for the company to break even on all such policies (that is, to average a net gain of zero per policy on such policies).
d. Determine the value C must have in order for the company to average a net gain of $250 per policy on all such policies.
16. An insurance company estimates that the probability that an individual in a particular risk group will survive one year is 0.99. Such a person wishes to buy a $75,000 one-year term life insurance policy. Let C denote how much the insurance company charges such a person for such a policy.
a. Construct the probability distribution of X. (Two entries in the table will contain C.)
b. Compute the expected value E(X) of X.
c. Determine the value C must have in order for the company to break even on all such policies (that is, to average a net gain of zero per policy on such policies).
d. Determine the value C must have in order for the company to average a net gain of $150 per policy on all such policies.
17. A roulette wheel has 38 slots. Thirty-six slots are numbered from 1 to 36; half of them are red and half are black. The remaining two slots are numbered 0 and 00 and are green. In a $1 bet on red, the bettor pays $1 to play. If the ball lands in a red slot, he receives back the dollar he bet plus an additional dollar. If the ball does not land on red he loses his dollar. Let X denote the net gain to the bettor on one play of the game.
a. Construct the probability distribution of X.
b. Compute the expected value E(X) of X, and interpret its meaning in the context of the problem.
c. Compute the standard deviation of X.
18. A roulette wheel has 38 slots. Thirty-six slots are numbered from 1 to 36; the remaining two slots are numbered 0 and 00. Suppose the “number” 00 is considered not to be even, but the number 0 is still even. In a $1 bet on even, the bettor pays $1 to play. If the ball lands in an even numbered slot, he receives back the dollar he bet plus an additional dollar. If the ball does not land on an even numbered slot, he loses his dollar. Let X denote the net gain to the bettor on one play of the game.
a. Construct the probability distribution of X.
b. Compute the expected value E(X) of X, and explain why this game is not offered in a casino (where 0 is not considered even).
c. Compute the standard deviation of X.
4.3 The Binomial Distribution
LEARNING OBJECTIVES
1. To learn the concept of a binomial random variable.
2. To learn how to recognize a random variable as being a binomial random variable.
The experiment of tossing a fair coin three times and the experiment of observing the genders
according to birth order of the children in a randomly selected three-child family are completely
different, but the random variables that count the number of heads in the coin toss and the number
of boys in the family (assuming the two genders are equally likely) are the same random variable, the
one with probability distribution

x       0        1        2        3
P(x)    0.125    0.375    0.375    0.125
A histogram that graphically illustrates this probability distribution is given in Figure 4.4 "Probability
Distribution for Three Coins and Three Children". What is common to the two experiments is that we
perform three identical and independent trials of the same action, each trial has only two outcomes
(heads or tails, boy or girl), and the probability of success is the same number, 0.5, on every trial. The
random variable that is generated is called the binomial random variable with parameters n = 3
and p = 0.5. This is just one case of a general situation.
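The shared distribution can be recovered by brute force: list all eight equally likely sequences of three trials and count the successes in each. This is a minimal sketch; the labels "H" and "T" stand in for either heads and tails or boy and girl, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of three trials.
outcomes = list(product("HT", repeat=3))

# X counts the successes ("H") in each sequence.
counts = Counter(seq.count("H") for seq in outcomes)
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, float(p))
```

Enumeration like this becomes impractical as the number of trials grows, which is one motivation for a general probability formula.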
Figure 4.4 Probability Distribution for Three Coins and Three Children
Definition
Suppose a random experiment has the following characteristics.
1. There are n identical and independent trials of a common procedure.
2. There are exactly two possible outcomes for each trial, one termed “success” and the other “failure.”
3. The probability of success on any one trial is the same number p.
Then the discrete random variable X that counts the number of successes in the n trials is the binomial
random variable with parameters n and p. We also say that X has a binomial distribution with
parameters n and p.
The following four examples illustrate the definition. Note how in every case “success” is the outcome that is counted, not the outcome that we prefer or think is better in some sense.
1. A random sample of 125 students is selected from a large college in which the proportion of students who
are females is 57%. Suppose X denotes the number of female students in the sample. In this situation
there are n = 125 identical and independent trials of a common procedure, selecting a student at random;
there are exactly two possible outcomes for each trial, “success” (what we are counting, that the student be
female) and “failure”; and finally the probability of success on any one trial is the same number p =
0.57. X is a binomial random variable with parameters n = 125 and p = 0.57.
2. A multiple-choice test has 15 questions, each of which has five choices. An unprepared student taking the
test answers each of the questions completely randomly by choosing an arbitrary answer from the five
provided. Suppose X denotes the number of answers that the student gets right. X is a binomial random
variable with parameters n = 15 and p = 1/5 = 0.20.
3. In a survey of 1,000 registered voters each voter is asked if he intends to vote for candidate Titania
Queen in the upcoming election. Suppose X denotes the number of voters in the survey who intend to vote
for Titania Queen. X is a binomial random variable with n = 1,000 and p equal to the true proportion of
voters (surveyed or not) who intend to vote for Titania Queen.
4. An experimental medication was given to 30 patients with a certain medical condition. Suppose X denotes
the number of patients who develop severe side effects. X is a binomial random variable with n = 30
and p equal to the true probability that a patient with the underlying condition will experience severe side
effects if given that medication.
Probability Formula for a Binomial Random Variable
Often the most difficult aspect of working a problem that involves the binomial random variable is
recognizing that the random variable in question has a binomial distribution. Once that is known,
probabilities can be computed using the following formula.
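A minimal sketch of the standard binomial probability formula, P(x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n − x) for x = 0, 1, …, n, is given here in Python. The function name binomial_pmf is an illustrative choice; math.comb from the standard library supplies the binomial coefficient n! / (x!(n − x)!).

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    if not 0 <= x <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Three fair coin tosses (n = 3, p = 0.5).
print([binomial_pmf(x, 3, 0.5) for x in range(4)])

# The unprepared student on the 15-question, five-choice test
# (n = 15, p = 0.20): probability of exactly 3 correct answers.
print(binomial_pmf(3, 15, 0.20))
```

Summing binomial_pmf over x = 0, 1, …, n should give 1 up to rounding, which is a quick sanity check on any choice of n and p.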
Figure 4.5 Probability Distribution of the Binomial Random Variable in Note 4.29 "Example 7"
Special Formulas for the Mean and Standard Deviation of a Binomial Random Variable
Since a binomial random variable is a discrete random variable, the formulas for its mean, variance,
and standard deviation given in the previous section apply to it, as we just saw in Note 4.29
"Example 7" in the case of the mean. However, for the binomial random variable there are much
simpler formulas.
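The standard simpler formulas for a binomial random variable are μ = np and σ² = np(1 − p), with σ the square root of the variance. As a sketch, they can be compared with the general formulas from the previous section; the parameter values n = 5 and p = 1/3 and the function name binomial_distribution are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, sqrt


def binomial_distribution(n, p):
    """Return {x: P(x)} for the binomial random variable with parameters n, p."""
    return {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}


n, p = 5, Fraction(1, 3)  # illustrative parameters
dist = binomial_distribution(n, p)

# General formulas that apply to any discrete random variable.
mu = sum(x * px for x, px in dist.items())
var = sum(x * x * px for x, px in dist.items()) - mu ** 2

# Compare with the binomial shortcuts (exact, since p is a Fraction).
print(mu == n * p)
print(var == n * p * (1 - p))
print(sqrt(var))
```

Using Fraction keeps every probability exact, so the equality tests are not disturbed by floating-point rounding.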
The Cumulative Probability Distribution of a Binomial Random Variable
In order to allow a broader range of more realistic problems, Chapter 12 "Appendix" contains
probability tables for binomial random variables for various choices of the parameters n and p. These
tables are not the probability distributions that we have seen so far, but are cumulative probability
distributions. In the place of the probability P(x) the table contains the probability
P(X ≤ x) = P(0) + P(1) + ⋅⋅⋅ + P(x)
This is illustrated in Figure 4.6 "Cumulative Probabilities". The probability entered in the table
corresponds to the area of the shaded region. The reason for providing a cumulative table is that in
practical problems that involve a binomial random variable typically the probability that is sought is
of the form P(X ≤ x) or P(X ≥ x). The cumulative table is much easier to use for computing P(X ≤ x) since all
the individual probabilities have already been computed and added. The one table suffices for
both P(X ≤ x) and P(X ≥ x) and can be used to readily obtain probabilities of the form P(x), too, because of
the following formulas. The first is just the Probability Rule for Complements.
Figure 4.6 Cumulative Probabilities
If X is a discrete random variable whose possible values are whole numbers, as those of a binomial random variable are, then
P(X ≥ x) = 1 − P(X ≤ x − 1) and P(x) = P(X ≤ x) − P(X ≤ x − 1)
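These two identities can be sketched in code by treating a cumulative probability as a running sum of individual binomial probabilities. The function names and the parameters n = 10, p = 0.5 are illustrative assumptions, not values taken from the tables in the appendix.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x) = P(0) + P(1) + ... + P(x); zero when x is negative."""
    return sum(binomial_pmf(k, n, p) for k in range(0, x + 1))


def prob_at_least(x, n, p):
    """P(X >= x) = 1 - P(X <= x - 1)."""
    return 1 - binomial_cdf(x - 1, n, p)


def prob_exactly(x, n, p):
    """P(x) = P(X <= x) - P(X <= x - 1)."""
    return binomial_cdf(x, n, p) - binomial_cdf(x - 1, n, p)


n, p = 10, 0.5  # illustrative parameters
print(round(binomial_cdf(5, n, p), 4))
print(round(prob_at_least(6, n, p), 4))
print(round(prob_exactly(6, n, p), 4))
```

A printed cumulative table plays the role of binomial_cdf here: the other two quantities are obtained from it by subtraction alone.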
b. The student must guess correctly on at least 60% of the questions, which is 0.60⋅10 = 6 questions. The probability sought is not P(6) (an easy mistake to make), but
P(X ≥ 6) = P(6) + P(7) + P(8) + P(9) + P(10)
Instead of computing each of these five numbers using the formula and adding them we can use the table to obtain
P(X ≥ 6) = 1 − P(X ≤ 5) = 1 − 0.6230 = 0.3770
which is much less work and of sufficient accuracy for the situation at hand.
EXAMPLE 10
An appliance repairman services five washing machines on site each day. One-third of the service calls require installation of a particular part.
a. The repairman has only one such part on his truck today. Find the probability that the one part will be enough today, that is, that at most one washing machine he services will require installation of this particular part.
b. Find the minimum number of such parts he should take with him each day in order that the probability that he have enough for the day's service calls is at least 95%.
Solution:
Let X denote the number of service calls today on which the part is required. Then X is a binomial random variable with parameters n = 5 and p = 1/3 = 0.333….
a. Note that the probability in question is not P(1), but rather P(X ≤ 1). Using the cumulative distribution table in Chapter 12 "Appendix",
P(X ≤ 1) = 0.4609
b. The answer is the smallest number x such that the table entry P(X ≤ x) is at least 0.9500. Since P(X ≤ 2) = 0.7901 is less than 0.95, two parts are not enough. Since P(X ≤ 3) = 0.9547 is as large as 0.95, three parts will suffice at least 95% of the time. Thus the minimum needed is three.
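The table lookups in this example can be reproduced as a sketch by computing the cumulative probabilities directly. The function names binomial_cdf and min_parts are hypothetical, and exact fractions are used so that the comparison with 95% is not affected by rounding.

```python
from fractions import Fraction
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for the binomial random variable with parameters n and p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def min_parts(n, p, target):
    """Smallest x whose cumulative probability P(X <= x) is at least target."""
    for x in range(n + 1):
        if binomial_cdf(x, n, p) >= target:
            return x
    return n  # reached only if target exceeds 1


n, p = 5, Fraction(1, 3)

# Part (a): probability that one part is enough.
print(round(float(binomial_cdf(1, n, p)), 4))

# Part (b): fewest parts that suffice with probability at least 95%.
print(min_parts(n, p, Fraction(95, 100)))
```

The search in min_parts mirrors reading down a column of the cumulative table until the first entry of at least 0.9500 appears.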
KEY TAKEAWAYS
• The discrete random variable X that counts the number of successes in n identical, independent trials of a procedure that always results in either of two outcomes, “success” or “failure,” and in which the probability of success on each trial is the same number p, is called the binomial random variable with parameters n and p.
• There is a formula for the probability that the binomial random variable with parameters n and p will take a particular value x.
• There are special formulas for the mean, variance, and standard deviation of the binomial random variable
with parameters n and p that are much simpler than the general formulas that apply to all discrete
random variables.
• Cumulative probability distribution tables, when available, facilitate computation of probabilities
encountered in typical practical situations.
BASIC
1. Determine whether or not the random variable X is a binomial random variable. If so, give the values
of n and p. If not, explain why not.
a. X is the number of dots on the top face of a fair die that is rolled.
b. X is the number of hearts in a five-card hand drawn (without replacement) from a well-shuffled
ordinary deck.
c. X is the number of defective parts in a sample of ten randomly selected parts coming from a
manufacturing process in which 0.02% of all parts are defective.
d. X is the number of times the number of dots on the top face of a fair die is even in six rolls of the
die.
e. X is the number of dice that show an even number of dots on the top face when six dice are
rolled at once.
2. Determine whether or not the random variable X is a binomial random variable. If so, give the values
of n and p. If not, explain why not.
a. X is the number of black marbles in a sample of 5 marbles drawn randomly and without
replacement from a box that contains 25 white marbles and 15 black marbles.
b. X is the number of black marbles in a sample of 5 marbles drawn randomly and with replacement
from a box that contains 25 white marbles and 15 black marbles.
c. X is the number of voters in favor of a proposed law in a sample of 1,200 randomly selected voters
drawn from the entire electorate of a country in which 35% of the voters favor the law.
d. X is the number of fish of a particular species, among the next ten landed by a commercial
fishing boat, that are more than 13 inches in length, when 17% of all such fish exceed 13 inches
in length.
e. X is the number of coins that match at least one other coin when four coins are tossed at once.
3. X is a binomial random variable with parameters n = 12 and p = 0.82. Compute the probability indicated.
a. P(11)
b. P(9)
c. P(0)
d. P(13)
4. X is a binomial random variable with parameters n = 16 and p = 0.74. Compute the probability indicated.
a. P(14)
b. P(4)
c. P(0)
d. P(20)
5. X is a binomial random variable with parameters n = 5, p = 0.5. Use the tables in Chapter 12 "Appendix" to
compute the probability indicated.
a. P(X ≤ 3)
b. P(X ≥ 3)
c. P(3)
d. P(0)
e. P(5)
6. X is a binomial random variable with parameters n = 5, p = 0.3̄. Use the table in Chapter 12 "Appendix" to
compute the probability indicated.
a. P(X ≤ 2)
b. P(X ≥ 2)
c. P(2)
d. P(0)
e. P(5)
7. X is a binomial random variable with the parameters shown. Use the tables in Chapter 12 "Appendix" to
compute the probability indicated.
a. n = 10, p = 0.25, P(X ≤ 6)
b. n = 10, p = 0.75, P(X ≤ 6)
c. n = 15, p = 0.75, P(X ≤ 6)
d. n = 15, p = 0.75, P(12)
e. n = 15, p = 0.6̄, P(10 ≤ X ≤ 12)
8. X is a binomial random variable with the parameters shown. Use the tables in Chapter 12 "Appendix" to
compute the probability indicated.
a. n = 5, p = 0.05, P(X ≤ 1)
b. n = 5, p = 0.5, P(X ≤ 1)
c. n = 10, p = 0.75, P(X ≤ 5)
d. n = 10, p = 0.75, P(12)
e. n = 10, p = 0.6̄, P(5 ≤ X ≤ 8)
9. X is a binomial random variable with the parameters shown. Use the special formulas to compute its
mean μ and standard deviation σ.
a. n = 8, p = 0.43
b. n = 47, p = 0.82
c. n = 1200, p = 0.44
d. n = 2100, p = 0.62
10. X is a binomial random variable with the parameters shown. Use the special formulas to compute its
mean μ and standard deviation σ.
a. n = 14, p = 0.55
b. n = 83, p = 0.05
c. n = 957, p = 0.35
d. n = 1750, p = 0.79
16. A coin is bent so that the probability that it lands heads up is 2/3. The coin is tossed ten times.
a. Find the probability that it lands heads up at most five times.
b. Find the probability that it lands heads up more times than it lands tails up.
APPLICATIONS
17. An English-speaking tourist visits a country in which 30% of the population speaks English. He needs to ask
someone directions.
a. Find the probability that the first person he encounters will be able to speak English.
b. The tourist sees four local people standing at a bus stop. Find the probability that at least one of
them will be able to speak English.
18. The probability that an egg in a retail package is cracked or broken is 0.025.
a. Find the probability that a carton of one dozen eggs contains no eggs that are either cracked or
broken.
b. Find the probability that a carton of one dozen eggs has (i) at least one that is either cracked or
broken; (ii) at least two that are cracked or broken.
c. Find the average number of cracked or broken eggs in one dozen cartons.
19. An appliance store sells 20 refrigerators each week. Ten percent of all purchasers of a refrigerator buy an
extended warranty. Let X denote the number of the next 20 purchasers who do so.
a. Verify that X satisfies the conditions for a binomial random variable, and find n and p.
b. Find the probability that X is zero.
c. Find the probability that X is two, three, or four.
d. Find the probability that X is at least five.
20. Adverse growing conditions have caused 5% of grapefruit grown in a certain region to be of inferior quality.
Grapefruit are sold by the dozen.
a. Find the average number of inferior quality grapefruit per box of a dozen.
b. A box that contains two or more grapefruit of inferior quality will cause a strong adverse
customer reaction. Find the probability that a box of one dozen grapefruit will contain two or
more grapefruit of inferior quality.
21. The probability that a 7-ounce skein of a discount worsted weight knitting yarn contains a knot is 0.25.
Goneril buys ten skeins to crochet an afghan.
a. Find the probability that (i) none of the ten skeins will contain a knot; (ii) at most one will.
b. Find the expected number of skeins that contain knots.
c. Find the most likely number of skeins that contain knots.
22. One-third of all patients who undergo a non-invasive but unpleasant medical test require a sedative. A
laboratory performs 20 such tests daily. Let X denote the number of patients on any given day who require a
sedative.
a. Verify that X satisfies the conditions for a binomial random variable, and find n and p.
a. Find the probability that the proofreader will miss at least one of them.
b. Show that two such proofreaders working independently have a 99.96% chance of detecting an
error in a piece of written work.
c. Find the probability that two such proofreaders working independently will miss at least one
error in a work that contains four errors.
30. A multiple choice exam has 20 questions; there are four choices for each question.
a. A student guesses the answer to every question. Find the chance that he guesses correctly
between four and seven times.
b. Find the minimum score the instructor can set so that the probability that a student will pass just
by guessing is 20% or less.
31. In spite of the requirement that all dogs boarded in a kennel be inoculated, the chance that a healthy dog
boarded in a clean, well-ventilated kennel will develop kennel cough from a carrier is 0.008.
a. If a carrier (not known to be such, of course) is boarded with three other dogs, what is the
probability that at least one of the three healthy dogs will develop kennel cough?
b. If a carrier is boarded with four other dogs, what is the probability that at least one of the four
healthy dogs will develop kennel cough?
c. The pattern evident from parts (a) and (b) is that if K+1 dogs are boarded together, one a carrier
and K healthy dogs, then the probability that at least one of the healthy dogs will develop kennel
cough is P(X ≥ 1) = 1 − (0.992)^K, where X is the binomial random variable that counts the number of
healthy dogs that develop the condition. Experiment with different values of K in this formula to
find the maximum number K+1 of dogs that a kennel owner can board together so that if one of
the dogs has the condition, the chance that another dog will be infected is less than 0.05.
32. Investigators need to determine which of 600 adults have a medical condition that affects 2% of the adult
population. A blood sample is taken from each of the individuals.
a. Show that the expected number of diseased individuals in the group of 600 is 12 individuals.
b. Instead of testing all 600 blood samples to find the expected 12 diseased individuals,
investigators group the samples into 60 groups of 10 each, mix a little of the blood from each of
the 10 samples in each group, and test each of the 60 mixtures. Show that the probability that
any such mixture will contain the blood of at least one diseased person, hence test positive, is
about 0.18.
c. Based on the result in (b), show that the expected number of mixtures that test positive is about
11. (Supposing that indeed 11 of the 60 mixtures test positive, then we know that none of the
490 persons whose blood was in the remaining 49 samples that tested negative has the disease.
We have eliminated 490 persons from our search while performing only 60 tests.)
Chapter 5
Continuous Random Variables
As discussed in Section 4.1 "Random Variables" in Chapter 4 "Discrete Random Variables", a random
variable is called continuous if its set of possible values contains a whole interval of decimal numbers.
In this chapter we investigate such random variables.
5.1 Continuous Random Variables
LEARNING OBJECTIVES
1. To learn the concept of the probability distribution of a continuous random variable, and how it is used to
compute probabilities.
2. To learn basic facts about the family of normally distributed random variables.
The Probability Distribution of a Continuous Random Variable
For a discrete random variable X the probability that X assumes one of its possible values on a single trial
of the experiment makes good sense. This is not the case for a continuous random variable. For example,
suppose X denotes the length of time a commuter just arriving at a bus stop has to wait for the next bus. If
buses run every 30 minutes without fail, then the set of possible values of X is the interval denoted [0,30], the set of all decimal numbers between 0 and 30. But although the number 7.211916 is a possible value
of X , there is little or no meaning to the concept of the probability that the commuter will wait precisely
7.211916 minutes for the next bus. If anything the probability should be zero, since if we could
meaningfully measure the waiting time to the nearest millionth of a minute it is practically inconceivable
that we would ever get exactly 7.211916 minutes. More meaningful questions are those of the form: What
is the probability that the commuter's waiting time is less than 10 minutes, or is between 5 and 10
minutes? In other words, with continuous random variables one is concerned not with the event that the
variable assumes a single particular value, but with the event that the random variable assumes a value in
a particular interval.
Definition
The probability distribution of a continuous random variable X is an assignment of
probabilities to intervals of decimal numbers using a function f(x), called a density function, in the
following way: the probability that X assumes a value in the interval [a,b] is equal to the area of the
region that is bounded above by the graph of the equation y = f(x), bounded below by the x-axis, and
bounded on the left and right by the vertical lines through a and b, as illustrated in Figure 5.1
"Probability Given as Area of a Region under a Curve".
Figure 5.1 Probability Given as Area of a Region under a Curve
This definition can be understood as a natural outgrowth of the discussion in Section 2.1.3 "Relative
Frequency Histograms" in Chapter 2 "Descriptive Statistics". There we saw that if we have in view a
population (or a very large sample) and make measurements with greater and greater precision, then
as the bars in the relative frequency histogram become exceedingly fine their vertical sides merge
and disappear, and what is left is just the curve formed by their tops, as shown in Figure 2.5 "Sample
Size and Relative Frequency Histograms" in Chapter 2 "Descriptive Statistics". Moreover the total
area under the curve is 1, and the proportion of the population with measurements between two
numbers a and b is the area under the curve and between a and b, as shown in Figure 2.6 "A Very
Fine Relative Frequency Histogram" in Chapter 2 "Descriptive Statistics". If we think of X as a
measurement to infinite precision arising from the selection of any one member of the population at
random, then P(a < X < b) is simply the proportion of the population with measurements between
a and b, the curve in the relative frequency histogram is the density function for X, and we
arrive at the definition just above.
Every density function f ( x) must satisfy the following two conditions:
1. For all numbers x , f ( x )≥0, so that the graph of y= f ( x ) never drops below the x -axis.
2. The area of the region under the graph of y= f ( x ) and above the x -axis is 1.
Because the area of a line segment is 0, the definition of the probability distribution of a continuous
random variable implies that for any particular decimal number, say a, the probability
that X assumes the exact value a is 0. This property implies that whether or not the endpoints of an
interval are included makes no difference concerning the probability of the interval.
For any continuous random variable X :
P (a≤ X ≤b)= P (a< X ≤b)= P (a≤ X <b)= P (a< X <b)
EXAMPLE 1
A random variable X has the uniform distribution on the interval [0,1]: the density function
is f(x) = 1 if x is between 0 and 1 and f(x) = 0 for all other values of x, as shown in Figure 5.2 "Uniform
Distribution on [0,1]".
Figure 5.2 Uniform Distribution on [0,1]
a. Find P(X > 0.75), the probability that X assumes a value greater than 0.75.
b. Find P(X ≤ 0.2), the probability that X assumes a value less than or equal to 0.2.
c. Find P(0.4 < X < 0.7), the probability that X assumes a value between 0.4 and 0.7.
Solution:
a. P(X > 0.75) is the area of the rectangle of height 1 and base length 1 − 0.75 = 0.25, hence
is base × height = (0.25)⋅(1) = 0.25. See Figure 5.3 "Probabilities from the Uniform Distribution on [0,1]" (a).
b. P(X ≤ 0.2) is the area of the rectangle of height 1 and base length 0.2 − 0 = 0.2, hence
is base × height = (0.2)⋅(1) = 0.2. See Figure 5.3 "Probabilities from the Uniform Distribution on [0,1]" (b).
c. P(0.4 < X < 0.7) is the area of the rectangle of height 1 and base length 0.7 − 0.4 = 0.3, hence
is base × height = (0.3)⋅(1) = 0.3. See Figure 5.3 "Probabilities from the Uniform Distribution on [0,1]" (c).
Figure 5.3 Probabilities from the Uniform Distribution on [0,1]
EXAMPLE 2
A man arrives at a bus stop at a random time (that is, with no regard for the scheduled service) to
catch the next bus. Buses run every 30 minutes without fail, hence the next bus will come any time
during the next 30 minutes with evenly distributed probability (a uniform distribution). Find the
probability that a bus will come within the next 10 minutes.
Solution:
The graph of the density function is a horizontal line above the interval from 0 to 30 and is the x-axis
everywhere else. Since the total area under the curve must be 1, the height of the horizontal line is
1/30. See Figure 5.4 "Probability of Waiting At Most 10 Minutes for a Bus". The probability sought
is P(0 ≤ X ≤ 10). By definition, this probability is the area of the rectangular region bounded above by the
horizontal line f(x) = 1/30, bounded below by the x-axis, bounded on the left by the vertical line at 0
(the y-axis), and bounded on the right by the vertical line at 10. This is the shaded region in Figure 5.4
"Probability of Waiting At Most 10 Minutes for a Bus". Its area is the base of the rectangle times its
height, 10⋅(1/30) = 1/3. Thus P(0 ≤ X ≤ 10) = 1/3.
Figure 5.4 Probability of Waiting At Most 10 Minutes for a Bus
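The base-times-height reasoning in the two uniform examples can be written as a small function. This is an illustrative sketch only; the name uniform_prob and its argument order are assumptions, not notation from the text.

```python
def uniform_prob(a, b, lo, hi):
    """P(a <= X <= b) for X uniform on [lo, hi], computed as base times height."""
    height = 1 / (hi - lo)          # total area must be 1, so height = 1 / width
    left = max(a, lo)               # clip the interval to where the density is nonzero
    right = min(b, hi)
    base = max(0.0, right - left)   # an empty overlap contributes zero area
    return base * height

print(uniform_prob(0.75, 1.0, 0, 1))   # P(X > 0.75) on [0,1]
print(uniform_prob(0.0, 0.2, 0, 1))    # P(X <= 0.2) on [0,1]
print(uniform_prob(0.4, 0.7, 0, 1))    # P(0.4 < X < 0.7) on [0,1]
print(uniform_prob(0, 10, 0, 30))      # bus arrives within 10 minutes
```

Because single points carry zero probability, the same function serves for strict and non-strict inequalities alike.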
Figure 5.5 Bell Curves with σ = 0.25 and Different Values of μ
The value of σ determines whether the bell curve is tall and thin or short and squat, subject always
to the condition that the total area under the curve be equal to 1. This is shown in Figure 5.6 "Bell
Curves with μ = 6 and Different Values of σ", where we have arbitrarily chosen to center the curves at μ = 6.
Figure 5.6 Bell Curves with μ = 6 and Different Values of σ
Definition
The probability distribution corresponding to the density function for the bell curve with
parameters μ and σ is called the normal distribution with mean μ and standard deviation σ.
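To see how σ trades height for width while the area stays fixed at 1, the sketch below evaluates the standard formula for the normal density, f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)), which this passage does not display. The function names and the trapezoid-rule integration are illustrative assumptions, and apart from σ = 0.25 and μ = 6 the parameter values are chosen arbitrarily.

```python
from math import exp, pi, sqrt

def normal_density(x, mu, sigma):
    """Height of the bell curve with parameters mu and sigma at the point x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def area_under(mu, sigma, steps=20000):
    """Approximate total area by the trapezoid rule over mu +/- 10 sigma."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_density(lo, mu, sigma) + normal_density(hi, mu, sigma))
    total += sum(normal_density(lo + i * h, mu, sigma) for i in range(1, steps))
    return total * h

# Curves centered at mu = 6: a smaller sigma gives a taller, thinner curve,
# but the area under each curve is still 1.
for sigma in (0.25, 1.0, 2.0):
    peak = normal_density(6, 6, sigma)
    print(sigma, round(peak, 4), round(area_under(6, sigma), 6))
```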
Definition
A continuous random variable whose probabilities are described by the normal distribution with
mean μ and standard deviation σ is called a normally distributed random variable, or a
normal random variable for short, with mean μ and standard deviation σ.
Figure 5.7 "Density Function for a Normally Distributed Random Variable with Mean μ and Standard
Deviation σ" shows the
density function that determines the normal distribution with mean μ and standard deviation σ.
We repeat an important fact about this curve:
The density curve for the normal distribution is symmetric about the mean.
Figure 5.7 Density Function for a Normally Distributed Random Variable with Mean μ and Standard
Deviation σ
EXAMPLE 3
Heights of 25-year-old men in a certain region have mean 69.75 inches and standard deviation 2.59
inches. These heights are approximately normally distributed. Thus the height X of a randomly
selected 25-year-old man is a normal random variable with mean μ = 69.75 and standard
deviation σ = 2.59. Sketch a qualitatively accurate graph of the density function for X. Find the
probability that a randomly selected 25-year-old man is more than 69.75 inches tall.
Solution:
The distribution of heights looks like the bell curve in Figure 5.8 "Density Function for Heights of 25-
Year-Old Men". The important point is that it is centered at its mean, 69.75, and is symmetric about
the mean.
Figure 5.8 Density Function for Heights of 25-Year-Old Men
Since the total area under the curve is 1, by symmetry the area to the right of 69.75 is half the total,
or 0.5. But this area is precisely the probability P(X > 69.75), the probability that a randomly selected
25-year-old man is more than 69.75 inches tall.
We will learn how to compute other probabilities in the next two sections.
KEY TAKEAWAYS
• For a continuous random variable X the only probabilities that are computed are those of X taking a value
in a specified interval.
• The probability that X take a value in a particular interval is the same whether or not the endpoints of the
interval are included.
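The symmetry argument can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module; the offset d = 3.0 inches is a hypothetical value chosen only to illustrate that the two tails at equal distances from the mean have equal area.

```python
from statistics import NormalDist

# Heights of 25-year-old men, using the mean and standard deviation from the example.
heights = NormalDist(mu=69.75, sigma=2.59)

# Area to the right of the mean: half the total area.
right_of_mean = 1 - heights.cdf(69.75)

# Symmetry: tails at equal distances above and below the mean have equal area.
d = 3.0  # hypothetical offset in inches
upper_tail = 1 - heights.cdf(69.75 + d)
lower_tail = heights.cdf(69.75 - d)

print(right_of_mean)
print(abs(upper_tail - lower_tail) < 1e-12)
```

The same symmetry is what lets a probability such as P(X < μ − d) be read off from a known value of P(X > μ + d).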
• The probability P(a < X < b), that X take a value in the interval from a to b, is the area of the region between
the vertical lines through a and b, above the x-axis, and below the graph of a function f(x) called the
density function.
• A normally distributed random variable is one whose density function is a bell curve.
• Every bell curve is symmetric about its mean and lies everywhere above the x-axis, which it approaches
asymptotically (arbitrarily closely without touching).
EXERCISES
BASIC
1. A continuous random variable X has a uniform distribution on the interval [5,12]. Sketch the graph of its density
function.
2. A continuous random variable X has a uniform distribution on the interval [−3,3]. Sketch the graph of its density
function.
3. A continuous random variable X has a normal distribution with mean 100 and standard deviation 10. Sketch a
qualitatively accurate graph of its density function.
4. A continuous random variable X has a normal distribution with mean 73 and standard deviation 2.5. Sketch a
qualitatively accurate graph of its density function.
5. A continuous random variable X has a normal distribution with mean 73. The probability that X takes a value
greater than 80 is 0.212. Use this information and the symmetry of the density function to find the
probability that X takes a value less than 66. Sketch the density curve with relevant regions shaded to
illustrate the computation.
6. A continuous random variable X has a normal distribution with mean 169. The probability that X takes a
value greater than 180 is 0.17. Use this information and the symmetry of the density function to find the
probability that X takes a value less than 158. Sketch the density curve with relevant regions shaded to
illustrate the computation.
7. A continuous random variable X has a normal distribution with mean 50.5. The probability that X takes a
value less than 54 is 0.76. Use this information and the symmetry of the density function to find the
probability that X takes a value greater than 47. Sketch the density curve with relevant regions shaded to
illustrate the computation.
8. A continuous random variable X has a normal distribution with mean 12.25. The probability that X takes a
value less than 13 is 0.82. Use this information and the symmetry of the density function to find the
probability that X takes a value greater than 11.50. Sketch the density curve with relevant regions shaded
to illustrate the computation.
9. The figure provided shows the density curves of three normally distributed random variables X_A, X_B,
and X_C. Their standard deviations (in no particular order) are 15, 7, and 20. Use the figure to identify the
values of the means μ_A, μ_B, and μ_C and standard deviations σ_A, σ_B, and σ_C of the three random variables.
10. The figure provided shows the density curves of three normally distributed random variables X_A, X_B,
and X_C. Their standard deviations (in no particular order) are 20, 5, and 10. Use the figure to identify the
values of the means μ_A, μ_B, and μ_C and standard deviations σ_A, σ_B, and σ_C of the three random variables.
APPLICATIONS
11. Dogberry's alarm clock is battery operated. The battery could fail with equal probability at any time of the
day or night. Every day Dogberry sets his alarm for 6:30 a.m. and goes to bed at 10:00 p.m. Find the
probability that when the clock battery finally dies, it will do so at the most inconvenient time, between
10:00 p.m. and 6:30 a.m.
12. Buses running a bus line near Desdemona's house run every 15 minutes. Without paying attention to the
schedule she walks to the nearest stop to take the bus to town. Find the probability that she waits more than
10 minutes.
13. The amount X of orange juice in a randomly selected half-gallon container varies according to a normal
distribution with mean 64 ounces and standard deviation 0.25 ounce.
a. Sketch the graph of the density function for X.
b. What proportion of all containers contain less than a half gallon (64 ounces)? Explain.
c. What is the median amount of orange juice in such containers? Explain.
14. The weight X of grass seed in bags marked 50 lb varies according to a normal distribution with mean 50 lb
and standard deviation 1 ounce (0.0625 lb).
a. Sketch the graph of the density function for X.
b. What proportion of all bags weigh less than 50 pounds? Explain.
c. What is the median weight of such bags? Explain.
5.2 The Standard Normal Distribution
LEARNING OBJECTIVES
1. To learn what a standard normal random variable is.
2. To learn how to use Figure 12.2 "Cumulative Normal Probability" to compute probabilities related to a
standard normal random variable.
Definition
A standard normal random variable is a normally distributed random variable with mean μ =
0 and standard deviation σ = 1. It will always be denoted by the letter Z.
The density function for a standard normal random variable is shown in Figure 5.9 "Density Curve
for a Standard Normal Random Variable".
b. The minus sign in −0.25 makes no difference in the procedure; the table is used in exactly the same
way as in part (a): the probability sought is the number that is in the intersection of the row with heading −0.2
and the column with heading 0.05, the number 0.4013. Thus P(Z < −0.25) = 0.4013.
EXAMPLE 5
Find the probabilities indicated.
a. P(Z > 1.60).
b. P(Z > −1.02).
Solution:
a. Because the events Z > 1.60 and Z ≤ 1.60 are complements, the Probability Rule for
Complements implies that
P(Z > 1.60) = 1 − P(Z ≤ 1.60)
Since inclusion of the endpoint makes no difference for the continuous random
variable Z, P(Z ≤ 1.60) = P(Z < 1.60), which we know how to find from the table. The number in the row
with heading 1.6 and in the column with heading 0.00 is 0.9452. Thus P(Z < 1.60) = 0.9452 so
P(Z > 1.60) = 1 − P(Z ≤ 1.60) = 1 − 0.9452 = 0.0548
Figure 5.11 "Computing a Probability for a Right Half-Line" illustrates the ideas geometrically. Since
the total area under the curve is 1 and the area of the region to the left of 1.60 is (from the table)
0.9452, the area of the region to the right of 1.60 must be 1 − 0.9452 = 0.0548.
Figure 5.11 Computing a Probability for a Right Half-Line
b. The minus sign in −1.02 makes no difference in the procedure; the table is used in exactly the
same way as in part (a). The number in the intersection of the row with heading −1.0 and the
column with heading 0.02 is 0.1539. This means that P(Z < −1.02) = P(Z ≤ −1.02) = 0.1539, hence
P(Z > −1.02) = 1 − P(Z ≤ −1.02) = 1 − 0.1539 = 0.8461
Figure 5.12 Computing a Probability for an Interval of Finite Length
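The left-tail lookups and the complement rule used for right tails can be reproduced in software. This is a minimal sketch under the assumption that Python's statistics.NormalDist stands in for the cumulative normal table; the helper names left_tail and right_tail are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def left_tail(z):
    """P(Z < z): the quantity tabulated in a cumulative normal table."""
    return Z.cdf(z)

def right_tail(z):
    """P(Z > z) = 1 - P(Z <= z), by the Probability Rule for Complements."""
    return 1 - Z.cdf(z)

print(round(left_tail(-0.25), 4))   # P(Z < -0.25)
print(round(right_tail(1.60), 4))   # P(Z > 1.60)
print(round(right_tail(-1.02), 4))  # P(Z > -1.02)
```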
b. The procedure for finding the probability that Z takes a value in a finite interval whose endpoints
have opposite signs is exactly the same procedure used in part (a), and is illustrated in Figure 5.13
"Computing a Probability for an Interval of Finite Length". In symbols the computation is
P(−2.55 < Z < 0.09) = P(Z < 0.09) − P(Z < −2.55)
= 0.5359 − 0.0054 = 0.5305
Figure 5.13 Computing a Probability for an Interval of Finite Length
The next example shows what to do if the value of Z that we want to look up in the table is not
present there.
EXAMPLE 7
Find the probabilities indicated.
a. P(1.13 < Z < 4.16).
b. P(−5.22 < Z < 2.15).
Solution:
a. We attempt to compute the probability exactly as in Note 5.20 "Example 6" by looking up the
numbers 1.13 and 4.16 in the table. We obtain the value 0.8708 for the area of the region under the
density curve to the left of 1.13 without any problem, but when we go to look up the number 4.16 in the
table, it is not there. We can see from the last row of numbers in the table that the area to the left of 4.16
must be so close to 1 that to four decimal places it rounds to 1.0000. Therefore
P(1.13 < Z < 4.16) = 1.0000 − 0.8708 = 0.1292
b. Similarly, here we can read directly from the table that the area under the density curve and
to the left of 2.15 is 0.9842, but −5.22 is too far to the left on the number line to be in the table. We
can see from the first line of the table that the area to the left of −5.22 must be so close to 0 that to
four decimal places it rounds to 0.0000. Therefore
P(−5.22 < Z < 2.15) = 0.9842 − 0.0000 = 0.9842
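Interval probabilities follow the same subtraction pattern whether or not the endpoints appear in the table. The sketch below again assumes statistics.NormalDist as a stand-in for the table; interval_prob is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def interval_prob(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return Z.cdf(b) - Z.cdf(a)

print(round(interval_prob(-2.55, 0.09), 4))  # endpoints of opposite sign
print(round(interval_prob(1.13, 4.16), 4))   # right endpoint beyond the table
print(round(interval_prob(-5.22, 2.15), 4))  # left endpoint beyond the table

# Far-out endpoints: to four decimal places the cumulative areas are 1 and 0.
print(round(Z.cdf(4.16), 4), round(Z.cdf(-5.22), 4))
```

The last line confirms the rounding argument in the example: software has no table boundary, but the extreme cumulative values still round to 1.0000 and 0.0000.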
The final example of this section explains the origin of the proportions given in the Empirical Rule.
EXAMPLE 8
Find the probabilities indicated.
a. P(−1 < Z < 1).
b. P(−2 < Z < 2).
c. P(−3 < Z < 3).
Solution:
a. Using the table as was done in Note 5.20 "Example 6" (b) we obtain
P(−1 < Z < 1) = 0.8413 − 0.1587 = 0.6826
Since Z has mean 0 and standard deviation 1, for Z to take a value between −1 and 1 means
that Z takes a value that is within one standard deviation of the mean. Our computation shows
that the probability that this happens is about 0.68, the proportion given by the Empirical Rule for
histograms that are mound shaped and symmetrical, like the bell curve.
b. Using the table in the same way,
P(−2 < Z < 2) = 0.9772 − 0.0228 = 0.9544
This corresponds to the proportion 0.95 for data within two standard deviations of the mean.
c. Similarly,
P(−3 < Z < 3) = 0.9987 − 0.0013 = 0.9974
which corresponds to the proportion 0.997 for data within three standard deviations of the mean.
KEY TAKEAWAYS
• A standard normal random variable Z is a normally distributed random variable with mean μ = 0 and
standard deviation σ = 1.
• Probabilities for a standard normal random variable are computed using Figure 12.2 "Cumulative Normal
Probability".
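The three Empirical Rule proportions can be generated in one loop. This is an illustrative sketch using statistics.NormalDist, with the hypothetical helper name within; results computed directly may differ from the table-based values in the fourth decimal place, because the table entries are rounded before they are subtracted.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def within(k):
    """P(-k < Z < k): probability of falling within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```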
5.3 Probability Computations for General Normal Random Variables
LEARNING OBJECTIVE
1. To learn how to compute probabilities related to any normal random variable.
If X is any normally distributed random variable then Figure 12.2 "Cumulative Normal
Probability" can also be used to compute a probability of the form P(a < X < b) by means of the following
equality, in which µ and σ denote the mean and standard deviation of X and Z denotes a standard
normal random variable:
P(a < X < b) = P((a − µ)/σ < Z < (b − µ)/σ)
The new endpoints (a − µ)/σ and (b − µ)/σ are the z-scores of a and b as defined in Section 2.4.2 in Chapter
2 "Descriptive Statistics".
Figure 5.14 "Probability for an Interval of Finite Length" illustrates the meaning of the equality
geometrically: the two shaded regions, one under the density curve for X and the other under the
density curve for Z, have the same area. Instead of drawing both bell curves, though, we will always
draw a single generic bell-shaped curve with both an x-axis and a z-axis below it.
Figure 5.14 Probability for an Interval of Finite Length
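The equality of the two areas can be checked numerically. The sketch below is an illustration under stated assumptions: it uses Python's standard-library statistics.NormalDist, the function name interval_probability is hypothetical, and the values of the mean, the standard deviation, and the endpoints are made up for the demonstration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def interval_probability(a, b, mu, sigma):
    """Return P(a < X < b) for normal X by converting the endpoints to z-scores."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return Z.cdf(z_b) - Z.cdf(z_a)

# Illustrative (made-up) values: X has mean 50 and standard deviation 4.
mu, sigma = 50.0, 4.0
via_z = interval_probability(46.0, 58.0, mu, sigma)

# The same area computed under the density curve of X itself.
X = NormalDist(mu, sigma)
direct = X.cdf(58.0) - X.cdf(46.0)

print(f"via z-scores: {via_z:.4f}   directly from X: {direct:.4f}")
```

Both routes give the same number, which is the numerical counterpart of the two shaded regions having the same area.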
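EXAMPLE 9
Let X be a normal random variable with mean μ = 10 and standard deviation σ = 2.5. Compute the
following probabilities.
a. P(X < 14).
b. P(8 < X < 14).
Solution:
a. P(X < 14) = P(Z < (14 − 10)/2.5) = P(Z < 1.60) = 0.9452
b. P(8 < X < 14) = P((8 − 10)/2.5 < Z < (14 − 10)/2.5) = P(−0.80 < Z < 1.60) = 0.9452 − 0.2119 = 0.7333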
EXAMPLE 10
The lifetimes of the tread of a certain automobile tire are normally distributed with mean 37,500
miles and standard deviation 4,500 miles. Find the probability that the tread life of a randomly
selected tire will be between 30,000 and 40,000 miles.
Solution:
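Let X denote the tread life of a randomly selected tire. To make the numbers easier to work with we
will choose thousands of miles as the units. Thus μ = 37.5, σ = 4.5, and the problem is to
compute P(30 < X < 40). Figure 5.17 "Probability Computation for Tire Tread Wear" illustrates the
following computation:
P(30 < X < 40) = P((30 − 37.5)/4.5 < Z < (40 − 37.5)/4.5) = P(−1.67 < Z < 0.56) = 0.7123 − 0.0475 = 0.6648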
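Because the table lists z only to two decimal places, a table-based answer can differ slightly from one computed without rounding. The sketch below compares the two for the tire example. It is an illustration only: it assumes Python's standard-library statistics.NormalDist, and the helper table_area is a hypothetical stand-in for a lookup in Figure 12.2 "Cumulative Normal Probability".

```python
from statistics import NormalDist

mu, sigma = 37.5, 4.5   # tread life in thousands of miles
Z = NormalDist()        # standard normal

def table_area(z):
    """Mimic a table lookup: round z to two decimals and the area to four."""
    return round(Z.cdf(round(z, 2)), 4)

z_low = (30 - mu) / sigma    # z-score of 30
z_high = (40 - mu) / sigma   # z-score of 40

p_table = table_area(z_high) - table_area(z_low)   # table-style arithmetic
p_exact = Z.cdf(z_high) - Z.cdf(z_low)             # no intermediate rounding

print(f"table-style: {p_table:.4f}   unrounded: {p_exact:.4f}")
```

The table-style result is 0.6648, while the unrounded result is roughly 0.663. A discrepancy in the third decimal place of this size is typical of table-based work and is not an error in either method.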
EXAMPLE 11
Scores on a standardized college entrance examination (CEE) are normally distributed with mean 510
and standard deviation 60. A selective university considers for admission only applicants
with CEE scores over 650. Find the percentage of all individuals who took the CEE who meet the
university's CEE requirement for consideration for admission.
Solution:
Let X denote the score made on the CEE by a randomly selected individual. Then X is normally
distributed with mean 510 and standard deviation 60. The probability that X lies in a particular interval
is the same as the proportion of all exam scores that lie in that interval. Thus the solution to the
problem is P(X > 650), expressed as a percentage. Figure 5.18 "Probability Computation for Exam
Scores" illustrates the following computation:
P(X > 650) = P(Z > (650 − 510)/60) = P(Z > 2.33) = 1 − 0.9901 = 0.0099
Thus 0.99%, or about 1%, of all individuals who took the CEE meet the requirement.
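A right-tail probability such as P(X > 650) is one minus the corresponding left-tail area. The sketch below shows that step for the exam-score example. It assumes Python's standard-library statistics.NormalDist; the variable names are chosen only for illustration.

```python
from statistics import NormalDist

scores = NormalDist(mu=510, sigma=60)   # distribution of CEE scores
cutoff = 650

z = (cutoff - scores.mean) / scores.stdev   # z-score of the cutoff
p_over = 1.0 - scores.cdf(cutoff)           # right tail = 1 - left tail

print(f"z = {z:.2f}, P(X > {cutoff}) = {p_over:.4f} ({100 * p_over:.2f}%)")
```

Either way of computing it, about 1% of the individuals who took the exam score above 650.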
KEY TAKEAWAY
• Probabilities for a general normal random variable are computed using Figure 12.2 "Cumulative Normal
Probability" after converting x-values to z-scores.
EXERCISES
BASIC
1. X is a normally distributed random variable with mean 57 and standard deviation 6. Find the probability
indicated.
a. P(X < 59.5)
b. P(X < 46.2)
c. P(X > 52.2)
d. P(X > 70)
2. X is a normally distributed random variable with mean −25 and standard deviation 4. Find the probability
indicated.
b. P(X < 75), P(X > 125)
c. P(X < 84.55), P(X > 115.45)
d. P(X < 77.42), P(X > 122.58)
9. X is a normally distributed random variable with mean 67 and standard deviation 13. The probability
that X takes a value in the union of intervals (−∞, 67 − a] ∪ [67 + a, ∞) will be
denoted P(X ≤ 67 − a or X ≥ 67 + a). Use Figure 12.2 "Cumulative Normal Probability" to find the following
probabilities of this type. Sketch the density curve with relevant regions shaded to illustrate the
computation. Because of the symmetry of the density curve you need to use Figure 12.2 "Cumulative
Normal Probability" only one time for each part.
a. P(X < 57 or X > 77)
b. P(X < 47 or X > 87)
c. P(X < 49 or X > 85)
d. P(X < 37 or X > 97)
10. X is a normally distributed random variable with mean 288 and standard deviation 6. The probability
that X takes a value in the union of intervals (−∞, 288 − a] ∪ [288 + a, ∞) will be
denoted P(X ≤ 288 − a or X ≥ 288 + a). Use Figure 12.2 "Cumulative Normal Probability" to find the following
probabilities of this type. Sketch the density curve with relevant regions shaded to illustrate the
computation. Because of the symmetry of the density curve you need to use Figure 12.2 "Cumulative
Normal Probability" only one time for each part.
a. P(X < 278 or X > 298)
b. P(X < 268 or X > 308)
c. P(X < 273 or X > 303)
d. P(X < 280 or X > 296)
APPLICATIONS
11. The amount X of beverage in a can labeled 12 ounces is normally distributed with mean 12.1 ounces and
standard deviation 0.05 ounce. A can is selected at random.
a. Find the probability that the can contains at least 12 ounces.
b. Find the probability that the can contains between 11.9 and 12.1 ounces.
12. The length of gestation for swine is normally distributed with mean 114 days and standard deviation 0.75
day. Find the probability that a litter will be born within one day of the mean of 114.
13. The systolic blood pressure X of adults in a region is normally distributed with mean 112 mm Hg and standard
deviation 15 mm Hg. A person is considered “prehypertensive” if his systolic blood pressure is between 120
and 130 mm Hg. Find the probability that the blood pressure of a randomly selected person is
prehypertensive.
14. Heights X of adult women are normally distributed with mean 63.7 inches and standard deviation 2.71
inches. Romeo, who is 69.25 inches tall, wishes to date only women who are shorter than he but within 4
inches of his height. Find the probability that the next woman he meets will have such a height.
15. Heights X of adult men are normally distributed with mean 69.1 inches and standard deviation 2.92 inches.
Juliet, who is 63.25 inches tall, wishes to date only men who are taller than she but within 6 inches of her
height. Find the probability that the next man she meets will have such a height.
16. A regulation hockey puck must weigh between 5.5 and 6 ounces. The weights X of pucks made by a
particular process are normally distributed with mean 5.75 ounces and standard deviation 0.11 ounce.
Find the probability that a puck made by this process will meet the weight standard.
17. A regulation golf ball may not weigh more than 1.620 ounces. The weights X of golf balls made by a
particular process are normally distributed with mean 1.361 ounces and standard deviation 0.09 ounce.
Find the probability that a golf ball made by this process will meet the weight standard.
18. The length of time that the battery in Hippolyta's cell phone will hold enough charge to operate
acceptably is normally distributed with mean 25.6 hours and standard deviation 0.32 hour. Hippolyta
forgot to charge her phone yesterday, so that at the moment she first wishes to use it today it has been
26 hours 18 minutes since the phone was last fully charged. Find the probability that the phone will
operate properly.
19. The amount of non-mortgage debt per household for households in a particular income bracket in one
part of the country is normally distributed with mean $28,350 and standard deviation $3,425. Find the
probability that a randomly selected such household has between $20,000 and $30,000 in non-mortgage
debt.
20. Birth weights of full-term babies in a certain region are normally distributed with mean 7.125 lb and
standard deviation 1.290 lb. Find the probability that a randomly selected newborn will weigh less than
5.5 lb, the historic definition of prematurity.
21. The distance from the seat back to the front of the knees of seated adult males is normally distributed
with mean 23.8 inches and standard deviation 1.22 inches. The distance from the seat back to the back of
the next seat forward in all seats on aircraft flown by a budget airline is 26 inches. Find the proportion of
adult men flying with this airline whose knees will touch the back of the seat in front of them.
22. The distance from the seat to the top of the head of seated adult males is normally distributed with mean
36.5 inches and standard deviation 1.39 inches. The distance from the seat to the roof of a particular
make and model car is 40.5 inches. Find the proportion of adult men who when sitting in this car will have
at least one inch of headroom (distance from the top of the head to the roof).
ADDITIONAL EXERCISES
23. The useful life of a particular make and type of automotive tire is normally distributed with mean 57,500 miles
and standard deviation 950 miles.
a. Find the probability that such a tire will have a useful life of between 57,000 and 58,000 miles.
b. Hamlet buys four such tires. Assuming that their lifetimes are independent, find the probability
that all four will last between 57,000 and 58,000 miles. (If so, the best tire will have no more than
1,000 miles left on it when the first tire fails.) Hint: There is a binomial random variable here,
whose value of p comes from part (a).
24. A machine produces large fasteners whose length must be within 0.5 inch of 22 inches. The lengths are
normally distributed with mean 22.0 inches and standard deviation 0.17 inch.
a. Find the probability that a randomly selected fastener produced by the machine will have an
acceptable length.
b. The machine produces 20 fasteners per hour. The length of each one is inspected. Assuming
lengths of fasteners are independent, find the probability that all 20 will have acceptable length.
Hint: There is a binomial random variable here, whose value of p comes from part (a).
25. The lengths of time taken by students on an algebra proficiency exam (if not forced to stop before completing
it) are normally distributed with mean 28 minutes and standard deviation 1.5 minutes.
a. Find the proportion of students who will finish the exam if a 30-minute time limit is set.
b. Six students are taking the exam today. Find the probability that all six will finish the exam within
the 30-minute limit, assuming that times taken by students are independent. Hint: There is a
binomial random variable here, whose value of p comes from part (a).
26. Heights of adult men between 18 and 34 years of age are normally distributed with mean 69.1 inches and
standard deviation 2.92 inches. One requirement for enlistment in the military is that men must stand
between 60 and 80 inches tall.
a. Find the probability that a randomly selected man meets the height requirement for military
service.
b. Twenty-three men independently contact a recruiter this week. Find the probability that all of
them meet the height requirement. Hint: There is a binomial random variable here, whose value
of p comes from part (a).
5.4 Areas of Tails of Distributions
LEARNING OBJECTIVE
1. To learn how to find, for a normal random variable X and an area a, the value x* of X so that P(X < x*) = a or
that P(X > x*) = a, whichever is required.
Definition
The left tail of a density curve y = f(x) of a continuous random variable X cut off by a
value x* of X is the region under the curve that is to the left of x*, as shown by the shading
in Figure 5.19 "Right and Left Tails of a Distribution" (a). The right tail cut off by x* is defined
similarly, as indicated by the shading in Figure 5.19 "Right and Left Tails of a Distribution" (b).
Figure 5.19 Right and Left Tails of a Distribution
The probabilities tabulated in Figure 12.2 "Cumulative Normal Probability" are areas of left tails in
the standard normal distribution.
Tails of the Standard Normal Distribution
At times it is important to be able to solve the kind of problem illustrated by Figure 5.20. We have a
certain specific area in mind, in this case the area 0.0125 of the shaded region in the figure, and we want
to find the value z* of Z that produces it. This is exactly the reverse of the kind of problems encountered so
far. Instead of knowing a value z* of Z and finding a corresponding area, we know the area and want to
find z*. In the case at hand, in the terminology of the definition just above, we wish to find the
value z* that cuts off a left tail of area 0.0125 in the standard normal distribution.
The idea for solving such a problem is fairly simple, although sometimes its implementation can be a bit
complicated. In a nutshell, one reads the cumulative probability table for Z in reverse, looking up the
relevant area in the interior of the table and reading off the value of Z from the margins.
Figure 5.20 Z Value that Produces a Known Area
EXAMPLE 12
Find the value z* of Z as determined by Figure 5.20: the value z* that cuts off a left tail of area 0.0125
in the standard normal distribution. In symbols, find the number z* such that P(Z < z*) = 0.0125.
Solution:
The number that is known, 0.0125, is the area of a left tail, and as already mentioned the
probabilities tabulated in Figure 12.2 "Cumulative Normal Probability" are areas of left tails. Thus to
solve this problem we need only search in the interior of Figure 12.2 "Cumulative Normal
Probability" for the number 0.0125. It lies in the row with the heading −2.2 and in the column with
the heading 0.04. This means that P(Z < −2.24) = 0.0125, hence z* = −2.24.
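Reading the table in reverse corresponds to evaluating the inverse of the cumulative distribution function. The sketch below illustrates this for the left tail of area 0.0125. It assumes Python's standard-library statistics.NormalDist, whose inv_cdf method plays the role of the reverse lookup; it is a cross-check, not a replacement for the table method.

```python
from statistics import NormalDist

Z = NormalDist()              # standard normal

area = 0.0125                 # known area of the left tail
z_star = Z.inv_cdf(area)      # value of Z that cuts off that left tail

print(f"z* = {z_star:.2f}")
print(f"check: P(Z < z*) = {Z.cdf(z_star):.4f}")
```

Rounded to two decimal places, the computed value agrees with the z* read from the margins of the table.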
EXAMPLE 13
Find the value z* of Z as determined by Figure 5.21: the value z* that cuts off a right tail of area 0.0250
in the standard normal distribution. In symbols, find the number z* such that P(Z > z*) = 0.0250.
Figure 5.21 Z Value that Produces a Known Area
Solution:
The important distinction between this example and the previous one is that here it is the area of
a right tail that is known. In order to be able to use Figure 12.2 "Cumulative Normal Probability" we
must first find the area of the left tail cut off by the unknown number z*. Since the total area under
the density curve is 1, that area is 1 − 0.0250 = 0.9750. This is the number we look for in the interior of Figure
12.2 "Cumulative Normal Probability". It lies in the row with the heading 1.9 and in the column with
the heading 0.06. Therefore z* = 1.96.
Definition
The value of the standard normal random variable Z that cuts off a right tail of area c is denoted z_c. By
symmetry, the value of Z that cuts off a left tail of area c is −z_c. See Figure 5.22 "The Numbers z_c and −z_c".
Figure 5.22 The Numbers z_c and −z_c
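The definition and the symmetry property can be illustrated numerically. In the sketch below, the function name z_c is a hypothetical helper that mirrors the notation just introduced, and Python's standard-library statistics.NormalDist is assumed as the source of the inverse cumulative probability.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

def z_c(c):
    """Return the value of Z that cuts off a right tail of area c."""
    # A right tail of area c is the complement of a left tail of area 1 - c.
    return Z.inv_cdf(1.0 - c)

c = 0.0250
right = z_c(c)         # cuts off a right tail of area c
left = Z.inv_cdf(c)    # cuts off a left tail of area c

print(f"z_c = {right:.2f}, value cutting off the left tail = {left:.2f}")
```

The two values are negatives of one another, as the symmetry of the density curve requires.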
EXAMPLE 14
Find z_.01 and −z_.01, the values of Z that cut off right and left tails of area 0.01 in the standard normal
distribution.
Solution:
Since −z_.01 cuts off a left tail of area 0.01 and Figure 12.2 "Cumulative Normal Probability" is a table of
left tails, we look for the number 0.0100 in the interior of the table. It is not there, but falls between
the two numbers 0.0102 and 0.0099 in the row with heading −2.3. The number 0.0099 is closer to
0.0100 than 0.0102 is, so for the hundredths place in −z_.01 we use the heading of the column that
contains 0.0099, namely, 0.03, and write −z_.01 ≈ −2.33.
The answer to the second half of the problem is automatic: since −z_.01 = −2.33, we conclude immediately
that z_.01 = 2.33.
We could just as well have solved this problem by looking for z_.01 first, and it is instructive to rework
the problem this way. To begin with, we must first subtract 0.01 from 1 to find the
area 1 − 0.0100 = 0.9900 of the left tail cut off by the unknown number z_.01. See Figure 5.23 "Computation of
the Number z_.01". Then we search for the area 0.9900 in Figure 12.2 "Cumulative Normal Probability". It
is not there, but falls between the numbers 0.9898 and 0.9901 in the row with heading 2.3. Since
0.9901 is closer to 0.9900 than 0.9898 is, we use the column heading above it, 0.03, to obtain the
approximation z_.01 ≈ 2.33. Then finally −z_.01 ≈ −2.33.
Figure 5.23 Computation of the Number z_.01
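The "closest table entry" procedure used in this example can be sketched in code. The sketch below is an illustration under stated assumptions, not the construction of Figure 12.2 "Cumulative Normal Probability" itself: it rebuilds a four-decimal table from Python's standard-library statistics.NormalDist over the assumed range −3.99 to 3.99, and the names TABLE and reverse_lookup are hypothetical. When two entries are equally close it averages their z-values.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

# Rebuild a four-decimal cumulative table for z = -3.99, -3.98, ..., 3.99.
# Areas are stored as whole ten-thousandths so that comparisons are exact.
TABLE = {i / 100: round(Z.cdf(i / 100) * 10_000) for i in range(-399, 400)}

def reverse_lookup(left_area):
    """Return the z whose table entry is closest to left_area.

    If several entries are equally close, their z-values are averaged.
    """
    target = round(left_area * 10_000)
    best = min(abs(entry - target) for entry in TABLE.values())
    nearest = [z for z, entry in TABLE.items() if abs(entry - target) == best]
    return round(sum(nearest) / len(nearest), 3)

print(reverse_lookup(0.0100))   # left tail of area 0.01
print(reverse_lookup(0.9900))   # left tail of area 1 - 0.01
```

The two calls return −2.33 and 2.33, the values obtained by hand in this example. Areas extremely close to 0 or 1 match many table entries at once, so the sketch is meaningful only for areas that fall inside the body of the table.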
Tails of General Normal Distributions
The problem of finding the value x* of a general normally distributed random variable X that cuts off
a tail of a specified area also arises. This problem may be solved in two steps.
Suppose X is a normally distributed random variable with mean µ and standard deviation σ. To find the
value x* of X that cuts off a left or right tail of area c in the distribution of X:
1. find the value z* of Z that cuts off a left or right tail of area c in the standard normal distribution;
2. z* is the z-score of x*; compute x* using the destandardization formula
x* = µ + z*σ
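The two steps can be sketched as a single function. This is a minimal illustration, assuming Python's standard-library statistics.NormalDist for step 1; the function name tail_cutoff, its tail parameter, and the numbers in the demonstration are hypothetical and chosen only for the example.

```python
from statistics import NormalDist

def tail_cutoff(mu, sigma, area, tail="left"):
    """Return x* with P(X < x*) = area (left tail) or P(X > x*) = area (right tail)."""
    if tail not in ("left", "right"):
        raise ValueError("tail must be 'left' or 'right'")
    # A right tail of the given area is the complement of a left tail.
    left_area = area if tail == "left" else 1.0 - area
    z_star = NormalDist().inv_cdf(left_area)   # step 1: solve the problem for Z
    return mu + z_star * sigma                 # step 2: x* = mu + z* sigma

# Illustrative (made-up) values: mean 100, standard deviation 15, tail area 0.10.
print(f"{tail_cutoff(100, 15, 0.10, 'left'):.2f}")
print(f"{tail_cutoff(100, 15, 0.10, 'right'):.2f}")
```

Because the function does not round z* to two decimal places, its answers can differ in the second or third significant digit from answers obtained with the table.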
EXAMPLE 15
Find x* such that P(X < x*) = 0.9332, where X is a normal random variable with mean μ = 10 and standard
deviation σ = 2.5.
Solution:
All the ideas for the solution are illustrated in Figure 5.24 "Tail of a Normally Distributed Random
Variable". Since 0.9332 is the area of a left tail, we can find z* simply by looking for 0.9332 in the
interior of Figure 12.2 "Cumulative Normal Probability". It is in the row and column with headings 1.5
and 0.00, hence z* = 1.50. Thus x* is 1.50 standard deviations above the mean, so
x* = µ + z*σ = 10 + 1.50⋅2.5 = 13.75.
Figure 5.24 Tail of a Normally Distributed Random Variable
EXAMPLE 16
Find x* such that P(X > x*) = 0.65, where X is a normal random variable with mean μ = 175 and standard
deviation σ = 12.
Solution:
The situation is illustrated in Figure 5.25 "Tail of a Normally Distributed Random Variable". Since 0.65
is the area of a right tail, we first subtract it from 1 to obtain 1 − 0.65 = 0.35, the area of the
complementary left tail. We find z* by looking for 0.3500 in the interior of Figure 12.2 "Cumulative
Normal Probability". It is not present, but lies between table entries 0.3520 and 0.3483. The entry
0.3483 with row and column headings −0.3 and 0.09 is closer to 0.3500 than the other entry is,
so z* ≈ −0.39. Thus x* is 0.39 standard deviations below the mean, so
x* = µ + z*σ = 175 + (−0.39)⋅12 = 170.32
Figure 5.25 Tail of a Normally Distributed Random Variable
EXAMPLE 17
Scores on a standardized college entrance examination (CEE) are normally distributed with mean 510
and standard deviation 60. A selective university decides to give serious consideration for admission
to applicants whose CEE scores are in the top 5% of all CEE scores. Find the minimum score that
meets this criterion for serious consideration for admission.
Solution:
Let X denote the score made on the CEE by a randomly selected individual. Then X is normally
distributed with mean 510 and standard deviation 60. The probability that X lies in a particular interval
is the same as the proportion of all exam scores that lie in that interval. Thus the minimum score that
is in the top 5% of all CEE scores is the score x* that cuts off a right tail in the distribution of X of area 0.05
(5% expressed as a proportion). See Figure 5.26 "Tail of a Normally Distributed Random Variable".
Figure 5.26 Tail of a Normally Distributed Random Variable
Since 0.0500 is the area of a right tail, we first subtract it from 1 to obtain 1 − 0.0500 = 0.9500, the area of
the complementary left tail. We find z* = z_.05 by looking for 0.9500 in the interior of Figure 12.2
"Cumulative Normal Probability". It is not present, and lies exactly half-way between the two nearest
entries that are, 0.9495 and 0.9505. In the case of a tie like this, we will always average the values
of Z corresponding to the two table entries, obtaining here the value z* = 1.645. Using this value, we
conclude that x* is 1.645 standard deviations above the mean, so
x* = µ + z*σ = 510 + 1.645⋅60 = 608.7
EXAMPLE 18
All boys at a military school must run a fixed course as fast as they can as part of a physical
examination. Finishing times are normally distributed with mean 29 minutes and standard deviation 2
minutes. The middle 75% of all finishing times are classified as “average.” Find the range of times that
are average finishing times by this definition.
Solution:
Let X denote the finish time of a randomly selected boy. Then X is normally distributed with mean 29
and standard deviation 2. The probability that X lies in a particular interval is the same as the
proportion of all finish times that lie in that interval. Thus the situation is as shown in Figure 5.27
"Distribution of Times to Run a Course". Because the area in the middle corresponding to “average”
times is 0.75, the areas of the two tails add up to 1 − 0.75 = 0.25 in all. By the symmetry of the density
curve each tail must have half of this total, or area 0.125 each. Thus the fastest time that is “average”
has z-score −z_.125, which by Figure 12.2 "Cumulative Normal Probability" is −1.15, and the slowest time
that is “average” has z-score z_.125 = 1.15. The fastest and slowest times that are still considered average
are
x_fast = µ + (−z_.125)σ = 29 + (−1.15)⋅2 = 26.7
and
x_slow = µ + z_.125σ = 29 + (1.15)⋅2 = 31.3
Figure 5.27 Distribution of Times to Run a Course
A boy has an average finishing time if he runs the course with a time between 26.7 and 31.3 minutes,
or equivalently between 26 minutes 42 seconds and 31 minutes 18 seconds.
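The reasoning of this example, splitting the leftover area equally between the two tails and then destandardizing, can be sketched as follows. The function name middle_interval is hypothetical, and Python's standard-library statistics.NormalDist is assumed in place of the table lookup.

```python
from statistics import NormalDist

def middle_interval(mu, sigma, middle):
    """Return (low, high), symmetric about mu, containing the given middle proportion."""
    tail = (1.0 - middle) / 2.0              # area of each of the two tails
    z = NormalDist().inv_cdf(1.0 - tail)     # z_c for c equal to the tail area
    return mu - z * sigma, mu + z * sigma

# Finishing times: mean 29 minutes, standard deviation 2 minutes, middle 75%.
fast, slow = middle_interval(mu=29, sigma=2, middle=0.75)
print(f"average times run from {fast:.1f} to {slow:.1f} minutes")
```

To one decimal place the endpoints agree with the table-based values 26.7 and 31.3.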
KEY TAKEAWAYS
• The problem of finding the number z* so that the probability P(Z < z*) is a specified value c is solved by
looking for the number c in the interior of Figure 12.2 "Cumulative Normal Probability" and reading z* from
the margins.
• The problem of finding the number z* so that the probability P(Z > z*) is a specified value c is solved by
looking for the complementary probability 1 − c in the interior of Figure 12.2 "Cumulative Normal
Probability" and reading z* from the margins.
• For a normal random variable X with mean μ and standard deviation σ, the problem of finding the
number x* so that P(X < x*) is a specified value c (or so that P(X > x*) is a specified value c) is solved in two
steps: (1) solve the corresponding problem for Z with the same value of c, thereby obtaining the
z-score, z*, of x*; (2) find x* using x* = µ + z*⋅σ.
• The value of Z that cuts off a right tail of area c in the standard normal distribution is denoted z_c.
9. X is a normally distributed random variable with mean 15 and standard deviation 0.25. Find the
values x_L and x_R of X that are symmetrically located with respect to the mean of X and satisfy P(x_L < X < x_R)
= 0.80. (Hint. First solve the corresponding problem for Z.)
10. X is a normally distributed random variable with mean 28 and standard deviation 3.7. Find the
values x_L and x_R of X that are symmetrically located with respect to the mean of X and satisfy P(x_L < X < x_R) =
0.65. (Hint. First solve the corresponding problem for Z.)
APPLICATIONS
11. Scores on a national exam are normally distributed with mean 382 and standard deviation 26.
a. Find the score that is the 50th percentile.
b. Find the score that is the 90th percentile.
12. Heights of women are normally distributed with mean 63.7 inches and standard deviation 2.47 inches.
a. Find the height that is the 10th percentile.
b. Find the height that is the 80th percentile.
13. The monthly amount of water used per household in a small community is normally distributed with mean
7,069 gallons and standard deviation 58 gallons. Find the three quartiles for the amount of water used.
14. The quantity of gasoline purchased in a single sale at a chain of filling stations in a certain region is normally
distributed with mean 11.6 gallons and standard deviation 2.78 gallons. Find the three quartiles for the
quantity of gasoline purchased in a single sale.
15. Scores on the common final exam given in a large enrollment multiple section course were normally
distributed with mean 69.35 and standard deviation 12.93. The department has the rule that in order to
receive an A in the course a student's score must be in the top 10% of all exam scores. Find the minimum exam score
that meets this requirement.
16. The average finishing time among all high school boys in a particular track event in a certain state is 5 minutes
17 seconds. Times are normally distributed with standard deviation 12 seconds.
a. The qualifying time in this event for participation in the state meet is to be set so that only the
fastest 5% of all runners qualify. Find the qualifying time. (Hint: Convert seconds to minutes.)
b. In the western region of the state the times of all boys running in this event are normally
distributed with standard deviation 12 seconds, but with mean 5 minutes 22 seconds. Find the
proportion of boys from this region who qualify to run in this event in the state meet.
17. Tests of a new tire developed by a tire manufacturer led to an estimated mean tread life of 67,350 miles
and standard deviation of 1,120 miles. The manufacturer will advertise the lifetime of the tire (for
example, a “50,000 mile tire”) using the largest value for which it is expected that 98% of the tires will last
at least that long. Assuming tire life is normally distributed, find that advertised value.
18. Tests of a new light bulb led to an estimated mean life of 1,321 hours and standard deviation of 106 hours. The
manufacturer will advertise the lifetime of the bulb using the largest value for which it is expected that
90% of the bulbs will last at least that long. Assuming bulb life is normally distributed, find that advertised
value.
19. The weights X of eggs produced at a particular farm are normally distributed with mean 1.72 ounces and
standard deviation 0.12 ounce. Eggs whose weights lie in the middle 75% of the distribution of weights of
all eggs are classified as “medium.” Find the maximum and minimum weights of such eggs. (These weights
are endpoints of an interval that is symmetric about the mean and in which the weights of 75% of the
eggs produced at this farm lie.)
20. The lengths X of hardwood flooring strips are normally distributed with mean 28.9 inches and standard
deviation 6.12 inches. Strips whose lengths lie in the middle 80% of the distribution of lengths of all strips
are classified as “average-length strips.” Find the maximum and minimum lengths of such strips. (These
lengths are endpoints of an interval that is symmetric about the mean and in which the lengths of 80% of
the hardwood strips lie.)
21. All students in a large enrollment multiple section course take common in-class exams and a common
final, and submit common homework assignments. Course grades are assigned based on students' final
overall scores, which are approximately normally distributed. The department assigns a C to students
whose scores constitute the middle 2/3 of all scores. If scores this semester had mean 72.5 and standard
deviation 6.14, find the interval of scores that will be assigned a C.
22. Researchers wish to investigate the overall health of individuals with abnormally high or low levels of
glucose in the blood stream. Suppose glucose levels are normally distributed with mean 96 and standard
deviation 8.5 mg/dℓ, and that “normal” is defined as the middle 90% of the population. Find the interval
of normal glucose levels, that is, the interval centered at 96 that contains 90% of all glucose levels in the
population.
ADDITIONAL EXERCISES
23. A machine for filling 2-liter bottles of soft drink delivers an amount to each bottle that varies from bottle to
bottle according to a normal distribution with standard deviation 0.002 liter and mean whatever amount the
machine is set to deliver.
a. If the machine is set to deliver 2 liters (so the mean amount delivered is 2 liters) what proportion
of the bottles will contain at least 2 liters of soft drink?
b. Find the minimum setting of the mean amount delivered by the machine so that at least 99% of
all bottles will contain at least 2 liters.
24. A nursery has observed that the mean number of days it must darken the environment of a species of poinsettia
plant daily in order to have it ready for market is 71 days. Suppose the lengths of such periods of darkening
are normally distributed with standard deviation 2 days. Find the number of days in advance of the projected
delivery dates of the plants to market that the nursery must begin the daily darkening process in order that at
least 95% of the plants will be ready on time. (Poinsettias are so long-lived that once ready for market the
plant remains salable indefinitely.)
Chapter 6
Sampling Distributions
A statistic, such as the sample mean or the sample standard deviation, is a number computed from a
sample. Since a sample is random, every statistic is a random variable: it varies from sample to
sample in a way that cannot be predicted with certainty. As a random variable it has a mean, a
standard deviation, and a probability distribution. The probability distribution of a statistic is called
its sampling distribution. Typically sample statistics are not ends in themselves, but are computed in
order to estimate the corresponding population parameters, as illustrated in the grand picture of
statistics presented in Figure 1.1 "The Grand Picture of Statistics" in Chapter 1 "Introduction".
This chapter introduces the concepts of the mean, the standard deviation, and the sampling
distribution of a sample statistic, with an emphasis on the sample mean \(\bar{x}\).
6.1 The Mean and Standard Deviation of the Sample Mean
LEARNING OBJECTIVES
1. To become familiar with the concept of the probability distribution of the sample mean.
2. To understand the meaning of the formulas for the mean and standard deviation of the sample mean.
Suppose we wish to estimate the mean of a population. In actual practice we would typically take
just one sample. Imagine however that we take sample after sample, all of the same size n, and
compute the sample mean \(\bar{x}\) of each one. We will likely get a different value of \(\bar{x}\) each time. The
sample mean \(\bar{x}\) is a random variable: it varies from sample to sample in a way that cannot be
predicted with certainty. We will write \(\bar{X}\) when the sample mean is thought of as a random variable,
and write \(\bar{x}\) for the values that it takes. The random variable \(\bar{X}\) has a mean, denoted \(\mu_{\bar{X}}\), and
a standard deviation, denoted \(\sigma_{\bar{X}}\). Here is an example with such a small population and small
sample size that we can actually write down every single sample.
EXAMPLE 1
A rowing team consists of four rowers who weigh 152, 156, 160, and 164 pounds. Find all possible random samples with replacement of size two and compute the sample mean for each one. Use
them to find the probability distribution, the mean, and the standard deviation of the sample
mean \(\bar{X}\).
Solution
The following table shows all possible samples with replacement of size two, along with the mean of
each:
Sample Mean Sample Mean Sample Mean Sample Mean
152,152 152 156,152 154 160,152 156 164,152 158
152,156 154 156,156 156 160,156 158 164,156 160
152,160 156 156,160 158 160,160 160 164,160 162
152,164 158 156,164 160 160,164 162 164,164 164
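The remaining steps of the example (the probability distribution, mean, and standard deviation of the sample mean) can be checked by brute force. The following is a minimal sketch, not the book's own worked solution: it enumerates the 16 equally likely samples in the table above and tallies their means. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

weights = [152, 156, 160, 164]  # the four rowers from Example 1
n = 2                           # sample size

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(weights, repeat=n))

# Every sample is equally likely, so P(mean = v) is the share of samples with mean v.
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}

# Mean and variance of the sample mean, computed directly from its distribution.
mean_xbar = sum(v * p for v, p in dist.items())
var_xbar = sum((v - mean_xbar) ** 2 * p for v, p in dist.items())
sd_xbar = sqrt(var_xbar)

for v, p in dist.items():
    print(f"{float(v):5.0f}  {p}")
print("mean of the sample mean:", mean_xbar)
print("standard deviation of the sample mean:", round(sd_xbar, 4))
```

Exact fractions are used so that the probabilities can be compared with a hand-built table without rounding error.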
KEY TAKEAWAYS
• The sample mean is a random variable; as such it is written \(\bar{X}\), and \(\bar{x}\) stands for individual values it takes.
• As a random variable the sample mean has a probability distribution, a mean \(\mu_{\bar{X}}\), and a standard
deviation \(\sigma_{\bar{X}}\).
• There are formulas that relate the mean and standard deviation of the sample mean to the mean and
standard deviation of the population from which the sample is drawn.
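The formulas referred to in the last takeaway are stated in Section 6.2 as \(\mu_{\bar{X}} = \mu\) and \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\). As a rough sketch, they can be wrapped in a small helper and checked against the rowing team of Example 1, whose weights have mean 158 and population standard deviation \(\sqrt{20}\). The function name is hypothetical, and the sketch assumes sampling with replacement (or a population large relative to the sample).

```python
from math import isclose, sqrt

def sample_mean_parameters(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n."""
    return mu, sigma / sqrt(n)

# Rowers from Example 1: population mean 158, population standard deviation sqrt(20),
# samples of size 2.
mu_xbar, sigma_xbar = sample_mean_parameters(158, sqrt(20), 2)
print(mu_xbar, round(sigma_xbar, 4))
```

The standard deviation shrinks by a factor of \(\sqrt{n}\), so quadrupling the sample size only halves the spread of the sample mean.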
EXERCISES
1. Random samples of size 225 are drawn from a population with mean 100 and standard deviation 20. Find the
mean and standard deviation of the sample mean.
2. Random samples of size 64 are drawn from a population with mean 32 and standard deviation 5. Find the
mean and standard deviation of the sample mean.
3. A population has mean 75 and standard deviation 12.
a. Random samples of size 121 are taken. Find the mean and standard deviation of the sample
mean.
b. How would the answers to part (a) change if the size of the samples were 400 instead of 121?
4. A population has mean 5.75 and standard deviation 1.02.
a. Random samples of size 81 are taken. Find the mean and standard deviation of the sample mean.
b. How would the answers to part (a) change if the size of the samples were 25 instead of 81?
6.2 The Sampling Distribution of the Sample Mean
LEARNING OBJECTIVES
1. To learn what the sampling distribution of \(\bar{X}\) is when the sample size is large.
2. To learn what the sampling distribution of \(\bar{X}\) is when the population is normal.
The Central Limit Theorem
In Note 6.5 "Example 1" in Section 6.1 "The Mean and Standard Deviation of the Sample Mean" we
constructed the probability distribution of the sample mean for samples of size two drawn from the
population of four rowers. The probability distribution is:
Histograms illustrating these distributions are shown in Figure 6.2 "Distributions of the Sample
Mean".
Figure 6.2 Distributions of the Sample Mean
As n increases the sampling distribution of \(\bar{X}\) evolves in an interesting way: the probabilities on the
lower and the upper ends shrink and the probabilities in the middle become larger in relation to
them. If we were to continue to increase n then the shape of the sampling distribution would become
smoother and more bell-shaped.
What we are seeing in these examples does not depend on the particular population distributions
involved. In general, one may start with any distribution and the sampling distribution of the sample
mean will increasingly resemble the bell-shaped normal curve as the sample size increases. This is
the content of the Central Limit Theorem.
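The behavior described above can be observed exactly for the rowing team of Example 1. The sketch below is an illustration rather than part of the original text: it builds the exact distribution of the sample mean for several sample sizes by repeatedly adding one more equally likely draw, then reports the probability of the smallest possible mean and the probability of landing within 2 pounds of the population mean 158. The helper name and the choice of sample sizes are assumptions made for the illustration.

```python
from fractions import Fraction

def sample_mean_distribution(values, n):
    """Exact distribution of the mean of n draws with replacement from equally likely values."""
    p = Fraction(1, len(values))
    totals = {0: Fraction(1)}  # distribution of the running sum of the draws
    for _ in range(n):
        step = {}
        for total, prob in totals.items():
            for v in values:
                step[total + v] = step.get(total + v, Fraction(0)) + prob * p
        totals = step
    return {Fraction(total, n): prob for total, prob in sorted(totals.items())}

rowers = [152, 156, 160, 164]
results = {}
for n in (2, 5, 10):
    dist = sample_mean_distribution(rowers, n)
    lowest = dist[min(dist)]  # probability of the smallest possible mean
    within = sum(prob for v, prob in dist.items() if abs(v - 158) <= 2)
    results[n] = (lowest, within)
    print(n, float(lowest), float(within))
```

As n grows, the first printed probability falls toward zero while the second rises, which is the concentration around the middle that the histograms illustrate.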
The importance of the Central Limit Theorem is that it allows us to make probability statements
about the sample mean, specifically in relation to its value in comparison to the population mean, as
we will see in the examples. But to use the result properly we must first realize that there are two
separate random variables (and therefore two probability distributions) at play:
1. X, the measurement of a single element selected at random from the population; the distribution of X is
the distribution of the population, with mean the population mean μ and standard deviation the
population standard deviation σ;
2. \(\bar{X}\), the mean of the measurements in a sample of size n; the distribution of \(\bar{X}\) is its sampling
distribution, with mean \(\mu_{\bar{X}} = \mu\) and standard deviation \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\).
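To make the distinction concrete, here is a minimal sketch of a probability statement about \(\bar{X}\) under the Central Limit Theorem. The population figures (mean 50, standard deviation 6) and the sample size 36 are hypothetical values chosen for the illustration; they do not come from the text. The population shape is left unspecified, since the sample size of 36 is large enough for the normal approximation to be used.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50, 6   # hypothetical population mean and standard deviation
n = 36              # sample size, large enough for the Central Limit Theorem

# Sampling distribution of the sample mean: approximately normal with
# mean mu and standard deviation sigma / sqrt(n).
sd_xbar = sigma / sqrt(n)
xbar = NormalDist(mu, sd_xbar)

# Probability that the sample mean falls within 2 units of the population mean.
prob = xbar.cdf(mu + 2) - xbar.cdf(mu - 2)
print(sd_xbar, round(prob, 4))
```

Note that the standard deviation passed to the normal distribution is \(\sigma_{\bar{X}}\), not σ; using σ here would answer a question about a single measurement X instead.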
Normally Distributed Populations
The Central Limit Theorem says that no matter what the distribution of the population is, as long as
the sample is “large,” meaning of size 30 or more, the sample mean is approximately normally
distributed. If the population is normal to begin with then the sample mean also has a normal
distribution, regardless of the sample size.
For samples of any size drawn from a normally distributed population, the sample mean is normally
distributed, with mean \(\mu_{\bar{X}} = \mu\) and standard deviation \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\), where n is the sample size.
The effect of increasing the sample size is shown in Figure 6.4 "Distribution of Sample Means for a
Normal Population".
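A short numerical sketch shows the same effect. It assumes a hypothetical normal population with mean 100 and standard deviation 15 (values chosen only for illustration) and reports, for several sample sizes, the standard deviation of \(\bar{X}\) and the probability that \(\bar{X}\) lies within 5 units of μ. Because the population is normal, the calculation is valid even for n = 1 and n = 4.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical normally distributed population

rows = []
for n in (1, 4, 16, 64):
    sd_xbar = sigma / sqrt(n)            # standard deviation of the sample mean
    xbar = NormalDist(mu, sd_xbar)       # exact sampling distribution for a normal population
    within_5 = xbar.cdf(mu + 5) - xbar.cdf(mu - 5)
    rows.append((n, sd_xbar, within_5))
    print(f"n = {n:2d}   sd of sample mean = {sd_xbar:6.3f}   P(within 5 of mu) = {within_5:.4f}")
```

The distribution stays centered at μ for every n; only its spread changes.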
Figure 6.4 Distribution of Sample Means for a Normal Population
KEY TAKEAWAYS
• When the sample size is at least 30 the sample mean is approximately normally distributed.
• When the population is normal the sample mean is normally distributed regardless of the sample size.
EXERCISES
BASIC
1. A population has mean 128 and standard deviation 22.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 36.
b. Find the probability that the mean of a sample of size 36 will be within 10 units of the population
mean, that is, between 118 and 138.
2. A population has mean 1,542 and standard deviation 246.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 100.
b. Find the probability that the mean of a sample of size 100 will be within 100 units of the
population mean, that is, between 1,442 and 1,642.
3. A population has mean 73.5 and standard deviation 2.5.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 30.
b. Find the probability that the mean of a sample of size 30 will be less than 72.
4. A population has mean 48.4 and standard deviation 6.3.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 64.
b. Find the probability that the mean of a sample of size 64 will be less than 46.7.
5. A normally distributed population has mean 25.6 and standard deviation 3.3.
a. Find the probability that a single randomly selected element X of the population exceeds 30.
b. Find the mean and standard deviation of \(\bar{X}\) for samples of size 9.
c. Find the probability that the mean of a sample of size 9 drawn from this population exceeds 30.
6. A normally distributed population has mean 57.7 and standard deviation 12.1.
a. Find the probability that a single randomly selected element X of the population is less than 45.
b. Find the mean and standard deviation of \(\bar{X}\) for samples of size 16.
c. Find the probability that the mean of a sample of size 16 drawn from this population is less than
45.
7. A population has mean 557 and standard deviation 35.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 50.
b. Find the probability that the mean of a sample of size 50 will be more than 570.
8. A population has mean 16 and standard deviation 1.7.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 80.
b. Find the probability that the mean of a sample of size 80 will be more than 16.4.
9. A normally distributed population has mean 1,214 and standard deviation 122.
a. Find the probability that a single randomly selected element X of the population is between
1,100 and 1,300.
b. Find the mean and standard deviation of \(\bar{X}\) for samples of size 25.
c. Find the probability that the mean of a sample of size 25 drawn from this population is between
1,100 and 1,300.
10. A normally distributed population has mean 57,800 and standard deviation 750.
a. Find the probability that a single randomly selected element X of the population is between
57,000 and 58,000.
b. Find the mean and standard deviation of \(\bar{X}\) for samples of size 100.
c. Find the probability that the mean of a sample of size 100 drawn from this population is between
57,000 and 58,000.
11. A population has mean 72 and standard deviation 6.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 45.
b. Find the probability that the mean of a sample of size 45 will differ from the population mean 72
by at least 2 units, that is, is either less than 70 or more than 74. (Hint: One way to solve the
problem is to first find the probability of the complementary event.)
12. A population has mean 12 and standard deviation 1.5.
a. Find the mean and standard deviation of \(\bar{X}\) for samples of size 90.
b. Find the probability that the mean of a sample of size 90 will differ from the population mean 12
by at least 0.3 unit, that is, is either less than 11.7 or more than 12.3. (Hint: One way to solve the
problem is to first find the probability of the complementary event.)
APPLICATIONS
13. Suppose the mean number of days to germination of a variety of seed is 22, with standard deviation 2.3 days.
Find the probability that the mean germination time of a sample of 160 seeds will be within 0.5 day of the
population mean.
14. Suppose the mean length of time that a caller is placed on hold when telephoning a customer service center
is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of time on
hold in a sample of 1,200 calls will be within 0.5 second of the population mean.
15. Suppose the mean amount of cholesterol in eggs labeled “large” is 186 milligrams, with standard deviation 7
milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs will be within 2
milligrams of the population mean.
16. Suppose that in one region of the country the mean amount of credit card debt per household in households
having credit card debt is $15,250, with standard deviation $7,125. Find the probability that the mean
amount of credit card debt in a sample of 1,600 such households will be within $300 of the population mean.
17. Suppose speeds of vehicles on a particular stretch of roadway are normally distributed with mean 36.6 mph
and standard deviation 1.7 mph.
a. Find the probability that the speed X of a randomly selected vehicle is between 35 and 40 mph.
b. Find the probability that the mean speed \(\bar{X}\) of 20 randomly selected vehicles is between 35 and
40 mph.
18. Many sharks enter a state of tonic immobility when inverted. Suppose that in a particular species of sharks
the time a shark remains in a state of tonic immobility when inverted is normally distributed with mean 11.2
minutes and standard deviation 1.1 minutes.
a. If a biologist induces a state of tonic immobility in such a shark in order to study it, find the
probability that the shark will remain in this state for between 10 and 13 minutes.
b. When a biologist wishes to estimate the mean time that such sharks stay immobile by inducing
tonic immobility in each of a sample of 12 sharks, find the probability that mean time of
immobility in the sample will be between 10 and 13 minutes.
19. Suppose the mean cost across the country of a 30-day supply of a generic drug is $46.58, with standard
deviation $4.84. Find the probability that the mean of a sample of 100 prices of 30-day supplies of this drug
will be between $45 and $50.
20. Suppose the mean length of time between submission of a state tax return requesting a refund and the
issuance of the refund is 47 days, with standard deviation 6 days. Find the probability that in a sample of 50
returns requesting a refund, the mean such time will be more than 50 days.
21. Scores on a common final exam in a large enrollment, multiple-section freshman course are normally
distributed with mean 72.7 and standard deviation 13.1.
a. Find the probability that the score X on a randomly selected exam paper is between 70 and 80.
b. Find the probability that the mean score \(\bar{X}\) of 38 randomly selected exam papers is between 70
and 80.
22. Suppose the mean weight of school children’s book bags is 17.4 pounds, with standard deviation 2.2 pounds.
Find the probability that the mean weight of a sample of 30 book bags will exceed 17 pounds.
23. Suppose that in a certain region of the country the mean duration of first marriages that end in divorce is 7.8
years, standard deviation 1.2 years. Find the probability that in a sample of 75 divorces, the mean age of the
marriages is at most 8 years.
24. Borachio eats at the same fast food restaurant every day. Suppose the time X between the moment Borachio
enters the restaurant and the moment he is served his food is normally distributed with mean 4.2 minutes
and standard deviation 1.3 minutes.
a. Find the probability that when he enters the restaurant today it will be at least 5 minutes until he
is served.
b. Find the probability that average time until he is served in eight randomly selected visits to the
restaurant will be at least 5 minutes.
ADDITIONAL EXERCISES
25. A high-speed packing machine can be set to deliver between 11 and 13 ounces of a liquid. For any delivery
setting in this range the amount delivered is normally distributed with mean some amount μ and with standard deviation 0.08 ounce. To calibrate the machine it is set to deliver a particular amount, many
containers are filled, and 25 containers are randomly selected and the amount they contain is measured. Find
the probability that the sample mean will be within 0.05 ounce of the actual mean amount being delivered to
all containers.
26. A tire manufacturer states that a certain type of tire has a mean lifetime of 60,000 miles. Suppose lifetimes
are normally distributed with standard deviation σ = 3,500 miles.
a. Find the probability that if you buy one such tire, it will last only 57,000 or fewer miles. If you had
this experience, is it particularly strong evidence that the tire is not as good as claimed?
b. A consumer group buys five such tires and tests them. Find the probability that average lifetime
of the five tires will be 57,000 miles or less. If the mean is so low, is that particularly strong
evidence that the tire is not as good as claimed?
6.3 The Sample Proportion
LEARNING OBJECTIVES
1. To recognize that the sample proportion \(\hat{P}\) is a random variable.
2. To understand the meaning of the formulas for the mean and standard deviation of the sample
proportion.
3. To learn what the sampling distribution of \(\hat{P}\) is when the sample size is large.
Often sampling is done in order to estimate the proportion of a population that has a specific
characteristic, such as the proportion of all items coming off an assembly line that are defective or
the proportion of all people entering a retail store who make a purchase before leaving. The
population proportion is denoted p and the sample proportion is denoted \(\hat{p}\). Thus if in reality 43% of
people entering a store make a purchase before leaving, p = 0.43; if in a sample of 200 people
entering the store, 78 make a purchase, \(\hat{p} = 78/200 = 0.39\).
The sample proportion is a random variable: it varies from sample to sample in a way that cannot be
predicted with certainty. Viewed as a random variable it will be written \(\hat{P}\). It has a mean \(\mu_{\hat{P}}\) and
a standard deviation \(\sigma_{\hat{P}}\). Here are formulas for their values.
Figure 6.5 "Distribution of Sample Proportions" shows that when p = 0.1 a sample of size 15 is too
small but a sample of size 100 is acceptable. Figure 6.6 "Distribution of Sample Proportions for
p = 0.5 and n = 15" shows that when p = 0.5 a sample of size 15 is acceptable.
Figure 6.5 Distribution of Sample Proportions
Figure 6.6 Distribution of Sample Proportions for p = 0.5 and n = 15
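The three cases pictured in Figures 6.5 and 6.6 can be checked numerically. The sketch below rests on two assumptions, since the corresponding formulas are not reproduced in the surrounding text: that the sample proportion has mean p and standard deviation \(\sqrt{p(1-p)/n}\) (the standard result), and that a sample counts as large enough when the interval from three standard deviations below p to three standard deviations above p lies wholly inside [0, 1]. The function names are hypothetical.

```python
from math import sqrt

def proportion_parameters(p, n):
    """Mean and standard deviation of the sample proportion (standard formulas, assumed here)."""
    return p, sqrt(p * (1 - p) / n)

def is_large_enough(p, n):
    """Assumed rule of thumb: p plus or minus 3 standard deviations must stay inside [0, 1]."""
    mean, sd = proportion_parameters(p, n)
    return 0 <= mean - 3 * sd and mean + 3 * sd <= 1

for p, n in [(0.1, 15), (0.1, 100), (0.5, 15)]:
    mean, sd = proportion_parameters(p, n)
    print(f"p = {p}, n = {n:3d}: sd = {sd:.4f}, large enough: {is_large_enough(p, n)}")
```

Under these assumptions the check agrees with the figures: n = 15 fails for p = 0.1 but passes for p = 0.5, and n = 100 passes for p = 0.1.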
EXAMPLE 8
An online retailer claims that 90% of all orders are shipped within 12 hours of being received. A
consumer group placed 121 orders of different sizes and at different times of day; 102 orders were
shipped within 12 hours.
a. Compute the sample proportion of items shipped within 12 hours.
b. Confirm that the sample is large enough to assume that the sample proportion is normally
distributed. Use p = 0.90, corresponding to the assumption that the retailer’s claim is valid.
c. Assuming the retailer’s claim is true, find the probability that a sample of size 121 would
produce a sample proportion so low as was observed in this sample.
d. Based on the answer to part (c), draw a conclusion about the retailer’s claim.
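One way to work parts (a) through (c) numerically is sketched below. This is an illustration, not the book's printed solution. It assumes the standard formulas for the sample proportion (mean p, standard deviation \(\sqrt{p(1-p)/n}\)) and assumes that "large enough" means the interval of three standard deviations on either side of p lies inside [0, 1].

```python
from math import sqrt
from statistics import NormalDist

p = 0.90        # proportion claimed by the retailer
n = 121         # number of orders placed
shipped = 102   # orders shipped within 12 hours

# (a) Sample proportion.
p_hat = shipped / n

# (b) Assumed large-sample check: p - 3*sd and p + 3*sd must both lie in [0, 1].
sd = sqrt(p * (1 - p) / n)
low, high = p - 3 * sd, p + 3 * sd
large_enough = 0 <= low and high <= 1

# (c) Probability of a sample proportion at or below the observed one,
# using the normal approximation to the sampling distribution.
prob = NormalDist(mu=p, sigma=sd).cdf(p_hat)

print(round(p_hat, 4), large_enough, round(prob, 4))
```

For part (d), a probability this small (on the order of 2 in 100) suggests that either the consumer group drew an unusually unfavorable sample or the true proportion of orders shipped within 12 hours is below 90%.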
APPLICATIONS
13. Suppose that 8% of all males suffer some form of color blindness. Find the probability that in a random
sample of 250 men at least 10% will suffer some form of color blindness. First verify that the sample is
sufficiently large to use the normal distribution.
14. Suppose that 29% of all residents of a community favor annexation by a nearby municipality. Find the
probability that in a random sample of 50 residents at least 35% will favor annexation. First verify that the
sample is sufficiently large to use the normal distribution.
15. Suppose that 2% of all cell phone connections by a certain provider are dropped. Find the probability that in a
random sample of 1,500 calls at most 40 will be dropped. First verify that the sample is sufficiently large to
use the normal distribution.
16. Suppose that in 20% of all traffic accidents involving an injury, driver distraction in some form (for example,
changing a radio station or texting) is a factor. Find the probability that in a random sample of 275 such
accidents between 15% and 25% involve driver distraction in some form. First verify that the sample is
sufficiently large to use the normal distribution.
17. An airline claims that 72% of all its flights to a certain region arrive on time. In a random sample of 30 recent
arrivals, 19 were on time. You may assume that the normal distribution applies.
a. Compute the sample proportion.
b. Assuming the airline’s claim is true, find the probability of a sample of size 30 producing a sample
proportion so low as was observed in this sample.
18. A humane society reports that 19% of all pet dogs were adopted from an animal shelter. Assuming the truth
of this assertion, find the probability that in a random sample of 80 pet dogs, between 15% and 20% were
adopted from a shelter. You may assume that the normal distribution applies.
19. In one study it was found that 86% of all homes have a functional smoke detector. Suppose this proportion is
valid for all homes. Find the probability that in a random sample of 600 homes, between 80% and 90% will
have a functional smoke detector. You may assume that the normal distribution applies.
20. A state insurance commission estimates that 13% of all motorists in its state are uninsured. Suppose this
proportion is valid. Find the probability that in a random sample of 50 motorists, at least 5 will be uninsured.
You may assume that the normal distribution applies.
21. An outside financial auditor has observed that about 4% of all documents he examines contain an error of
some sort. Assuming this proportion to be accurate, find the probability that a random sample of 700
documents will contain at least 30 with some sort of error. You may assume that the normal distribution
applies.
22. Suppose 7% of all households have no home telephone but depend completely on cell phones. Find the
probability that in a random sample of 450 households, between 25 and 35 will have no home telephone.
You may assume that the normal distribution applies.
ADDITIONAL EXERCISES
23. Some countries allow individual packages of prepackaged goods to weigh less than what is stated on the
package, subject to certain conditions, such as the average of all packages being the stated weight or greater.
Suppose that one requirement is that at most 4% of all packages marked 500 grams can weigh less than 490
grams. Assuming that a product actually meets this requirement, find the probability that in a random sample
of 150 such packages the proportion weighing less than 490 grams is at least 3%. You may assume that the
normal distribution applies.
24. An economist wishes to investigate whether people are keeping cars longer now than in the past. He knows
that five years ago, 38% of all passenger vehicles in operation were at least ten years old. He commissions a
study in which 325 automobiles are randomly sampled. Of them, 132 are ten years old or older.
a. Find the sample proportion.
b. Find the probability that, when a sample of size 325 is drawn from a population in which the true
proportion is 0.38, the sample proportion will be as large as the value you computed in part (a).
You may assume that the normal distribution applies.
c. Give an interpretation of the result in part (b). Is there strong evidence that people are keeping
their cars longer than was the case five years ago?
25. A state public health department wishes to investigate the effectiveness of a campaign against smoking.
Historically 22% of all adults in the state regularly smoked cigars or cigarettes. In a survey commissioned by the
public health department, 279 of 1,500 randomly selected adults stated that they smoke regularly.
a. Find the sample proportion.
b. Find the probability that, when a sample of size 1,500 is drawn from a population in which the
true proportion is 0.22, the sample proportion will be no larger than the value you computed in
part (a). You may assume that the normal distribution applies.
c. Give an interpretation of the result in part (b). How strong is the evidence that the campaign to
reduce smoking has been effective?
26. In an effort to reduce the population of unwanted cats and dogs, a group of veterinarians set up a low-cost
spay/neuter clinic. At the inception of the clinic a survey of pet owners indicated that 78% of all pet dogs and
cats in the community were spayed or neutered. After the low-cost clinic had been in operation for three
years, that figure had risen to 86%.
a. What information is missing that you would need to compute the probability that a sample
drawn from a population in which the proportion is 78% (corresponding to the assumption that
the low-cost clinic had had no effect) is as high as 86%?
b. Knowing that the size of the original sample three years ago was 150 and that the size of the
recent sample was 125, compute the probability mentioned in part (a). You may assume that the
normal distribution applies.
c. Give an interpretation of the result in part (b). How strong is the evidence that the presence of
the low-cost clinic has increased the proportion of pet dogs and cats that have been spayed or
neutered?
27. An ordinary die is “fair” or “balanced” if each face has an equal chance of landing on top when the die is
rolled. Thus the proportion of times a three is observed in a large number of tosses is expected to be close to
1/6 or \(0.1\overline{6}\). Suppose a die is rolled 240 times and shows three on top 36 times, for a sample proportion of
0.15.
a. Find the probability that a fair die would produce a proportion of 0.15 or less. You may assume that
the normal distribution applies.
b. Give an interpretation of the result in part (a). How strong is the evidence that the die is not fair?
c. Suppose the sample proportion 0.15 came from rolling the die 2,400 times instead of only 240 times.
Rework part (a) under these circumstances.
d. Give an interpretation of the result in part (c). How strong is the evidence that the die is not fair?
Chapter 7
Estimation
If we wish to estimate the mean μ of a population for which a census is impractical, say the average
height of all 18-year-old men in the country, a reasonable strategy is to take a sample, compute its
mean \(\bar{x}\), and estimate the unknown number μ by the known number \(\bar{x}\). For example, if the average
height of 100 randomly selected men aged 18 is 70.6 inches, then we would say that the average
height of all 18-year-old men is (at least approximately) 70.6 inches.
Estimating a population parameter by a single number like this is called point estimation; in the
case at hand the statistic \(\bar{x}\) is a point estimate of the parameter μ. The terminology arises because
a single number corresponds to a single point on the number line.
A problem with a point estimate is that it gives no indication of how reliable the estimate is. In
contrast, in this chapter we learn about interval estimation. In brief, in the case of estimating a
population mean μ we use a formula to compute from the data a number E, called
the margin of error of the estimate, and form the interval \([\bar{x} - E, \bar{x} + E]\). We do this in such a way that
a certain proportion, say 95%, of all the intervals constructed from sample data by means of this
formula contain the unknown parameter μ. Such an interval is called
a 95% confidence interval for μ.
Continuing with the example of the average height of 18-year-old men, suppose that the sample of
100 men mentioned above for which x̄ = 70.6 inches also had sample standard deviation s = 1.7
inches. It then turns out that E = 0.33 and we would state that we are 95% confident that the average
height of all 18-year-old men is in the interval formed by 70.6±0.33 inches, that is, the average is
between 70.27 and 70.93 inches. If the sample statistics had come from a smaller sample, say a
sample of 50 men, the lower reliability would show up in the 95% confidence interval being longer,
hence less precise in its estimate. In this example the 95% confidence interval for the same sample
statistics but with n = 50 is 70.6±0.47 inches, or from 70.13 to 71.07 inches.
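The two margins of error quoted above can be checked with a short computation. The following is a minimal sketch, assuming the large-sample formula E = z_{α/2}·s/√n that this chapter develops; the function name margin_of_error is ours, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(s, n, confidence=0.95):
    """Large-sample margin of error E = z_{alpha/2} * s / sqrt(n) (assumed formula)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    return z * s / sqrt(n)

x_bar, s = 70.6, 1.7  # sample statistics from the height example
for n in (100, 50):
    e = margin_of_error(s, n)
    print(f"n = {n}: {x_bar} ± {e:.2f}, that is, ({x_bar - e:.2f}, {x_bar + e:.2f})")
```

Halving the sample size multiplies E by √2, which is why the interval for n = 50 is noticeably longer than the one for n = 100.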
7.1 Large Sample Estimation of a Population Mean
LEARNING OBJECTIVES
1. To become familiar with the concept of an interval estimate of the population mean.
2. To understand how to apply formulas for a confidence interval for a population mean.
Figure 7.2 "Computer Simulation of 40 95% Confidence Intervals for a Mean" shows the intervals
generated by a computer simulation of drawing 40 samples from a normally distributed population
and constructing the 95% confidence interval for each one. We expect that about (0.05)(40) = 2 of the
intervals so constructed would fail to contain the population mean μ, and in this simulation two of
the intervals, shown in red, do.
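A simulation of this kind is easy to reproduce. The sketch below is not the program used to draw the figure; the population mean 100, standard deviation 15, sample size 50, and the seed are all hypothetical values chosen for illustration, and the interval is the large-sample one x̄ ± z_{α/2}·s/√n.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def simulate_intervals(num_samples=40, n=50, mu=100.0, sigma=15.0,
                       confidence=0.95, seed=0):
    """Return (lower, upper, contains_mu) for each simulated sample."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    results = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        e = z * stdev(sample) / sqrt(n)  # margin of error for this sample
        results.append((x_bar - e, x_bar + e, x_bar - e <= mu <= x_bar + e))
    return results

intervals = simulate_intervals()
misses = sum(1 for _, _, hit in intervals if not hit)
print(f"{misses} of {len(intervals)} intervals fail to contain mu")
```

The number of misses varies from run to run; only over many samples does the proportion of intervals that capture μ settle near 95%.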
Figure 7.2 Computer Simulation of 40 95% Confidence Intervals for a Mean
It is standard practice to identify the level of confidence in terms of the area α in the two tails of the
distribution of X̄ when the middle part specified by the level of confidence is taken out. This is
shown in Figure 7.3, drawn for the general situation, and in Figure 7.4, drawn for 95% confidence.
Remember from Section 5.4.1 "Tails of the Standard Normal Distribution" in Chapter 5 "Continuous
Random Variables" that the z-value that cuts off a right tail of area c is denoted z_c. Thus the number
1.960 in the example is z_0.025, which is z_{α/2} for α = 1 − 0.95 = 0.05.
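The critical value z_{α/2} can also be computed directly rather than read from a table. This is a minimal sketch using the standard library's NormalDist; the helper name z_critical is an illustrative choice, not notation from the text.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}, the value cutting off a right tail of area alpha/2."""
    alpha = 1.0 - confidence
    # A right tail of area alpha/2 leaves area 1 - alpha/2 to the left.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 95% confidence this reproduces the value 1.960 used above.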
Figure 7.3
For 100(1−α)% confidence the area in each tail is α/2.
Figure 7.4
For 95% confidence the area in each tail is α/2 = 0.025.
EXAMPLE 2
Use Figure 12.3 "Critical Values of t" to find the number z_{α/2} needed in construction of a confidence
interval:
a. when the level of confidence is 90%;
b. when the level of confidence is 99%.
Solution:
a. In the next section we will learn about a continuous random variable that has a probability
distribution called the Student t-distribution. Figure 12.3 "Critical Values of t" gives the value t_c that cuts off a
right tail of area c for different values of c. The last line of that table, the one whose heading is the
symbol ∞ for infinity and [z], gives the corresponding z-value z_c that cuts off a right tail of the same area c.
For 90% confidence α = 0.10, so α/2 = 0.05 and the number needed is z_0.05. It is the number in that row
and in the column with the heading t_0.05. We read off directly that z_0.05 = 1.645.
b. For 99% confidence α = 0.01, so α/2 = 0.005. In Figure 12.3 "Critical Values of t" z_0.005 is the number
in the last row and in the column headed t_0.005, namely 2.576.
Figure 12.3 "Critical Values of t" can be used to find z_c only for those values of c for which there is a
column with the heading t_c appearing in the table; otherwise we must use Figure 12.2 "Cumulative
Normal Probability" in reverse. But when it can be done it is both faster and more accurate to use the
last line of Figure 12.3 "Critical Values of t" to find z_c than it is to do so using Figure 12.2 "Cumulative
Normal Probability" in reverse.
EXERCISES
BASIC
1. A random sample is drawn from a population of known standard deviation 11.3. Construct a 90% confidence
interval for the population mean based on the information given (not all of the information given need be
used).
a. n = 36, x̄ = 105.2, s = 11.2
b. n = 100, x̄ = 105.2, s = 11.2
2. A random sample is drawn from a population of known standard deviation 22.1. Construct a 95% confidence
interval for the population mean based on the information given (not all of the information given need be used).
a. n = 121, x̄ = 82.4, s = 21.9
b. n = 81, x̄ = 82.4, s = 21.9
3. A random sample is drawn from a population of unknown standard deviation. Construct a 99% confidence
interval for the population mean based on the information given.
a. n = 49, x̄ = 17.1, s = 2.1
b. n = 169, x̄ = 17.1, s = 2.1
4. A random sample is drawn from a population of unknown standard deviation. Construct a 98% confidence
interval for the population mean based on the information given.
a. n = 225, x̄ = 92.0, s = 8.4
b. n = 64, x̄ = 92.0, s = 8.4
5. A random sample of size 144 is drawn from a population whose distribution, mean, and standard
deviation are all unknown. The summary statistics are x̄ = 58.2 and s = 2.6.
a. Construct an 80% confidence interval for the population mean μ.
b. Construct a 90% confidence interval for the population mean μ.
c. Comment on why one interval is longer than the other.
6. A random sample of size 256 is drawn from a population whose distribution, mean, and standard
deviation are all unknown. The summary statistics are x̄ = 1011 and s = 34.
a. Construct a 90% confidence interval for the population mean μ.
b. Construct a 99% confidence interval for the population mean μ.
c. Comment on why one interval is longer than the other.
APPLICATIONS
7. A government agency was charged by the legislature with estimating the length of time it takes citizens to fill
out various forms. Two hundred randomly selected adults were timed as they filled out a particular form. The
times required had mean 12.8 minutes with standard deviation 1.7 minutes. Construct a 90% confidence
interval for the mean time taken for all adults to fill out this form.
8. Four hundred randomly selected working adults in a certain state, including those who worked at home,
were asked the distance from their home to their workplace. The average distance was 8.84 miles with
standard deviation 2.70 miles. Construct a 99% confidence interval for the mean distance from home to work
for all residents of this state.
9. On every passenger vehicle that it tests an automotive magazine measures, at true speed 55 mph, the
difference between the true speed of the vehicle and the speed indicated by the speedometer. For 36
vehicles tested the mean difference was −1.2 mph with standard deviation 0.2 mph. Construct a 90%
confidence interval for the mean difference between true speed and indicated speed for all vehicles.
10. A corporation monitors time spent by office workers browsing the web on their computers instead of
working. In a sample of computer records of 50 workers, the average amount of time spent browsing in an
eight-hour work day was 27.8 minutes with standard deviation 8.2 minutes. Construct a 99.5% confidence
interval for the mean time spent by all office workers in browsing the web in an eight-hour day.
11. A sample of 250 workers aged 16 and older produced an average length of time with the current employer
(“job tenure”) of 4.4 years with standard deviation 3.8 years. Construct a 99.9% confidence interval for the
mean job tenure of all workers aged 16 or older.
12. The amount of a particular biochemical substance related to bone breakdown was measured in 30 healthy
women. The sample mean and standard deviation were 3.3 nanograms per milliliter (ng/mL) and 1.4 ng/mL.
Construct an 80% confidence interval for the mean level of this substance in all healthy women.
13. A corporation that owns apartment complexes wishes to estimate the average length of time residents
remain in the same apartment before moving out. A sample of 150 rental contracts gave a mean length of
occupancy of 3.7 years with standard deviation 1.2 years. Construct a 95% confidence interval for the mean
length of occupancy of apartments owned by this corporation.
14. The designer of a garbage truck that lifts roll-out containers must estimate the mean weight the truck will lift
at each collection point. A random sample of 325 containers of garbage on current collection routes
yielded x̄ = 75.3 lb, s = 12.8 lb. Construct a 99.8% confidence interval for the mean weight the trucks must lift
each time.
15. In order to estimate the mean amount of damage sustained by vehicles when a deer is struck, an insurance
company examined the records of 50 such occurrences, and obtained a sample mean of $2,785 with sample
standard deviation $221. Construct a 95% confidence interval for the mean amount of damage in all such
accidents.
16. In order to estimate the mean FICO credit score of its members, a credit union samples the scores of 95
members, and obtains a sample mean of 738.2 with sample standard deviation 64.2. Construct a 99%
confidence interval for the mean FICO score of all of its members.
LARGE DATA SET EXERCISES
23. Large Data Set 1 records the SAT scores of 1,000 students. Regarding it as a random sample of all high school
students, use it to construct a 99% confidence interval for the mean SAT score of all students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
24. Large Data Set 1 records the GPAs of 1,000 college students. Regarding it as a random sample of all college
students, use it to construct a 95% confidence interval for the mean GPA of all students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
25. Large Data Set 1 lists the SAT scores of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all students at a high school, in which the SAT score
of every student was measured. Compute the population mean μ.
b. Regard the first 36 students as a random sample and use it to construct a 99% confidence interval for the
mean μ of all 1,000 SAT scores. Does it actually capture the mean μ?
26. Large Data Set 1 lists the GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all freshmen at a small college at the end of their first
academic year of college study, in which the GPA of every such person was measured. Compute the
population mean μ.
b. Regard the first 36 students as a random sample and use it to construct a 95% confidence interval for the
mean μ of all 1,000 GPAs. Does it actually capture the mean μ?
7.2 Small Sample Estimation of a Population Mean
LEARNING OBJECTIVES
1. To become familiar with Student’s t-distribution.
2. To understand how to apply additional formulas for a confidence interval for a population mean.
The confidence interval formulas in the previous section are based on the Central Limit Theorem, the
statement that for large samples X̄ is approximately normally distributed with mean μ and standard
deviation σ/√n. When the population mean μ is estimated with a small sample (n < 30), the Central
Limit Theorem does not apply. In order to proceed we assume that the numerical population from
which the sample is taken has a normal distribution to begin with. If this condition is satisfied then
when the population standard deviation σ is known the old formula x̄ ± z_{α/2}(σ/√n) can still be used to
construct a 100(1−α)% confidence interval for μ.
If the population standard deviation is unknown and the sample size n is small then when we
substitute the sample standard deviation s for σ the normal approximation is no longer valid. The
solution is to use a different distribution, called Student’s t-distribution with n−1 degrees of
freedom. Student’s t-distribution is very much like the standard
normal distribution in that it is centered at 0 and has the same qualitative bell shape, but it has
heavier tails than the standard normal distribution does, as indicated by Figure 7.5 "Student’s t-Distribution", in
which the curve (in brown) that meets the dashed vertical line at the lowest point is the t-distribution
with two degrees of freedom, the next curve (in blue) is the t-distribution with five degrees of
freedom, and the thin curve (in red) is the standard normal distribution. As also indicated by the
figure, as the sample size n increases, Student’s t-distribution ever more closely resembles the
standard normal distribution. Although there is a different t-distribution for every value of n, once the
sample size is 30 or more it is typically acceptable to use the standard normal distribution instead, as
we will always do in this text.
Figure 7.5 Student’s t-Distribution
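The heavier tails can be seen numerically by evaluating the density functions at the center and out in a tail. The sketch below uses the standard closed-form density of Student's t-distribution, which is not derived in this text; the function names t_pdf and normal_pdf are illustrative.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# Compare heights at the center (x = 0) and in the right tail (x = 3).
for x in (0.0, 3.0):
    print(f"x = {x}: t with 2 df = {t_pdf(x, 2):.4f}, "
          f"t with 5 df = {t_pdf(x, 5):.4f}, normal = {normal_pdf(x):.4f}")
```

At x = 0 the curve with two degrees of freedom is the lowest of the three, while at x = 3 it is the highest, which is what "heavier tails" means.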
Just as the symbol z_c stands for the value that cuts off a right tail of area c in the standard normal
distribution, so the symbol t_c stands for the value that cuts off a right tail of area c in Student’s
t-distribution with the appropriate number of degrees of freedom. This gives us the following
confidence interval formulas.
Compare Note 7.9 "Example 4" in Section 7.1 "Large Sample Estimation of a Population
Mean" and Note 7.16 "Example 6". The summary statistics in the two samples are the same, but the
90% confidence interval for the average GPA of all students at the university in Note 7.9 "Example
4" in Section 7.1 "Large Sample Estimation of a Population Mean", (2.63,2.79), is shorter than the 90%
confidence interval (2.45,2.97), in Note 7.16 "Example 6". This is partly because in Note 7.9 "Example
4" the sample size is larger; there is more information pertaining to the true value of μ in the large
data set than in the small one.
KEY TAKEAWAYS
• In selecting the correct formula for construction of a confidence interval for a population mean ask two
questions: is the population standard deviation σ known or unknown, and is the sample large or small?
• We can construct confidence intervals with small samples only if the population is normal.
7.3 Large Sample Estimation of a Population Proportion
LEARNING OBJECTIVE
1. To understand how to apply the formula for a confidence interval for a population proportion.
Since from Section 6.3 "The Sample Proportion" in Chapter 6 "Sampling Distributions" we know the
mean, standard deviation, and sampling distribution of the sample proportion p̂, the ideas of the
previous two sections can be applied to produce a confidence interval for a population proportion.
Here is the formula.
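As an illustration, the following sketch assumes the standard large-sample interval p̂ ± z_{α/2}·√(p̂(1−p̂)/n), and assumes that "large" means the interval p̂ ± 3·√(p̂(1−p̂)/n) lies wholly inside [0,1]. The function names and the counts used (120 of 400, and 2 of 20) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion (assumed formula)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    e = z * sqrt(p_hat * (1.0 - p_hat) / n)  # margin of error
    return p_hat - e, p_hat + e

def sample_is_large(successes, n):
    """Assumed check: p_hat ± 3 standard deviations must lie inside [0, 1]."""
    p_hat = successes / n
    spread = 3.0 * sqrt(p_hat * (1.0 - p_hat) / n)
    return 0.0 <= p_hat - spread and p_hat + spread <= 1.0

low, high = proportion_interval(120, 400)  # hypothetical data: 120 of 400
print(f"95% interval: ({low:.4f}, {high:.4f}); large enough: {sample_is_large(120, 400)}")
print(f"2 of 20 large enough: {sample_is_large(2, 20)}")
```

The check is done before the interval is trusted: for 2 successes in 20 trials the three-standard-deviation interval extends below 0, so the sample is too small for this method.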
KEY TAKEAWAYS
• We have a single formula for a confidence interval for a population proportion, which is valid when the
sample is large.
• The condition that a sample be large is not that its size n be at least 30, but that the density function fit
inside the interval [0,1].
a. Give a point estimate of the proportion p of all people who could read words disguised in this
way.
b. Show that the sample is not sufficiently large to construct a confidence interval for the
proportion of all people who could read words disguised in this way.
8. In a random sample of 900 adults, 42 defined themselves as vegetarians.
a. Give a point estimate of the proportion of all adults who would define themselves as vegetarians.
b. Verify that the sample is sufficiently large to use it to construct a confidence interval for that
proportion.
c. Construct an 80% confidence interval for the proportion of all adults who would define
themselves as vegetarians.
9. In a random sample of 250 employed people, 61 said that they bring work home with them at least
occasionally.
a. Give a point estimate of the proportion of all employed people who bring work home with them
at least occasionally.
b. Construct a 99% confidence interval for that proportion.
10. In a random sample of 1,250 household moves, 822 were moves to a location within the same county as the
original residence.
a. Give a point estimate of the proportion of all household moves that are to a location within the
same county as the original residence.
b. Construct a 98% confidence interval for that proportion.
11. In a random sample of 12,447 hip replacement or revision surgery procedures nationwide, 162 patients
developed a surgical site infection.
a. Give a point estimate of the proportion of all patients undergoing a hip surgery procedure who
develop a surgical site infection.
b. Verify that the sample is sufficiently large to use it to construct a confidence interval for that
proportion.
c. Construct a 95% confidence interval for the proportion of all patients undergoing a hip surgery
procedure who develop a surgical site infection.
12. In a certain region prepackaged products labeled 500 g must contain on average at least 500 grams of the
product, and at least 90% of all packages must weigh at least 490 grams. In a random sample of 300 packages,
288 weighed at least 490 grams.
a. Give a point estimate of the proportion of all packages that weigh at least 490 grams.
b. Verify that the sample is sufficiently large to use it to construct a confidence interval for that
proportion.
c. Construct a 99.8% confidence interval for the proportion of all packages that weigh at least 490
grams.
15. In order to estimate the proportion of entering students who graduate within six years, the administration at a
state university examined the records of 600 randomly selected students who entered the university six years
ago, and found that 312 had graduated.
a. Give a point estimate of the six-year graduation rate, the proportion of entering students who
graduate within six years.
b. Assuming that the sample is sufficiently large, construct a 98% confidence interval for the six-year
graduation rate.
16. In a random sample of 2,300 mortgages taken out in a certain region last year, 187 were adjustable-rate
mortgages.
a. Give a point estimate of the proportion of all mortgages taken out in this region last year that were
adjustable-rate mortgages.
b. Assuming that the sample is sufficiently large, construct a 99.9% confidence interval for the
proportion of all mortgages taken out in this region last year that were adjustable-rate mortgages.
17. In a research study in cattle breeding, 159 of 273 cows in several herds that were in estrus were detected by
means of an intensive once a day, one-hour observation of the herds in early morning.
a. Give a point estimate of the proportion of all cattle in estrus who are detected by this method.
b. Assuming that the sample is sufficiently large, construct a 90% confidence interval for the proportion
of all cattle in estrus who are detected by this method.
18. A survey of 21,250 households concerning telephone service gave the results shown in the table.
                Landline   No Landline
Cell phone        12,474         5,844
No cell phone      2,529           403
a. Give a point estimate for the proportion of all households in which there is a cell phone but no
landline.
b. Assuming the sample is sufficiently large, construct a 99.9% confidence interval for the proportion of
all households in which there is a cell phone but no landline.
c. Give a point estimate for the proportion of all households in which there is no telephone service of
either kind.
d. Assuming the sample is sufficiently large, construct a 99.9% confidence interval for the proportion of
all households in which there is no telephone service of either kind.
ADDITIONAL EXERCISES
19. In a random sample of 900 adults, 42 defined themselves as vegetarians. Of these 42, 29 were women.
a. Give a point estimate of the proportion of all self-described vegetarians who are women.
b. Verify that the sample is sufficiently large to use it to construct a confidence interval for that
proportion.
c. Construct a 90% confidence interval for the proportion of all self-described vegetarians who
are women.
20. A random sample of 185 college soccer players who had suffered injuries that resulted in loss of playing time
was made with the results shown in the table. Injuries are classified according to severity of the injury and
the condition under which it was sustained.
            Minor   Moderate   Serious
Practice       48         20         6
Game           62         32        17
a. Give a point estimate for the proportion p of all injuries to college soccer players that are
sustained in practice.
b. Construct a 95% confidence interval for the proportion p of all injuries to college soccer players
that are sustained in practice.
c. Give a point estimate for the proportion p of all injuries to college soccer players that are either
moderate or serious.
21. The body mass index (BMI) was measured in 1,200 randomly selected adults, with the results shown in
the table.
                     BMI
         Under 18.5   18.5–25   Over 25
Men              36       165       315
Women            75       274       335
a. Give a point estimate for the proportion of all men whose BMI is over 25.
b. Assuming the sample is sufficiently large, construct a 99% confidence interval for the proportion of all men whose
BMI is over 25.
c. Give a point estimate for the proportion of all adults, regardless of gender, whose BMI is over 25.
d. Assuming the sample is sufficiently large, construct a 99% confidence interval for the proportion of all adults,
regardless of gender, whose BMI is over 25.
7.4 Sample Size Considerations
LEARNING OBJECTIVE
1. To learn how to apply formulas for estimating the size sample that will be needed in order to construct a
confidence interval for a population mean or proportion that meets given criteria.
Sampling is typically done with a set of clear objectives in mind. For example, an economist might
wish to estimate the mean yearly income of workers in a particular industry at 90% confidence and
to within $500. Since sampling costs time, effort, and money, it would be useful to be able to
estimate the smallest size sample that is likely to meet these criteria.
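For a population mean, one common way to make such an estimate is to solve E = z_{α/2}·σ/√n for n and round up. The sketch below assumes that formula, n = (z_{α/2})²σ²/E²; the function name is illustrative, and the standard deviation of $3,000 used for the economist's problem is a hypothetical value, since none is given above.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, max_error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= max_error (assumed formula)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Always round up: rounding down would give a margin of error larger than E.
    return ceil((z * sigma / max_error) ** 2)

# Economist's problem with a hypothetical standard deviation of $3,000.
print(sample_size_for_mean(sigma=3000, max_error=500, confidence=0.90))
```

Because n grows with the square of σ/E, halving the allowed error roughly quadruples the required sample size.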
There is a dilemma here: the formula for estimating how large a sample to take contains the
number p̂, which we know only after we have taken the sample. There are two ways out of this
dilemma. Typically the researcher will have some idea as to the value of the population proportion p,
hence of what the sample proportion p̂ is likely to be. For example, if last month 37% of all voters
thought that state taxes are too high, then it is likely that the proportion with that opinion this month
will not be dramatically different, and we would use the value 0.37 for p̂ in the formula.
The second approach to resolving the dilemma is simply to replace p̂ in the formula by 0.5. This is
because if p̂ is large then 1 − p̂ is small, and vice versa, which limits their product to a maximum value
of 0.25, which occurs when p̂ = 0.5. This is called the most conservative estimate, since it gives the
largest possible estimate of n.
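The two approaches can be compared side by side. This is a minimal sketch assuming the sample size formula n = (z_{α/2})²·p̂(1−p̂)/E², rounded up; the function name and the choice of 80% confidence with E = 0.05 are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(max_error, confidence, p_guess=0.5):
    """Smallest n from n = z^2 * p(1 - p) / E^2 (assumed formula).

    p_guess = 0.5 is the most conservative choice, since p(1 - p) <= 0.25.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil(z * z * p_guess * (1.0 - p_guess) / max_error ** 2)

informed = sample_size_for_proportion(0.05, 0.80, p_guess=0.37)  # prior estimate 0.37
conservative = sample_size_for_proportion(0.05, 0.80)            # no prior knowledge
print(f"with p near 0.37: n = {informed}; most conservative: n = {conservative}")
```

The conservative choice never gives a smaller n than an informed guess, so it is safe when nothing is known about p, at the cost of possibly sampling more than necessary.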
KEY TAKEAWAYS
• If the population standard deviation σ is known or can be estimated, then the minimum sample size
needed to obtain a confidence interval for the population mean with a given maximum error of the
estimate and a given level of confidence can be estimated.
• The minimum sample size needed to obtain a confidence interval for a population proportion with a given
maximum error of the estimate and a given level of confidence can always be estimated. If there is prior
knowledge of the population proportion p then the estimate can be sharpened.
EXERCISES
BASIC
1. Estimate the minimum sample size needed to form a confidence interval for the mean of a population having
the standard deviation shown, meeting the criteria given.
a. σ = 30, 95% confidence, E = 10
b. σ = 30, 99% confidence, E = 10
c. σ = 30, 95% confidence, E = 5
2. Estimate the minimum sample size needed to form a confidence interval for the mean of a population having
the standard deviation shown, meeting the criteria given.
a. σ = 4, 95% confidence, E = 1
b. σ = 4, 99% confidence, E = 1
c. σ = 4, 95% confidence, E = 0.5
3. Estimate the minimum sample size needed to form a confidence interval for the proportion of a population
that has a particular characteristic, meeting the criteria given.
a. p ≈ 0.37, 80% confidence, E = 0.05
b. p ≈ 0.37, 90% confidence, E = 0.05
c. p ≈ 0.37, 80% confidence, E = 0.01
4. Estimate the minimum sample size needed to form a confidence interval for the proportion of a
population that has a particular characteristic, meeting the criteria given.
a. p ≈ 0.81, 95% confidence, E = 0.02
b. p ≈ 0.81, 99% confidence, E = 0.02
c. p ≈ 0.81, 95% confidence, E = 0.01
5. Estimate the minimum sample size needed to form a confidence interval for the proportion of a
population that has a particular characteristic, meeting the criteria given.
a. 80% confidence, E = 0.05
b. 90% confidence, E = 0.05
c. 80% confidence, E = 0.01
6. Estimate the minimum sample size needed to form a confidence interval for the proportion of a
population that has a particular characteristic, meeting the criteria given.
a. 95% confidence, E = 0.02
b. 99% confidence, E = 0.02
c. 95% confidence, E = 0.01
APPLICATIONS
7. A software engineer wishes to estimate, to within 5 seconds, the mean time that a new application takes to
start up, with 95% confidence. Estimate the minimum size sample required if the standard deviation of start
up times for similar software is 12 seconds.
8. A real estate agent wishes to estimate, to within $2.50, the mean retail cost per square foot of newly built
homes, with 80% confidence. He estimates the standard deviation of such costs at $5.00. Estimate the
minimum size sample required.
9. An economist wishes to estimate, to within 2 minutes, the mean time that employed persons spend
commuting each day, with 95% confidence. On the assumption that the standard deviation of commuting
times is 8 minutes, estimate the minimum size sample required.
10. A motor club wishes to estimate, to within 1 cent, the mean price of 1 gallon of regular gasoline in a certain
region, with 98% confidence. Historically the variability of prices is measured by σ = $0.03. Estimate the
minimum size sample required.
11. A bank wishes to estimate, to within $25, the mean average monthly balance in its checking accounts, with
99.8% confidence. Assuming σ = $250, estimate the minimum size sample required.
12. A retailer wishes to estimate, to within 15 seconds, the mean duration of telephone orders taken at its call
center, with 99.5% confidence. In the past the standard deviation of call length has been about 1.25 minutes.
Estimate the minimum size sample required. (Be careful to express all the information in the same units.)
13. The administration at a college wishes to estimate, to within two percentage points, the proportion of all its
entering freshmen who graduate within four years, with 90% confidence. Estimate the minimum size sample
required.
14. A chain of automotive repair stores wishes to estimate, to within five percentage points, the proportion of all
passenger vehicles in operation that are at least five years old, with 98% confidence. Estimate the minimum
size sample required.
15. An internet service provider wishes to estimate, to within one percentage point, the current proportion of all
email that is spam, with 99.9% confidence. Last year the proportion that was spam was 71%. Estimate the
minimum size sample required.
16. An agronomist wishes to estimate, to within one percentage point, the proportion of a new variety of seed
that will germinate when planted, with 95% confidence. A typical germination rate is 97%. Estimate the
minimum size sample required.
17. A charitable organization wishes to estimate, to within half a percentage point, the proportion of all
telephone solicitations to its donors that result in a gift, with 90% confidence. Estimate the minimum sample
size required, using the information that in the past the response rate has been about 30%.
18. A government agency wishes to estimate the proportion of drivers aged 16–24 who have been involved in a
traffic accident in the last year. It wishes to make the estimate to within one percentage point and at 90%
confidence. Find the minimum sample size required, using the information that several years ago the
proportion was 0.12.
ADDITIONAL EXERCISES
19. An economist wishes to estimate, to within six months, the mean time between sales of existing homes, with
95% confidence. Estimate the minimum size sample required. In his experience virtually all houses are re-sold
within 40 months, so using the Empirical Rule he will estimate σ by one-sixth the range, or 40/6 = 6.7.
20. A wildlife manager wishes to estimate the mean length of fish in a large lake, to within one inch, with 80%
confidence. Estimate the minimum size sample required. In his experience virtually no fish caught in the lake
is over 23 inches long, so using the Empirical Rule he will estimate σ by one-sixth the range, or 23/6 = 3.8.
21. You wish to estimate the current mean birth weight of all newborns in a certain region, to within 1 ounce
(1/16 pound) and with 95% confidence. A sample will cost $400 plus $1.50 for every newborn weighed. You
believe the standard deviations of weight to be no more than 1.25 pounds. You have $2,500 to spend on the
study.
a. Can you afford the sample required?
b. If not, what are your options?
22. You wish to estimate a population proportion to within three percentage points, at 95% confidence. A sample
will cost $500 plus 50 cents for every sample element measured. You have $1,000 to spend on the study.
a. Can you afford the sample required?
b. If not, what are your options?
Chapter 8
Testing Hypotheses
A manufacturer of emergency equipment asserts that a respirator that it makes delivers pure air for
75 minutes on average. A government regulatory agency is charged with testing such claims, in this
case to verify that the average time is not less than 75 minutes. To do so it would select a random
sample of respirators, compute the mean time that they deliver pure air, and compare that mean to
the asserted time 75 minutes.
In the sampling that we have studied so far the goal has been to estimate a population parameter.
But the sampling done by the government agency has a somewhat different objective, not so much
to estimate the population mean as to test an assertion, or a hypothesis, about it, namely, whether
it is as large as 75 or not. The agency is not necessarily interested in the actual value of µ, just
whether it is as claimed. Their sampling is done to perform a test of hypotheses, the subject of this
chapter.
8.1 The Elements of Hypothesis Testing
LEARNING OBJECTIVES
1. To understand the logical framework of tests of hypotheses.
2. To learn basic terminology connected with hypothesis testing.
3. To learn fundamental facts about hypothesis testing.
Types of Hypotheses
A hypothesis about the value of a population parameter is an assertion about its value. As in the
introductory example we will be concerned with testing the truth of two competing hypotheses, only one
of which can be true.
Definition
The null hypothesis, denoted H 0, is the statement about the population parameter that is assumed to
be true unless there is convincing evidence to the contrary.
The alternative hypothesis, denoted H a, is a statement about the population parameter that is
contradictory to the null hypothesis, and is accepted as true only if there is convincing evidence in favor
of it.
Definition
Hypothesis testing is a statistical procedure in which a choice is made between a null hypothesis and
an alternative hypothesis based on information in a sample.
The end result of a hypothesis testing procedure is a choice of one of the following two possible
conclusions:
1. Reject H 0 (and therefore accept H a), or
2. Fail to reject H 0 (and therefore fail to accept H a).
The null hypothesis typically represents the status quo, or what has historically been true. In the
example of the respirators, we would believe the claim of the manufacturer unless there is reason not
to do so, so the null hypothesis is H 0: µ=75. The alternative hypothesis in the example is the
contradictory statement H a: µ<75. The null hypothesis will always be an assertion containing an equals
sign, but depending on the situation the alternative hypothesis can have any one of three forms: with
the symbol “<,” as in the example just discussed, with the symbol “>,” or with the symbol “≠.” The
following two examples illustrate the latter two cases.
EXAMPLE 1
A publisher of college textbooks claims that the average price of all hardbound college textbooks is
$127.50. A student group believes that the actual mean is higher and wishes to test their belief. State
the relevant null and alternative hypotheses.
Solution:
The default option is to accept the publisher’s claim unless there is compelling evidence to the
contrary. Thus the null hypothesis is H 0: µ = 127.50. Since the student group thinks that the average
textbook price is greater than the publisher’s figure, the alternative hypothesis in this situation
is H a: µ > 127.50.
EXAMPLE 2
The recipe for a bakery item is designed to result in a product that contains 8 grams of fat per serving.
The quality control department samples the product periodically to insure that the production
process is working as designed. State the relevant null and alternative hypotheses.
Solution:
The default option is to assume that the product contains the amount of fat it was formulated to
contain unless there is compelling evidence to the contrary. Thus the null hypothesis is H 0: µ = 8.0. Since
to contain either more fat than desired or to contain less fat than desired are both an indication of a
faulty production process, the alternative hypothesis in this situation is that the mean is different
from 8.0, so H a: µ ≠ 8.0.
In Note 8.8 "Example 1", the textbook example, it might seem more natural that the publisher’s
claim be that the average price is at most $127.50, not exactly $127.50. If the claim were made this
way, then the null hypothesis would be H 0: µ≤127.50, and the value $127.50 given in the example would
be the one that is least favorable to the publisher’s claim, the null hypothesis. It is always true that if
the null hypothesis is retained for its least favorable value, then it is retained for every other value.
Figure 8.1 The Density Curve for X̄ if H 0 Is True
Think of the respirator example, for which the null hypothesis is H 0: µ=75, the claim that the average
time air is delivered for all respirators is 75 minutes. If the sample mean is 75 or greater then we
certainly would not reject H 0 (since there is no issue with an emergency respirator delivering air even
longer than claimed).
If the sample mean is slightly less than 75 then we would logically attribute the difference to
sampling error and again not reject H 0.
Values of the sample mean that are smaller and smaller are less and less likely to come from a
population for which the population mean is 75. Thus if the sample mean is far less than 75, say
around 60 minutes or less, then we would certainly reject H 0, because we know that it is highly
unlikely that the average of a sample would be so low if the population mean were 75. This is the rare
event criterion for rejection: what we actually observed (X̄ < 60) would be so rare an event if µ = 75
were true that we regard it as much more likely that the alternative hypothesis µ < 75 holds.
In summary, to decide between H 0 and H a in this example we would select a “rejection region” of
values sufficiently far to the left of 75, based on the rare event criterion, and reject H 0 if the sample
mean X̄ lies in the rejection region, but not reject H 0 if it does not.
The Rejection Region
Each different form of the alternative hypothesis H a has its own kind of rejection region:
1. if (as in the respirator example) H a has the form H a: µ < µ0, we reject H 0 if x̄ is far to the left of µ0, that is, to
the left of some number C, so the rejection region has the form of an interval (−∞, C];
2. if (as in the textbook example) H a has the form H a: µ > µ0, we reject H 0 if x̄ is far to the right of µ0, that is, to
the right of some number C, so the rejection region has the form of an interval [C, ∞);
3. if (as in the baked good example) H a has the form H a: µ ≠ µ0, we reject H 0 if x̄ is far away from µ0 in either
direction, that is, either to the left of some number C or to the right of some other number C′, so the
rejection region has the form of the union of two intervals (−∞, C] ∪ [C′, ∞).
The key issue in our line of reasoning is the question of how to determine the number C or
numbers C and C′, called the critical value or critical values of the statistic, that determine the
rejection region.
Definition
The critical value or critical values of a test of hypotheses are the number or numbers that determine
the rejection region.
Suppose the rejection region is a single interval, so we need to select a single number C . Here is the
procedure for doing so. We select a small probability, denoted α, say 1%, which we take as our
definition of “rare event”: an event is “rare” if its probability of occurrence is less than α. (In all the
examples and problems in this text the value of α will be given already.) The probability
that X̄ takes a value in an interval is the area under its density curve and above that interval, so as
shown in Figure 8.2 (drawn under the assumption that H 0 is true, so that the curve centers at µ0) the
critical value C is the value of X̄ that cuts off a tail area α in the probability density curve
of X̄. When the rejection region is in two pieces, that is, composed of two intervals, the total area
above both of them must be α, so the area above each one is α/2, as also shown in Figure 8.2.
Figure 8.2
Figure 8.3 Rejection Region for the Choice α = 0.10
The decision procedure is: take a sample of size 5 and compute the sample mean x̄. If x̄ is either 7.89
grams or less or 8.11 grams or more then reject the hypothesis that the average amount of fat
in all servings of the product is 8.0 grams in favor of the alternative that it is different from 8.0 grams.
Otherwise do not reject the hypothesis that the average amount is 8.0 grams.
The reasoning is that if the true average amount of fat per serving were 8.0 grams then there would
be less than a 10% chance that a sample of size 5 would produce a mean of either 7.89 grams or less
or 8.11 grams or more. Hence if that happened it would be more likely that the value 8.0 is incorrect
(always assuming that the population standard deviation is 0.15 gram).
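The two critical values in this decision procedure can be checked numerically. The sketch below uses the values quoted above (null mean 8.0 grams, population standard deviation 0.15 gram, sample size 5, α = 0.10) and assumes, as the small sample size requires, that the amount of fat per serving is normally distributed so that the sample mean is normal with standard deviation σ/√n. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 8.0      # mean fat content under the null hypothesis (grams)
sigma = 0.15   # population standard deviation (grams)
n = 5          # sample size
alpha = 0.10   # total area of the two-piece rejection region

# Sampling distribution of the sample mean if the null hypothesis is true
# (assumes a normally distributed population)
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Each tail of the two-tailed rejection region has area alpha / 2
lower = sampling_dist.inv_cdf(alpha / 2)
upper = sampling_dist.inv_cdf(1 - alpha / 2)

print(f"Reject if the sample mean is <= {lower:.2f} or >= {upper:.2f}")
# prints: Reject if the sample mean is <= 7.89 or >= 8.11
```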
Because the rejection regions are computed based on areas in tails of distributions, as shown
in Figure 8.2, hypothesis tests are classified according to the form of the alternative hypothesis in the
following way.
Definition
If H a has the form µ≠ µ0 the test is called a two-tailed test.
If H a has the form µ< µ0 the test is called a left-tailed test.
If H a has the form µ> µ0 the test is called a right-tailed test.
Each of the last two forms is also called a one-tailed test.
Two Types of Errors
The format of the testing procedure in general terms is to take a sample and use the information it
contains to come to a decision about the two hypotheses. As stated before our decision will always be
either
1. reject the null hypothesis H 0 in favor of the alternative H a presented, or
2. do not reject the null hypothesis H 0 in favor of the alternative H a presented.
There are four possible outcomes of a hypothesis testing procedure, as shown in the following table:
                       True State of Nature
Our Decision           H 0 is true         H 0 is false
Do not reject H 0      Correct decision    Type II error
Reject H 0             Type I error        Correct decision
As the table shows, there are two ways to be right and two ways to be wrong. Typically to
reject H 0 when it is actually true is a more serious error than to fail to reject it when it is false, so the
former error is labeled “Type I” and the latter error “Type II.”
Definition
In a test of hypotheses, a Type I error is the decision to reject H 0 when it is in fact true. A Type II error is
the decision not to reject H 0 when it is in fact not true.
Unless we perform a census we do not have certain knowledge, so we do not know whether our
decision matches the true state of nature or if we have made an error. We reject H 0 if what we observe
would be a “rare” event if H 0 were true. But rare events are not impossible: they occur with
probability α. Thus when H 0 is true, a rare event will be observed in the proportion α of repeated
similar tests, and H 0 will be erroneously rejected in those tests. Thus α is the probability that in
following the testing procedure to decide between H 0 and H a we will make a Type I error.
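The claim that a true null hypothesis is rejected in the proportion α of repeated tests can be illustrated by simulation. The sketch below repeats a left-tailed test many times on samples drawn from a population in which the null hypothesis really is true. Only the null mean 75 comes from the respirator example; the standard deviation, sample size, number of trials, and the assumption of a normal population are hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated left-tailed tests that reject a true null hypothesis."""
    rng = random.Random(seed)
    # Critical value C for the sample mean: tail area alpha to the left of C
    critical = NormalDist(mu0, sigma / sqrt(n)).inv_cdf(alpha)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean is mu0
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if sample_mean <= critical:
            rejections += 1   # a Type I error
    return rejections / trials


# mu0 = 75 is from the respirator example; sigma = 10 and n = 25 are hypothetical
rate = type_one_error_rate(mu0=75, sigma=10, n=25, alpha=0.05, trials=20000)
print(f"Observed Type I error rate: {rate:.3f}")
```

The printed rate should be close to the chosen α, and it moves closer as the number of trials grows.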
Definition
The number α that is used to determine the rejection region is called the level of significance of the test. It
is the probability that the test procedure will result in a Type I error.
The probability of making a Type II error is too complicated to discuss in a beginning text, so we will say
no more about it than this: for a fixed sample size, choosing α smaller in order to reduce the chance of
making a Type I error has the effect of increasing the chance of making a Type II error. The only way to
simultaneously reduce the chances of making either kind of error is to increase the sample size.
Standardizing the Test Statistic
Hypothesis testing will be considered in a number of contexts, and great unification as well as
simplification results when the relevant sample statistic is standardized by subtracting its mean from it
and then dividing by its standard deviation. The resulting statistic is called a standardized test statistic. In
every situation treated in this and the following two chapters the standardized test statistic will have
either the standard normal distribution or Student’s t-distribution.
Definition
A standardized test statistic for a hypothesis test is the statistic that is formed by subtracting from
the statistic of interest its mean and dividing by its standard deviation.
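As an illustration of the definition, consider testing H 0: µ = µ0 with a large sample. If H 0 is true, the sample mean has mean µ0 and standard deviation σ/√n, so standardizing it gives the statistic below. The block is a sketch under the usual large-sample assumption (n ≥ 30); the sample standard deviation s stands in for σ when σ is unknown, as the key takeaways of Section 8.2 state.

```latex
% Standardized test statistic for a large-sample test of H_0: \mu = \mu_0
% (assumes n >= 30, so that Z is approximately standard normal)
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \quad (\sigma \text{ known}),
\qquad
Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad (\sigma \text{ unknown})
```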
Every instance of hypothesis testing discussed in this and the following two chapters will have a
rejection region like one of the six forms tabulated in the tables above.
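When the standardized test statistic has the standard normal distribution, the rejection region depends only on the form of the alternative hypothesis and on α. The following sketch covers that case only (the t-distribution case is not handled); the function name and the strings used to encode the alternative are hypothetical conventions, not notation from the text.

```python
from statistics import NormalDist


def rejection_region(alternative, alpha):
    """Rejection region for a standard normal test statistic.

    alternative: "<" for a left-tailed test, ">" for a right-tailed test,
                 "!=" for a two-tailed test.
    Returns a list of (low, high) intervals; endpoints are critical values.
    """
    inf = float("inf")
    std_normal = NormalDist()
    if alternative == "<":
        return [(-inf, std_normal.inv_cdf(alpha))]
    if alternative == ">":
        return [(std_normal.inv_cdf(1 - alpha), inf)]
    if alternative == "!=":
        # The total tail area alpha is split evenly between the two pieces
        z = std_normal.inv_cdf(1 - alpha / 2)
        return [(-inf, -z), (z, inf)]
    raise ValueError("alternative must be '<', '>' or '!='")


for alt in ("<", ">", "!="):
    print(alt, rejection_region(alt, alpha=0.05))
```

Encoding the region as a list of intervals keeps the one-tailed and two-tailed cases in the same shape, so the comparison in the final step of a test is the same in all three.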
No matter what the context a test of hypotheses can always be performed by applying the following
systematic procedure, which will be illustrated in the examples in the succeeding sections.
Systematic Hypothesis Testing Procedure: Critical Value Approach
1. Identify the null and alternative hypotheses.
2. Identify the relevant test statistic and its distribution.
3. Compute from the data the value of the test statistic.
4. Construct the rejection region.
5. Compare the value computed in Step 3 to the rejection region constructed in Step 4 and make a decision.
Formulate the decision in the context of the problem, if applicable.
The procedure that we have outlined in this section is called the “Critical Value Approach” to
hypothesis testing to distinguish it from an alternative but equivalent approach that will be
introduced at the end of Section 8.3 "The Observed Significance of a Test".
KEY TAKEAWAYS
• A test of hypotheses is a statistical process for deciding between two competing assertions about a
population parameter.
• The testing procedure is formalized in a five-step procedure.
EXERCISES
1. State the null and alternative hypotheses for each of the following situations. (That is, identify the correct
number µ0 and write H 0: µ = µ0 and the appropriate analogous expression for H a.)
a. The average July temperature in a region historically has been 74.5°F. Perhaps it is higher now.
b. The average weight of a female airline passenger with luggage was 145 pounds ten years ago.
The FAA believes it to be higher now.
c. The average stipend for doctoral students in a particular discipline at a state university is
$14,756. The department chairman believes that the national average is higher.
d. The average room rate in hotels in a certain region is $82.53. A travel agent believes that the
average in a particular resort area is different.
e. The average farm size in a predominately rural state was 69.4 acres. The secretary of agriculture
of that state asserts that it is less today.
2. State the null and alternative hypotheses for each of the following situations. (That is, identify the correct
number µ0 and write H 0: µ = µ0 and the appropriate analogous expression for H a.)
a. The average time workers spent commuting to work in Verona five years ago was 38.2 minutes.
The Verona Chamber of Commerce asserts that the average is less now.
b. The mean salary for all men in a certain profession is $58,291. A special interest group thinks that
the mean salary for women in the same profession is different.
c. The accepted figure for the caffeine content of an 8-ounce cup of coffee is 133 mg. A dietitian
believes that the average for coffee served in a local restaurant is higher.
d. The average yield per acre for all types of corn in a recent year was 161.9 bushels. An economist
believes that the average yield per acre is different this year.
e. An industry association asserts that the average age of all self-described fly fishermen is 42.8
years. A sociologist suspects that it is higher.
3. Describe the two types of errors that can be made in a test of hypotheses.
4. Under what circumstance is a test of hypotheses certain to yield a correct decision?
8.2 Large Sample Tests for a Population Mean
LEARNING OBJECTIVES
1. To learn how to apply the five-step test procedure for a test of hypotheses concerning a population mean
when the sample size is large.
2. To learn how to interpret the result of a test of hypotheses in the context of the original narrated
situation.
EXAMPLE 4
It is hoped that a newly developed pain reliever will more quickly produce perceptible reduction in
pain to patients after minor surgeries than a standard pain reliever. The standard pain reliever is
known to bring relief in an average of 3.5 minutes with standard deviation 2.1 minutes. To test
whether the new pain reliever works more quickly than the standard one, 50 patients with minor
surgeries were given the new pain reliever and their times to relief were recorded. The experiment
yielded sample mean x̄ = 3.1 minutes and sample standard deviation s = 1.5 minutes. Is there sufficient
evidence in the sample to indicate, at the 5% level of significance, that the newly developed pain
reliever does deliver perceptible relief more quickly?
Solution:
We perform the test of hypotheses using the five-step procedure given at the end of Section 8.1 "The
Elements of Hypothesis Testing".
• Step 1. The natural assumption is that the new drug is no better than the old one, but must be
proved to be better. Thus if µ denotes the average time until all patients who are given the new
drug experience pain relief, the hypothesis test is
perceptible relief from pain using the new pain reliever is smaller than the average time for the
standard pain reliever.
Figure 8.5 Rejection Region and Test Statistic for Note 8.27 "Example 4"
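The arithmetic behind Example 4 can be sketched as follows. The sample values (x̄ = 3.1, s = 1.5, n = 50, α = 0.05) are those given in the example. The hypotheses H 0: µ = 3.5 vs. H a: µ < 3.5 are an assumption read off from the narrative, and the sample standard deviation is used because the standard deviation for the new drug is not known. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z(x_bar, mu0, sd, n):
    """Standardized test statistic (x_bar - mu0) / (sd / sqrt(n))."""
    return (x_bar - mu0) / (sd / sqrt(n))


# Sample values from Example 4; mu0 = 3.5 is the assumed null value
x_bar, mu0, s, n, alpha = 3.1, 3.5, 1.5, 50, 0.05

z = large_sample_z(x_bar, mu0, s, n)

# Left-tailed test: reject when Z falls at or below the critical value
critical = NormalDist().inv_cdf(alpha)
reject_h0 = z <= critical

print(f"Z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject_h0}")
# prints: Z = -1.886, critical value = -1.645, reject H0: True
```

The statistic falls in the rejection region, which agrees with the conclusion stated at the end of the example.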
EXAMPLE 5
A cosmetics company fills its best-selling 8-ounce jars of facial cream by an automatic dispensing
machine. The machine is set to dispense a mean of 8.1 ounces per jar. Uncontrollable factors in the
process can shift the mean away from 8.1 and cause either underfill or overfill, both of which are
undesirable. In such a case the dispensing machine is stopped and recalibrated. Regardless of the
mean amount dispensed, the standard deviation of the amount dispensed always has value 0.22
ounce. A quality control engineer routinely selects 30 jars from the assembly line to check the
amounts filled. On one occasion, the sample mean is x̄ = 8.2 ounces and the sample standard deviation
is s = 0.25 ounce. Determine if there is sufficient evidence in the sample to indicate, at the 1% level of
significance, that the machine should be recalibrated.
Solution:
• Step 1. The natural assumption is that the machine is working properly. Thus if µ denotes the
mean amount of facial cream being dispensed, the hypothesis test is
H 0: µ = 8.1
vs. H a: µ ≠ 8.1 @ α = 0.01
Figure 8.6 Rejection Region and Test Statistic for Note 8.28 "Example 5"
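The computation for Example 5 can be sketched as a worked equation. Because the population standard deviation is stated to be 0.22 ounce, it is used in place of the sample value s = 0.25. The block assumes the standardized statistic is treated as standard normal, and the critical value is computed rather than read from a table.

```latex
% Example 5: two-tailed test of H_0: \mu = 8.1 at \alpha = 0.01, \sigma known
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{8.2 - 8.1}{0.22 / \sqrt{30}} \approx 2.490,
\qquad
z_{\alpha/2} = z_{0.005} \approx 2.576
% Rejection region: (-\infty, -2.576] \cup [2.576, \infty)
```

Since 2.490 lies between −2.576 and 2.576, the statistic is not in the rejection region, so under these assumptions the sample does not give sufficient evidence, at the 1% level of significance, that the machine should be recalibrated.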
KEY TAKEAWAYS
• There are two formulas for the test statistic in testing hypotheses about a population mean with large
samples. Both test statistics follow the standard normal distribution.
• The population standard deviation is used if it is known, otherwise the sample standard deviation is used.
• The same five-step procedure is used with either test statistic.
EXERCISES
BASIC
1. Find the rejection region (for the standardized test statistic) for each hypothesis test.
a. H 0: µ = 27 vs. H a: µ < 27 @ α = 0.05.
b. H 0: µ = 52 vs. H a: µ ≠ 52 @ α = 0.05.
c. H 0: µ = −105 vs. H a: µ > −105 @ α = 0.10.
d. H 0: µ = 78.8 vs. H a: µ ≠ 78.8 @ α = 0.10.
2. Find the rejection region (for the standardized test statistic) for each hypothesis test.
a. H 0: µ = 17 vs. H a: µ < 17 @ α = 0.01.
b. H 0: µ = 880 vs. H a: µ ≠ 880 @ α = 0.01.
c. H 0: µ = −12 vs. H a: µ > −12 @ α = 0.05.
d. H 0: µ = 21.1 vs. H a: µ ≠ 21.1 @ α = 0.05.
3. Find the rejection region (for the standardized test statistic) for each hypothesis test. Identify the test as
left-tailed, right-tailed, or two-tailed.
a. H 0: µ = 141 vs. H a: µ < 141 @ α = 0.20.
b. H 0: µ = −54 vs. H a: µ < −54 @ α = 0.05.
c. H 0: µ = 98.6 vs. H a: µ ≠ 98.6 @ α = 0.05.
d. H 0: µ = 3.8 vs. H a: µ > 3.8 @ α = 0.001.
4. Find the rejection region (for the standardized test statistic) for each hypothesis test. Identify the test as
left-tailed, right-tailed, or two-tailed.
a. H 0: µ = −62 vs. H a: µ ≠ −62 @ α = 0.005.
b. H 0: µ = 73 vs. H a: µ > 73 @ α = 0.001.
c. H 0: µ = 1124 vs. H a: µ < 1124 @ α = 0.001.
d. H 0: µ = 0.12 vs. H a: µ ≠ 0.12 @ α = 0.001.
5. Compute the value of the test statistic for the indicated test, based on the information given.
a. Testing H 0: µ = 72.2 vs. H a: µ > 72.2, σ unknown, n = 55, x̄ = 75.1, s = 9.25
b. Testing H 0: µ = 58 vs. H a: µ > 58, σ = 1.22, n = 40, x̄ = 58.5, s = 1.29
c. Testing H 0: µ = −19.5 vs. H a: µ < −19.5, σ unknown, n = 30, x̄ = −23.2, s = 9.55
d. Testing H 0: µ = 805 vs. H a: µ ≠ 805, σ = 37.5, n = 75, x̄ = 818, s = 36.2
6. Compute the value of the test statistic for the indicated test, based on the information given.
a. Testing H 0: µ = 342 vs. H a: µ < 342, σ = 11.2, n = 40, x̄ = 339, s = 10.3
b. Testing H 0: µ = 105 vs. H a: µ > 105, σ = 5.3, n = 80, x̄ = 107, s = 5.1
c. Testing H 0: µ = −13.5 vs. H a: µ ≠ −13.5, σ unknown, n = 32, x̄ = −13.8, s = 1.5
d. Testing H 0: µ = 28 vs. H a: µ ≠ 28, σ unknown, n = 68, x̄ = 27.8, s = 1.3
7. Perform the indicated test of hypotheses, based on the information given.
a. Test H 0: µ = 212 vs. H a: µ < 212 @ α = 0.10, σ unknown, n = 36, x̄ = 211.2, s = 2.2
b. Test H 0: µ = −18 vs. H a: µ > −18 @ α = 0.05, σ = 3.3, n = 44, x̄ = −17.2, s = 3.1
c. Test H 0: µ = 24 vs. H a: µ ≠ 24 @ α = 0.02, σ unknown, n = 50, x̄ = 22.8, s = 1.9
8. Perform the indicated test of hypotheses, based on the information given.
a. Test H 0: µ = 105 vs. H a: µ > 105 @ α = 0.05, σ unknown, n = 30, x̄ = 108, s = 7.2
b. Test H 0: µ = 21.6 vs. H a: µ < 21.6 @ α = 0.01, σ unknown, n = 78, x̄ = 20.5, s = 3.9
c. Test H 0: µ = −375 vs. H a: µ ≠ −375 @ α = 0.01, σ = 18.5, n = 31, x̄ = −388, s = 18.0
APPLICATIONS
9. In the past the average length of an outgoing telephone call from a business office has been 143 seconds. A
manager wishes to check whether that average has decreased after the introduction of policy changes. A
sampleof100telephonecallsproducedameanof133seconds,withastandarddeviationof35seconds.
Performtherelevanttestatthe1%levelofsignificance.
10. The government of an impoverished country reports the mean age at death among those who have survived
to adulthood as 66.2 years. A relief agency examines 30 randomly selected deaths and obtains a mean of
62.3 years with standard deviation 8.1 years. Test whether the agency’s data support the alternative
hypothesis, at the 1% level of significance, that the population mean is less than 66.2.
11. The average household size in a certain region several years ago was 3.14 persons. A sociologist wishes to
test, at the 5% level of significance, whether it is different now. Perform the test using the information
collected by the sociologist: in a random sample of 75 households, the average size was 2.98 persons, with
sample standard deviation 0.82 person.
12. The recommended daily calorie intake for teenage girls is 2,200 calories/day. A nutritionist at a state
university believes the average daily caloric intake of girls in that state to be lower. Test that hypothesis, at
the 5% level of significance, against the null hypothesis that the population average is 2,200 calories/day
using the following sample data: n = 36, x̄ = 2,150, s = 203.
13. An automobile manufacturer recommends oil change intervals of 3,000 miles. To compare actual intervals to
the recommendation, the company randomly samples records of 50 oil changes at service facilities and
obtains sample mean 3,752 miles with sample standard deviation 638 miles. Determine whether the data
provide sufficient evidence, at the 5% level of significance, that the population mean interval between oil
changes exceeds 3,000 miles.
14. A medical laboratory claims that the mean turn-around time for performance of a battery of tests on blood
samples is 1.88 business days. The manager of a large medical practice believes that the actual mean is
larger. A random sample of 45 blood samples yielded mean 2.09 and sample standard deviation 0.13 day.
Perform the relevant test at the 10% level of significance, using these data.
15. A grocery store chain has as one standard of service that the mean time customers wait in line to begin
checking out not exceed 2 minutes. To verify the performance of a store the company measures the waiting
time in 30 instances, obtaining mean time 2.17 minutes with standard deviation 0.46 minute. Use these data
to test the null hypothesis that the mean waiting time is 2 minutes versus the alternative that it exceeds 2
minutes, at the 10% level of significance.
16. A magazine publisher tells potential advertisers that the mean household income of its regular readership is
$61,500. An advertising agency wishes to test this claim against the alternative that the mean is smaller. A
sample of 40 randomly selected regular readers yields mean income $59,800 with standard deviation $5,850.
Perform the relevant test at the 1% level of significance.
17. Authors of a computer algebra system wish to compare the speed of a new computational algorithm to the
currently implemented algorithm. They apply the new algorithm to 50 standard problems; it averages 8.16
seconds with standard deviation 0.17 second. The current algorithm averages 8.21 seconds on such
problems. Test, at the 1% level of significance, the alternative hypothesis that the new algorithm has a lower
average time than the current algorithm.
18. A random sample of the starting salaries of 35 randomly selected graduates with bachelor’s degrees last year
gave sample mean and standard deviation $41,202 and $7,621, respectively. Test whether the data provide
sufficient evidence, at the 5% level of significance, to conclude that the mean starting salary of all graduates
last year is less than the mean of all graduates two years before, $43,589.
ADDITIONAL EXERCISES
19. The mean household income in a region served by a chain of clothing stores is $48,750. In a sample of 40
customers taken at various stores the mean income of the customers was $51,505 with standard deviation
$6,852.
a. Test at the 10% level of significance the null hypothesis that the mean household income of
customers of the chain is $48,750 against the alternative that it is different from $48,750.
b. The sample mean is greater than $48,750, suggesting that the actual mean of people who
patronize this store is greater than $48,750. Perform this test, also at the 10% level of
significance. (The computation of the test statistic done in part (a) still applies here.)
20. The labor charges for repairs at an automobile service center are based on a standard time specified for each
type of repair. The time specified for replacement of a universal joint in a drive shaft is one hour. The manager
reviews a sample of 30 such repairs. The average of the actual repair times is 0.86 hour with standard
deviation 0.32 hour.
a. Test at the 1% level of significance the null hypothesis that the actual mean time for this repair is
one hour against the alternative that it differs from one hour.
b. The sample mean is less than one hour, suggesting that the mean actual time for this repair is
less than one hour. Perform this test, also at the 1% level of significance. (The computation of the
test statistic done in part (a) still applies here.)
LARGE DATA SET EXERCISES
21. Large Data Set 1 records the SAT scores of 1,000 students. Regarding it as a random sample of all high school
students, use it to test the hypothesis that the population mean exceeds 1,510, at the 1% level of
significance. (The null hypothesis is that μ = 1510.)
http://www.flatworldknowledge.com/sites/all/files/data1.xls
22. Large Data Set 1 records the GPAs of 1,000 college students. Regarding it as a random sample of all college
students, use it to test the hypothesis that the population mean is less than 2.50, at the 10% level of
significance. (The null hypothesis is that μ = 2.50.)
http://www.flatworldknowledge.com/sites/all/files/data1.xls
23. Large Data Set 1 lists the SAT scores of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all students at a high school, in which the SAT score
of every student was measured. Compute the population mean μ.
b. Regard the first 50 students in the data set as a random sample drawn from the population of
part (a) and use it to test the hypothesis that the population mean exceeds 1,510, at the 10%
level of significance. (The null hypothesis is that μ = 1510.)
c. Is your conclusion in part (b) in agreement with the true state of nature (which by part (a) you
know), or is your decision in error? If your decision is in error, is it a Type I error or a Type II
error?
24. Large Data Set 1 lists the GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Regard the data as arising from a census of all freshmen at a small college at the end of their first
academic year of college study, in which the GPA of every such person was measured. Compute
the population mean μ.
b. Regard the first 50 students in the data set as a random sample drawn from the population of
part (a) and use it to test the hypothesis that the population mean is less than 2.50, at the 10%
level of significance. (The null hypothesis is that μ = 2.50.)
c. Is your conclusion in part (b) in agreement with the true state of nature (which by part (a) you
know), or is your decision in error? If your decision is in error, is it a Type I error or a Type II
error?
8.3 The Observed Significance of a Test
LEARNING OBJECTIVES
1. To learn what the observed significance of a test is.
2. To learn how to compute the observed significance of a test.
3. To learn how to apply the p-value approach to hypothesis testing.
The Observed Significance
The conceptual basis of our testing procedure is that we reject H0 only if the data that we obtained would
constitute a rare event if H0 were actually true. The level of significance α specifies what is meant by “rare.”
The observed significance of the test is a measure of how rare the value of the test statistic that we have
just observed would be if the null hypothesis were true. That is, the observed significance of the test just
performed is the probability that, if the test were repeated with a new sample, the result of the new test
would be at least as contrary to H0 and in support of Ha as what was observed in the original test.
Definition
The observed significance or p-value of a specific test of hypotheses is the probability, on the
supposition that H0 is true, of obtaining a result at least as contrary to H0 and in favor of Ha as the result
actually observed in the sample data.
Think back to Note 8.27 "Example 4" in Section 8.2 "Large Sample Tests for a Population
Mean" concerning the effectiveness of a new pain reliever. This was a left-tailed test in which the value of
the test statistic was −1.886. To be as contrary to H0 and in support of Ha as the result Z = −1.886 actually
observed means to obtain a value of the test statistic in the interval (−∞, −1.886]. Rounding −1.886 to
−1.89, we can read directly from Figure 12.2 "Cumulative Normal
Probability" that P(Z ≤ −1.89) = 0.0294. Thus the p-value or observed significance of the test in Note 8.27
"Example 4" is 0.0294 or about 3%. Under repeated sampling from this population, if H0 were true then
only about 3% of all samples of size 50 would give a result at least as contrary to H0 and in favor of Ha as the
sample we observed. Note that the probability 0.0294 is the area of the left tail cut off by the test statistic
in this left-tailed test.
Analogous reasoning applies to a right-tailed or a two-tailed test, except that in the case of a two-tailed
test being as far from 0 as the observed value of the test statistic but on the opposite side of 0 is just as
contrary to H0 as being the same distance away and on the same side of 0, hence the corresponding tail
area is doubled.
Computational Definition of the Observed Significance of a Test of Hypotheses
The observed significance of a test of hypotheses is the area of the tail of the distribution cut off by the
test statistic (times two in the case of a two-tailed test).
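The computational definition translates directly into a few lines of code. The following Python sketch is an illustration only and is not part of the original text. It assumes the test statistic has the standard normal distribution; the function name observed_significance and the tail labels "left", "right", and "two" are names chosen here for convenience. It uses only the standard library class statistics.NormalDist in place of the cumulative normal table.

```python
from statistics import NormalDist


def observed_significance(z, tail):
    """Return the p-value of a test whose statistic z is standard normal.

    tail is "left", "right", or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)          # area to the left of z
    if tail == "right":
        return 1.0 - std_normal.cdf(z)    # area to the right of z
    if tail == "two":
        # area of the tail cut off by |z|, times two
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Left-tailed test of Note 8.27 "Example 4", with z rounded to -1.89 as in the text.
p_example_4 = observed_significance(-1.89, "left")
print(round(p_example_4, 4))
```

For z = −1.89 the computed left-tail area should agree, to four decimal places, with the value 0.0294 read from the table.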
EXAMPLE 6
Compute the observed significance of the test performed in Note 8.28 "Example 5" in Section 8.2 "Large
Sample Tests for a Population Mean".
Solution:
The value of the test statistic was z = 2.490, which by Figure 12.2 "Cumulative Normal Probability" cuts off
a tail of area 0.0064, as shown in Figure 8.7 "Area of the Tail for". Since the test was two-tailed, the
observed significance is 2 × 0.0064 = 0.0128.
Figure 8.7 Area of the Tail for Note 8.34 "Example 6"
The p-value Approach to Hypothesis Testing
In Note 8.27 "Example 4" in Section 8.2 "Large Sample Tests for a Population Mean" the test was
performed at the 5% level of significance: the definition of “rare” event was probability α = 0.05 or less.
We saw above that the observed significance of the test was p = 0.0294 or about 3%.
Since p = 0.0294 < 0.05 = α (or 3% is less than 5%), the decision turned out to be to reject: what was
observed was sufficiently unlikely to qualify as an event so rare as to be regarded as (practically)
incompatible with H0.
In Note 8.28 "Example 5" in Section 8.2 "Large Sample Tests for a Population Mean" the test was
performed at the 1% level of significance: the definition of “rare” event was probability α = 0.01 or less.
The observed significance of the test was computed in Note 8.34 "Example 6" as p = 0.0128 or about
1.3%. Since p = 0.0128 > 0.01 = α (or 1.3% is greater than 1%), the decision turned out to be not to reject.
The event observed was unlikely, but not sufficiently unlikely to lead to rejection of the null
hypothesis.
The reasoning just presented is the basis for a slightly different but equivalent formulation of the
hypothesis testing process. The first three steps are the same as before, but instead of using α to
compute critical values and construct a rejection region, one computes the p-value p of the test and
compares it to α, rejecting H0 if p ≤ α and not rejecting if p > α.
Systematic Hypothesis Testing Procedure: p-Value Approach
1. Identify the null and alternative hypotheses.
2. Identify the relevant test statistic and its distribution.
3. Compute from the data the value of the test statistic.
4. Compute the p-value of the test.
5. Compare the value computed in Step 4 to significance level α and make a decision: reject H0 if p ≤ α and do
not reject H0 if p > α. Formulate the decision in the context of the problem, if applicable.
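The five steps can be sketched as a single Python function. This is a minimal illustration under stated assumptions, not a definitive implementation: it covers only the large-sample test for a population mean with the sample standard deviation s standing in for σ, the function name p_value_test_for_mean and its parameter names are chosen here, and the numbers in the usage lines are hypothetical, invented purely to exercise the code.

```python
from math import sqrt
from statistics import NormalDist


def p_value_test_for_mean(x_bar, s, n, mu_0, alpha, tail):
    """Large-sample test of H0: mu = mu_0 by the p-value approach.

    tail is "left", "right", or "two" according to the form of Ha.
    Returns the tuple (z, p, reject).
    """
    # Steps 1 and 2: the hypotheses are fixed by mu_0 and tail; for a large
    # sample the standardized test statistic is approximately standard normal.
    # Step 3: compute the value of the test statistic.
    z = (x_bar - mu_0) / (s / sqrt(n))
    # Step 4: the p-value is the area of the tail cut off by z
    # (times two for a two-tailed test).
    cdf = NormalDist().cdf
    if tail == "left":
        p = cdf(z)
    elif tail == "right":
        p = 1.0 - cdf(z)
    elif tail == "two":
        p = 2.0 * (1.0 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    # Step 5: reject H0 exactly when p <= alpha.
    return z, p, p <= alpha


# Hypothetical data, not taken from the text.
z, p, reject = p_value_test_for_mean(
    x_bar=51.2, s=4.0, n=64, mu_0=50.0, alpha=0.05, tail="right"
)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Returning z and p along with the decision keeps the intermediate results of Steps 3 and 4 available, which is useful when the conclusion must be stated in the context of the problem.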
EXAMPLE 8
Mr. Prospero has been teaching Algebra II from a particular textbook at Remote Isle High School for
many years. Over the years students in his Algebra II classes have consistently scored an average of
67 on the end of course exam (EOC). This year Mr. Prospero used a new textbook in the hope that the
average score on the EOC test would be higher. The EOC test scores of the 64 students who
took Algebra II from Mr. Prospero this year had mean 69.4 and sample standard deviation 6.1.
Determine whether these data provide sufficient evidence, at the 1% level of significance, to
conclude that the average EOC test score is higher with the new textbook.
Solution:
• Step 1. Let μ be the true average score on the EOC exam of all Mr. Prospero’s students who take
the Algebra II course with the new textbook. The natural statement that would be assumed true
unless there were strong evidence to the contrary is that the new book is about the same as the
old one. The alternative, which it takes evidence to establish, is that the new book is better, which
corresponds to a higher value of μ. Thus the relevant test is
H0: μ = 67 vs. Ha: μ > 67 @ α = 0.01
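The arithmetic for the remaining steps of this example can be sketched in Python. The sketch below is an illustration added here, not the worked solution of the text. It assumes that, because n = 64 is large, the standardized test statistic is approximately standard normal with s used in place of σ; all numbers are those stated in Example 8, and the variable names are chosen for this sketch.

```python
from math import sqrt
from statistics import NormalDist

# Data stated in Example 8.
n = 64          # number of students
x_bar = 69.4    # sample mean EOC score
s = 6.1         # sample standard deviation
mu_0 = 67.0     # mean under H0
alpha = 0.01    # level of significance

# Value of the standardized test statistic.
z = (x_bar - mu_0) / (s / sqrt(n))

# Right-tailed test: the p-value is the area to the right of z.
p_value = 1.0 - NormalDist().cdf(z)

# Reject H0 exactly when the p-value is at most alpha.
reject_h0 = p_value <= alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Under these assumptions z is a little above 3.1 and the p-value is below 0.001, well under α = 0.01, so the sketch points toward rejecting H0.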
Figure 8.10 Test Statistic for Note 8.38 "Example 9"
KEY TAKEAWAYS
• The observed significance or p-value of a test is a measure of how inconsistent the sample result is
with H0 and in favor of Ha.
• The p-value approach to hypothesis testing means that one merely compares the p-value to α instead of
constructing a rejection region.
• There is a systematic five-step procedure for the p-value approach to hypothesis testing.
a. Perform the relevant test of hypotheses at the 20% level of significance using the critical value
approach.
b. Compute the observed significance of the test.
c. Perform the test at the 20% level of significance using the p-value approach. You need not repeat
the first three steps, already done in part (a).
9. The mean score on a 25-point placement exam in mathematics used for the past two years at a large state
university is 14.3. The placement coordinator wishes to test whether the mean score on a revised version of
8.4 Small Sample Tests for a Population Mean
LEARNING OBJECTIVE
1. To learn how to apply the five-step test procedure for test of hypotheses concerning a population mean
when the sample size is small.
In the previous section hypothesis testing for population means was described in the case of large
samples. The statistical validity of the tests was ensured by the Central Limit Theorem, with
essentially no assumptions on the distribution of the population. When sample sizes are small, as is
often the case in practice, the Central Limit Theorem does not apply. One must then impose stricter
assumptions on the population to give statistical validity to the test procedure. One common
assumption is that the population from which the sample is taken has a normal probability
distribution to begin with. Under such circumstances, if the population standard deviation σ is known,
then the test statistic (x̄ − μ0)/(σ/√n) still has the standard normal distribution, as in the previous two
sections. If σ is unknown and is approximated by the sample standard deviation s, then the resulting
test statistic (x̄ − μ0)/(s/√n) follows Student’s t-distribution with n − 1 degrees of freedom.
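The choice between the two test statistics can be expressed as a short Python sketch. It is offered only as an illustration: the function name small_sample_statistic, the convention that sigma=None means "σ unknown", and the numbers in the usage lines are all assumptions made here and do not come from the text. The population is assumed to be normal, as the paragraph above requires.

```python
from math import sqrt


def small_sample_statistic(x_bar, mu_0, n, s, sigma=None):
    """Return (value, distribution) for a test about the mean of a normal population.

    If sigma (the population standard deviation) is given, the statistic is
    standard normal; otherwise s is used and the statistic follows Student's
    t-distribution with n - 1 degrees of freedom.
    """
    if sigma is not None:
        value = (x_bar - mu_0) / (sigma / sqrt(n))
        return value, "standard normal"
    value = (x_bar - mu_0) / (s / sqrt(n))
    return value, f"Student's t with {n - 1} degrees of freedom"


# Hypothetical numbers, invented for illustration.
z_value, z_dist = small_sample_statistic(x_bar=9.7, mu_0=10.0, n=9, s=0.6, sigma=0.45)
t_value, t_dist = small_sample_statistic(x_bar=9.7, mu_0=10.0, n=9, s=0.6)
print(z_value, z_dist)
print(t_value, t_dist)
```

The same sample can therefore lead to different statistics and different reference distributions, depending only on whether σ is known.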
Figure 8.11 Distribution of the Standardized Test Statistic and the Rejection Region
The p-value of a test of hypotheses for which the test statistic has Student’s t-distribution can be
computed using statistical software, but it is impractical to do so using tables, since that would
require 30 tables analogous to Figure 12.2 "Cumulative Normal Probability", one for each degree of
freedom from 1 to 30. Figure 12.3 "Critical Values of " can be used to approximate the p-value of such
a test, and this is typically adequate for making a decision using the p-value approach to hypothesis
testing, although not always. For this reason the tests in the two examples in this section will be
made following the critical value approach to hypothesis testing summarized at the end of Section 8.1
"The Elements of Hypothesis Testing", but after each one we will show how the p-value approach
could have been used.
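As an indication of what statistical software does behind the scenes, the tail area of Student's t-distribution can be approximated by numerical integration of its density. The Python sketch below is an illustration under stated assumptions, not a replacement for a statistics package: the function names t_density and t_right_tail are chosen here, the integration uses Simpson's rule with a fixed number of steps, and only the standard library is used.

```python
from math import gamma, pi, sqrt


def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2000):
    """Approximate the area to the right of t >= 0 under the t density.

    The area from 0 to t is found by Simpson's rule and subtracted from 0.5,
    the total area to the right of 0.
    """
    if t < 0:
        raise ValueError("t must be non-negative; use symmetry for negative values")
    if t == 0:
        return 0.5
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # interior weights alternate 4, 2, 4, ...
        total += weight * t_density(i * h, df)
    return 0.5 - total * h / 3


# The critical value 2.132 with 4 degrees of freedom should cut off
# a right tail of area close to 0.050.
print(t_right_tail(2.132, 4))
```

For a left-tailed test with a negative statistic, symmetry gives the p-value as t_right_tail(abs(t), df); for a two-tailed test that area is doubled.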
EXAMPLE 10
The price of a popular tennis racket at a national chain store is $179. Portia bought five of the same racket
at an online auction site for the following prices:
155 179 175 175 161
Assuming that the auction prices of rackets are normally distributed, determine whether there is sufficient
evidence in the sample, at the 5% level of significance, to conclude that the average price of the racket is
less than $179 if purchased at an online auction.
Solution:
• Step 1. The assertion for which evidence must be provided is that the average online price μ is less
than the average price in retail stores, so the hypothesis test is
(−∞, −2.132].
• Step 5. As shown in Figure 8.12 "Rejection Region and Test Statistic for" the test statistic falls in
the rejection region. The decision is to reject H0. In the context of the problem our conclusion is:
The data provide sufficient evidence, at the 5% level of significance, to conclude that the average
price of such rackets purchased at online auctions is less than $179.
Figure 8.12 Rejection Region and Test Statistic for Note 8.42 "Example 10"
To perform the test in Note 8.42 "Example 10" using the p-value approach, look in the row in Figure 12.3
"Critical Values of " with the heading df = 4 and search for the two t-values that bracket the unsigned value
2.152 of the test statistic. They are 2.132 and 2.776, in the columns with headings t0.050 and t0.025. They cut
off right tails of area 0.050 and 0.025, so because 2.152 is between them it must cut off a tail of area
between 0.050 and 0.025. By symmetry −2.152 cuts off a left tail of area between 0.050 and 0.025, hence
the p-value corresponding to t = −2.152 is between 0.025 and 0.05. Although its precise value is unknown, it
must be less than α = 0.05, so the decision is to reject H0.
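The table lookup just described can be mimicked in Python. The sketch below is illustrative only. It computes the test statistic from the five auction prices of Example 10 and then brackets the tail area using one row of a table of critical values of t. The entries 2.132 and 2.776 are quoted in the text; the other entries of the df = 4 row are the usual published table values and are supplied here as an assumption. The function name bracket_tail_area is chosen for this sketch.

```python
from math import sqrt
from statistics import mean, stdev

prices = [155, 179, 175, 175, 161]   # auction prices from Example 10
mu_0 = 179                           # retail price, the mean under H0

n = len(prices)
# stdev() is the sample standard deviation (divisor n - 1).
t = (mean(prices) - mu_0) / (stdev(prices) / sqrt(n))

# Row df = 4 of a table of critical values of t, as (right-tail area, t value).
# Only 2.132 and 2.776 appear in the text; the rest are assumed table values.
row_df4 = [(0.200, 0.941), (0.100, 1.533), (0.050, 2.132),
           (0.025, 2.776), (0.010, 3.747), (0.005, 4.604)]


def bracket_tail_area(t_abs, row):
    """Return (lower, upper) bounds on the tail area cut off by t_abs >= 0."""
    lower, upper = 0.0, 0.5   # a non-negative value cuts off at most half the area
    for area, critical in row:   # areas shrink as critical values grow
        if t_abs >= critical:
            upper = area
        else:
            lower = area
            break
    return lower, upper


lower, upper = bracket_tail_area(abs(t), row_df4)
print(f"t = {t:.3f}; tail area is between {lower} and {upper}")
```

Because the test is left-tailed, the bracketed tail area is itself the range for the p-value, in agreement with the reasoning in the paragraph above.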
EXAMPLE 11
A small component in an electronic device has two small holes where another tiny part is fitted. In
the manufacturing process the average distance between the two holes must be tightly controlled at
0.02 mm, else many units would be defective and wasted. Many times throughout the day quality
control engineers take a small sample of the components from the production line, measure the
distance between the two holes, and make adjustments if needed. Suppose at one time four units are
taken and the distances are measured as
0.021 0.019 0.023 0.020
Determine, at the 1% level of significance, if there is sufficient evidence in the sample to conclude
that an adjustment is needed. Assume the distances of interest are normally distributed.
Solution:
• Step 1. The assumption is that the process is under control unless there is strong evidence to the
contrary. Since a deviation of the average distance to either side is undesirable, the relevant test is
conclusion is:
The data do not provide sufficient evidence, at the 1% level of significance, to conclude that the
mean distance between the holes in the component differs from 0.02 mm.
Figure 8.13 Rejection Region and Test Statistic for Note 8.43 "Example 11"
To perform the test in Note 8.43 "Example 11" using the p-value approach, look in the row
in Figure 12.3 "Critical Values of " with the heading df = 3 and search for the two t-values that
bracket the value 0.877 of the test statistic. Actually 0.877 is smaller than the smallest number in
the row, which is 0.978, in the column with heading t0.200. The value 0.978 cuts off a right tail of
area 0.200, so because 0.877 is to its left it must cut off a tail of area greater than 0.200. Thus
the p-value, which is the double of the area cut off (since the test is two-tailed), is greater than
0.400. Although its precise value is unknown, it must be greater than α = 0.01, so the decision is not
to reject H0.
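When a table gives only a range for the p-value, the decision rule can still be applied as long as the whole range lies on one side of α. The following Python sketch illustrates that logic; it is an addition made here, not part of the text. The function name decide_from_bounds and its return strings are chosen for the sketch, and the third usage line uses a hypothetical α to show a case the table cannot settle.

```python
def decide_from_bounds(p_lower, p_upper, alpha):
    """Decide a test when the p-value is only known to lie between two bounds.

    The rule is conservative: it returns a decision only when every value
    between p_lower and p_upper leads to the same decision.
    """
    if p_upper <= alpha:
        return "reject H0"            # every possible p-value is at most alpha
    if p_lower > alpha:
        return "do not reject H0"     # every possible p-value exceeds alpha
    return "inconclusive from the table"


# Example 10: p is between 0.025 and 0.05, alpha = 0.05.
print(decide_from_bounds(0.025, 0.05, 0.05))

# Example 11: p is greater than 0.400 (and cannot exceed 1), alpha = 0.01.
print(decide_from_bounds(0.400, 1.0, 0.01))

# Hypothetical case, not from the text: same bounds as Example 10, alpha = 0.04.
print(decide_from_bounds(0.025, 0.05, 0.04))
```

The last case shows why the table is described as adequate for a decision only most of the time: when α falls strictly inside the bracket, the precise p-value is needed.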
KEY TAKEAWAYS
• There are two formulas for the test statistic in testing hypotheses about a population mean with small
samples. One test statistic follows the standard normal distribution, the other Student’s t-distribution.
• The population standard deviation is used if it is known, otherwise the sample standard deviation is used.
• Either five-step procedure, critical value or p-value approach, is used with either test statistic.
a. Test H0: μ = 250 vs. Ha: μ > 250 @ α = 0.05.
b. Estimate the observed significance of the test in part (a) and state a decision based on the p-value approach to
hypothesis testing.
8. A random sample of size 12 drawn from a normal population yielded the following results: x̄ = 86.2, s = 0.63.
a. Test H0: μ = 85.5 vs. Ha: μ ≠ 85.5 @ α = 0.01.
b. Estimate the observed significance of the test in part (a) and state a decision based on the p-value
approach to hypothesis testing.
APPLICATIONS
9. Researchers wish to test the efficacy of a program intended to reduce the length of labor in childbirth. The
accepted mean labor time in the birth of a first child is 15.3 hours. The mean length of the labors of 13 first-time
mothers in a pilot program was 8.8 hours with standard deviation 3.1 hours. Assuming a normal
distribution of times of labor, test at the 10% level of significance whether the mean labor time for all
women following this program is less than 15.3 hours.
10. A dairy farm uses the somatic cell count (SCC) report on the milk it provides to a processor as one way to
monitor the health of its herd. The mean SCC from five samples of raw milk was 250,000 cells per milliliter
with standard deviation 37,500 cells/ml. Test whether these data provide sufficient evidence, at the 10% level
of significance, to conclude that the mean SCC of all milk produced at the dairy exceeds that in the previous
report, 210,250 cells/ml. Assume a normal distribution of SCC.
11. Six coins of the same type are discovered at an archaeological site. If their weights on average are
significantly different from 5.25 grams then it can be assumed that their provenance is not the site itself. The
coins are weighed and have mean 4.73 g with sample standard deviation 0.18 g. Perform the relevant test at
the 0.1% (1/10th of 1%) level of significance, assuming a normal distribution of weights of all such coins.
12. An economist wishes to determine whether people are driving less than in the past. In one region of the
country the number of miles driven per household per year in the past was 18.59 thousand miles. A sample
of 15 households produced a sample mean of 16.23 thousand miles for the last year, with sample standard
deviation 4.06 thousand miles. Assuming a normal distribution of household driving distances per year,
perform the relevant test at the 5% level of significance.
13. The recommended daily allowance of iron for females aged 19–50 is 18 mg/day. A careful measurement of
the daily iron intake of 15 women yielded a mean daily intake of 16.2 mg with sample standard deviation 4.7
mg.
a. Assuming that daily iron intake in women is normally distributed, perform the test that the actual
mean daily intake for all women is different from 18 mg/day, at the 10% level of significance.
b. The sample mean is less than 18, suggesting that the actual population mean is less than 18
mg/day. Perform this test, also at the 10% level of significance. (The computation of the test
statistic done in part (a) still applies here.)
14. The target temperature for a hot beverage the moment it is dispensed from a vending machine is 170°F. A
sample of ten randomly selected servings from a new machine undergoing a pre-shipment inspection gave
mean temperature 173°F with sample standard deviation 6.3°F.
a. Assuming that temperature is normally distributed, perform the test that the mean temperature
of dispensed beverages is different from 170°F, at the 10% level of significance.
b. The sample mean is greater than 170, suggesting that the actual population mean is greater than
170°F. Perform this test, also at the 10% level of significance. (The computation of the test
statistic done in part (a) still applies here.)
15. The average number of days to complete recovery from a particular type of knee operation is 123.7 days.
From his experience a physician suspects that use of a topical pain medication might be lengthening the
recovery time. He randomly selects the records of seven knee surgery patients who used the topical
medication. The times to total recovery were:
20,000, at the 10% level of significance. Assume that the SPC follows a normal distribution.
18. One water quality standard for water that is discharged into a particular type of stream or pond is that the
average daily water temperature be at most 18°C. Six samples taken throughout the day gave the data:
16.8 21.5 19.1 12.8 18.0 20.7
The sample mean x̄ = 18.15 exceeds 18, but perhaps this is only sampling error. Determine whether the data
provide sufficient evidence, at the 10% level of significance, to conclude that the mean temperature for the
entire day exceeds 18°C.
ADDITIONAL EXERCISES
19. A calculator has a built-in algorithm for generating a random number according to the standard normal
distribution. Twenty-five numbers thus generated have mean 0.15 and sample standard deviation 0.94. Test
the null hypothesis that the mean of all numbers so generated is 0 versus the alternative that it is different
from 0, at the 20% level of significance. Assume that the numbers do follow a normal distribution.
20. At every setting a high-speed packing machine delivers a product in amounts that vary from container to
container with a normal distribution of standard deviation 0.12 ounce. To compare the amount delivered at
the current setting to the desired amount 64.1 ounces, a quality inspector randomly selects five containers
and measures the contents of each, obtaining sample mean 63.9 ounces and sample standard deviation 0.10
ounce. Test whether the data provide sufficient evidence, at the 5% level of significance, to conclude that the
mean of all containers at the current setting is less than 64.1 ounces.
21. A manufacturing company receives a shipment of 1,000 bolts of nominal shear strength 4,350 lb. A quality
control inspector selects five bolts at random and measures the shear strength of each. The data are:
4,320 4,290 4,360 4,350 4,320
a. Assuming a normal distribution of shear strengths, test the null hypothesis that the mean shear
strength of all bolts in the shipment is 4,350 lb versus the alternative that it is less than 4,350 lb,
at the 10% level of significance.
b. Estimate the p-value (observed significance) of the test of part (a).
c. Compare the p-value found in part (b) to α = 0.10 and make a decision based on the p-value
approach. Explain fully.
22. A literary historian examines a newly discovered document possibly written by Oberon Theseus. The mean
average sentence length of the surviving undisputed works of Oberon Theseus is 48.72 words. The historian
counts words in sentences between five successive 101 periods in the document in question to obtain a
mean average sentence length of 39.46 words with standard deviation 7.45 words. (Thus the sample size is
five.)
a. Determine if these data provide sufficient evidence, at the 1% level of significance, to conclude
that the mean average sentence length in the document is less than 48.72.
b. Estimate the p-value of the test.
c. Based on the answers to parts (a) and (b), state whether or not it is likely that the document was
written by Oberon Theseus.
8.5 Large Sample Tests for a Population Proportion
LEARNING OBJECTIVES
1. To learn how to apply the five-step critical value test procedure for test of hypotheses concerning a
population proportion.
2. To learn how to apply the five-step p-value test procedure for test of hypotheses concerning a population
proportion.
Both the critical value approach and the p-value approach can be applied to test hypotheses about a
population proportion p. The null hypothesis will have the form H 0: p= p0 for some specificnumber p0 between 0 and 1. The alternative hypothesis will be one of the three inequalities p< p0, p> p0,
or p≠ p0 for the same number p0 that appears in the null hypothesis.
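A common form of the standardized test statistic for this large-sample test is $Z = (\hat{p} - p_0)/\sqrt{p_0 q_0 / n}$, where $\hat{p}$ is the sample proportion and $q_0 = 1 - p_0$. The following is a minimal sketch of that computation. The function name and the inputs (270 successes in a sample of 500, with $p_0 = 0.50$) are assumptions made for illustration; they were chosen because they give a value close to the Z = 1.789 quoted for Example 12 later in this section.

```python
from math import sqrt


def proportion_z_statistic(successes: int, n: int, p0: float) -> float:
    """Standardized test statistic for H0: p = p0 (large-sample test)."""
    p_hat = successes / n                      # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)   # computed from p0, not p_hat
    return (p_hat - p0) / standard_error


# Illustrative inputs only (assumed, not taken from a stated data set).
z = proportion_z_statistic(successes=270, n=500, p0=0.50)
print(round(z, 3))
```

Note that the standard error is computed from the hypothesized value $p_0$, because the test statistic is evaluated under the assumption that the null hypothesis is true.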
Figure 8.14 Distribution of the Standardized Test Statistic and the Rejection Region
• Step 5. As shown in Figure 8.15 "Rejection Region and Test Statistic for" the test statistic falls in
the rejection region. The decision is to reject $H_0$. In the context of the problem our conclusion is:
The data provide sufficient evidence, at the 5% level of significance, to conclude that a majority of
adults prefer the company’s beverage to that of their competitor’s.
Figure 8.15 Rejection Region and Test Statistic for Note 8.47 "Example 12"
• Step 5. As shown in Figure 8.16 "Rejection Region and Test Statistic for" the test statistic does not
fall in the rejection region. The decision is not to reject $H_0$. In the context of the problem our
conclusion is:
The data do not provide sufficient evidence, at the 10% level of significance, to conclude that the
proportion of newborns who are male differs from the historic proportion in times of economic
recession.
Figure 8.16 Rejection Region and Test Statistic for Note 8.48 "Example 13"
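The Step 5 decisions in the critical value approach can be checked numerically. The sketch below is one possible way to do so, assuming the standard normal critical values $-z_\alpha$, $z_\alpha$, and $\pm z_{\alpha/2}$ for left-, right-, and two-tailed tests. The function name and the tail labels are hypothetical; the test statistics 1.789 and 1.542 are the values quoted for Examples 12 and 13 in the p-value examples of this section.

```python
from statistics import NormalDist


def in_rejection_region(z: float, alpha: float, tail: str) -> bool:
    """Return True if z falls in the rejection region of a z-test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return z <= std_normal.inv_cdf(alpha)
    if tail == "right":
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Example 12: right-tailed test at the 5% level, Z = 1.789
print(in_rejection_region(1.789, 0.05, "right"))  # True: reject H0
# Example 13: two-tailed test at the 10% level, Z = 1.542
print(in_rejection_region(1.542, 0.10, "two"))    # False: do not reject H0
```

Both results agree with the decisions stated in Step 5 of the two examples.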
EXAMPLE 14
Perform the test of Note 8.47 "Example 12" using the p-value approach.
Solution:
We already know that the sample size is sufficiently large to validly perform the test.
• Steps 1–3 of the five-step procedure described in Section 8.3.2 "The" have already been done in Note 8.47
"Example 12" so we will not repeat them here, but only say that we know that the test is right-tailed and
that the value of the test statistic is Z = 1.789.
• Step 4. Since the test is right-tailed the p-value is the area under the standard normal curve cut off by the
observed test statistic, z = 1.789, as illustrated in Figure 8.17. By Figure 12.2 "Cumulative Normal
Probability" that area and therefore the p-value is 1 − 0.9633 = 0.0367.
• Step 5. Since the p-value is less than α = 0.05 the decision is to reject $H_0$.
Figure 8.17 P-Value for Note 8.49 "Example 14"
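The table lookup in Step 4 can be cross-checked in software. The following sketch computes the right-tail area under the standard normal curve with the complementary error function; the function name is an assumption for illustration. Because the code uses z = 1.789 exactly, while a table lookup rounds z to two decimal places, the result differs from 0.0367 in the fourth decimal place.

```python
from math import erfc, sqrt


def right_tail_p_value(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    return 0.5 * erfc(z / sqrt(2))


p_value = right_tail_p_value(1.789)
print(round(p_value, 4))   # close to the table-based value 0.0367
print(p_value < 0.05)      # True, so the decision is to reject H0
```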
EXAMPLE 15
Perform the test of Note 8.48 "Example 13" using the p-value approach.
Solution:
We already know that the sample size is sufficiently large to validly perform the test.
• Steps 1–3 of the five-step procedure described in Section 8.3.2 "The" have already been done in Note 8.48
"Example 13". They tell us that the test is two-tailed and that the value of the test statistic is Z = 1.542.
• Step 4. Since the test is two-tailed the p-value is double the area under the standard normal curve
cut off by the observed test statistic, z = 1.542. By Figure 12.2 "Cumulative Normal Probability" that area
is 1 − 0.9382 = 0.0618, as illustrated in Figure 8.18, hence the p-value is 2 × 0.0618 = 0.1236.
• Step 5. Since the p-value is greater than α = 0.10 the decision is not to reject $H_0$.
Figure 8.18 P-Value for Note 8.50 "Example 15"
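For a two-tailed test the tail area is doubled, as in Step 4 above. A minimal sketch of that calculation follows, with a hypothetical function name. It evaluates the normal distribution at z = 1.542 exactly rather than at the table value 1.54, so the result is slightly smaller than 0.1236.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Twice the standard normal tail area beyond |z|."""
    tail_area = 1 - NormalDist().cdf(abs(z))
    return 2 * tail_area


p_value = two_tailed_p_value(1.542)
print(round(p_value, 4))   # close to the table-based value 0.1236
print(p_value > 0.10)      # True, so the decision is not to reject H0
```

Taking the absolute value of z means the same function works when the observed test statistic is negative.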
KEY TAKEAWAYS
• There is one formula for the test statistic in testing hypotheses about a population proportion. The test
statistic follows the standard normal distribution.
• Either five-step procedure, critical value or p-value approach, can be used.
APPLICATIONS
11. Five years ago 3.9% of children in a certain region lived with someone other than a parent. A sociologist
wishes to test whether the current proportion is different. Perform the relevant test at the 5% level of
significance using the following data: in a random sample of 2,759 children, 119 lived with someone other
than a parent.
12. The government of a particular country reports its literacy rate as 52%. A nongovernmental organization
believes it to be less. The organization takes a random sample of 600 inhabitants and obtains a literacy rate
of 42%. Perform the relevant test at the 0.5% (one-half of 1%) level of significance.
13. Two years ago 72% of households in a certain county regularly participated in recycling household waste. The
county government wishes to investigate whether that proportion has increased after an intensive campaign
promoting recycling. In a survey of 900 households, 674 regularly participate in recycling. Perform the
relevant test at the 10% level of significance.
14. Prior to a special advertising campaign, 23% of all adults recognized a particular company’s logo. At the close
of the campaign the marketing department commissioned a survey in which 311 of 1,200 randomly selected
adults recognized the logo. Determine, at the 1% level of significance, whether the data provide sufficient
evidence to conclude that more than 23% of all adults now recognize the company’s logo.
15. A report five years ago stated that 35.5% of all state-owned bridges in a particular state were “deficient.” An
advocacy group took a random sample of 100 state-owned bridges in the state and found 33 to be currently
rated as being “deficient.” Test whether the current proportion of bridges in such condition is 35.5% versus
the alternative that it is different from 35.5%, at the 10% level of significance.
16. In the previous year the proportion of deposits in checking accounts at a certain bank that were made
electronically was 45%. The bank wishes to determine if the proportion is higher this year. It examined
20,000 deposit records and found that 9,217 were electronic. Determine, at the 1% level of significance,
whether the data provide sufficient evidence to conclude that more than 45% of all deposits to checking
accounts are now being made electronically.
17. According to the Federal Poverty Measure 12% of the U.S. population lives in poverty. The governor of a
certain state believes that the proportion there is lower. In a sample of size 1,550, 163 were impoverished
according to the federal measure.
a. Test whether the true proportion of the state’s population that is impoverished is less than 12%,
at the 5% level of significance.
b. Compute the observed significance of the test.
18. An insurance company states that it settles 85% of all life insurance claims within 30 days. A consumer group
asks the state insurance commission to investigate. In a sample of 250 life insurance claims, 203 were settled
within 30 days.
a. Test whether the true proportion of all life insurance claims made to this company that are
settled within 30 days is less than 85%, at the 5% level of significance.
b. Compute the observed significance of the test.
19. A special interest group asserts that 90% of all smokers began smoking before age 18. In a sample of 850
smokers, 687 began smoking before age 18.
a. Test whether the true proportion of all smokers who began smoking before age 18 is less than
90%, at the 1% level of significance.
b. Compute the observed significance of the test.
20. In the past, 68% of a garage’s business was with former patrons. The owner of the garage samples 200 repair
invoices and finds that for only 114 of them the patron was a repeat customer.
a. Test whether the true proportion of all current business that is with repeat customers is less than
68%, at the 1% level of significance.
b. Compute the observed significance of the test.
ADDITIONAL EXERCISES
21. A rule of thumb is that for working individuals one-quarter of household income should be spent on housing.
A financial advisor believes that the average proportion of income spent on housing is more than 0.25. In a
sample of 30 households, the mean proportion of household income spent on housing was 0.285 with a
standard deviation of 0.063. Perform the relevant test of hypotheses at the 1% level of significance. Hint: This
exercise could have been presented in an earlier section.
22. Ice cream is legally required to contain at least 10% milk fat by weight. The manufacturer of an economy ice
cream wishes to be close to the legal limit, hence produces its ice cream with a target proportion of 0.106
milk fat. A sample of five containers yielded a mean proportion of 0.094 milk fat with standard deviation
0.002. Test the null hypothesis that the mean proportion of milk fat in all containers is 0.106 against the
alternative that it is less than 0.106, at the 10% level of significance. Assume that the proportion of milk fat in
containers is normally distributed. Hint: This exercise could have been presented in an earlier section.
LARGE DATA SET EXERCISES
23. Large Data Sets 4 and 4A list the results of 500 tosses of a die. Let p denote the proportion of all tosses of this
die that would result in a five. Use the sample data to test the hypothesis that p is different from 1/6, at the
20% level of significance.
http://www.flatworldknowledge.com/sites/all/files/data4.xls
http://www.flatworldknowledge.com/sites/all/files/data4A.xls
24. Large Data Set 6 records results of a random survey of 200 voters in each of two regions, in which they were
asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other candidate. Use
the full data set (400 observations) to test the hypothesis that the proportion p of all voters who prefer
Candidate A exceeds 0.35. Test at the 10% level of significance.
http://www.flatworldknowledge.com/sites/all/files/data6.xls
25. Lines 2 through 536 in Large Data Set 11 are a sample of 535 real estate sales in a certain region in 2008. Those
that were foreclosure sales are identified with a 1 in the second column. Use these data to test, at the 10%
level of significance, the hypothesis that the proportion p of all real estate sales in this region in 2008 that
were foreclosure sales was less than 25%. (The null hypothesis is $H_0: p = 0.25$.)
http://www.flatworldknowledge.com/sites/all/files/data11.xls
26. Lines 537 through 1106 in Large Data Set 11 are a sample of 570 real estate sales in a certain region in 2010.
Those that were foreclosure sales are identified with a 1 in the second column. Use these data to test, at the
5% level of significance, the hypothesis that the proportion p of all real estate sales in this region in 2010 that
were foreclosure sales was greater than 23%. (The null hypothesis is $H_0: p = 0.23$.)
http://www.flatworldknowledge.com/sites/all/files/data11.xls
Chapter 9
Two-Sample Problems
The previous two chapters treated the questions of estimating and making inferences about a
parameter of a single population. In this chapter we consider a comparison of parameters that
belong to two different populations. For example, we might wish to compare the average income of
all adults in one region of the country with the average income of those in another region, or we
might wish to compare the proportion of all men who are vegetarians with the proportion of all
women who are vegetarians.
We will study construction of confidence intervals and tests of hypotheses in four situations,
depending on the parameter of interest, the sizes of the samples drawn from each of the populations,
and the method of sampling. We also examine sample size considerations.
9.1 Comparison of Two Population Means: Large, Independent Samples
LEARNING OBJECTIVES
1. To understand the logical framework for estimating the difference between the means of two distinct
populations and performing tests of hypotheses concerning those means.
2. To learn how to construct a confidence interval for the difference in the means of two distinct populations
using large, independent samples.
3. To learn how to perform a test of hypotheses concerning the difference between the means of two
distinct populations using large, independent samples.
Suppose we wish to compare the means of two distinct populations. Figure 9.1 "Independent
Sampling from Two Populations" illustrates the conceptual framework of our investigation in this
and the next section. Each population has a mean and a standard deviation. We arbitrarily label one
population as Population 1 and the other as Population 2, and subscript the parameters with the
numbers 1 and 2 to tell them apart. We draw a random sample from Population 1 and label the
sample statistics it yields with the subscript 1. Without reference to the first sample we draw a
sample from Population 2 and label its sample statistics with the subscript 2.
Figure 9.1 Independent Sampling from Two Populations
Definition
Samples from two distinct populations are independent if each one is drawn without reference to the
other, and has no connection with the other.
EXAMPLE 1
To compare customer satisfaction levels of two competing cable television companies, 174 customers
of Company 1 and 355 customers of Company 2 were randomly selected and were asked to rate their
cable companies on a five-point scale, with 1 being least satisfied and 5 most satisfied. The survey
results are summarized in the following table:
Company 1             Company 2
$n_1 = 174$           $n_2 = 355$
$\bar{x}_1 = 3.51$    $\bar{x}_2 = 3.24$
$s_1 = 0.51$          $s_2 = 0.52$
Construct a point estimate and a 99% confidence interval for $\mu_1 - \mu_2$, the difference in average
satisfaction levels of customers of the two companies as measured on this five-point scale.
Solution:
The point estimate of $\mu_1 - \mu_2$ is
$\bar{x}_1 - \bar{x}_2 = 3.51 - 3.24 = 0.27.$
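The confidence interval requested in this example can be sketched in code. The block below assumes the usual large-sample formula $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$; the function name and argument order are choices made for illustration, and the inputs are the summary statistics from the table above.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_confidence_interval(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample confidence interval for mu1 - mu2 (independent samples)."""
    point_estimate = x1 - x2
    # Critical value z_{alpha/2}, where alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(s1**2 / n1 + s2**2 / n2)
    return point_estimate - margin, point_estimate + margin


low, high = two_mean_confidence_interval(3.51, 0.51, 174, 3.24, 0.52, 355, 0.99)
print(round(low, 2), round(high, 2))
```

With these inputs the margin of error is about 0.12, so the interval is roughly 0.27 ± 0.12.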
Hypothesis Testing
Hypotheses concerning the relative sizes of the means of two populations are tested using the same
critical value and p-value procedures that were used in the case of a single population. All that is
needed is to know how to express the null and alternative hypotheses and to know the formula for
the standardized test statistic and the distribution that it follows.
The null and alternative hypotheses will always be expressed in terms of the difference of the two
population means. Thus the null hypothesis will always be written
$H_0: \mu_1 - \mu_2 = D_0$
where D0 is a number that is deduced from the statement of the situation. As was the case with a
single population the alternative hypothesis can take one of the three forms, with the same
terminology:
Form of $H_a$                     Terminology
$H_a: \mu_1 - \mu_2 < D_0$        Left-tailed
$H_a: \mu_1 - \mu_2 > D_0$        Right-tailed
$H_a: \mu_1 - \mu_2 \neq D_0$     Two-tailed
As long as the samples are independent and both are large the following formula for the standardized
test statistic is valid, and it has the standard normal distribution. (In the relatively rare case that both
population standard deviations σ 1 and σ 2 are known they would be used instead of the sample
standard deviations.)
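The standardized test statistic for this setting is commonly written $Z = \big((\bar{x}_1 - \bar{x}_2) - D_0\big)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$. The sketch below evaluates it with the cable company summary statistics from Example 1 and $D_0 = 0$. That choice of inputs is an assumption: it is used here because it gives approximately the value Z = 5.684 quoted in Example 3. The function name is hypothetical.

```python
from math import sqrt


def two_mean_z_statistic(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Large-sample standardized test statistic for H0: mu1 - mu2 = d0."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1 - x2) - d0) / standard_error


# Summary statistics from Example 1, with D0 = 0 (assumed)
z = two_mean_z_statistic(3.51, 0.51, 174, 3.24, 0.52, 355)
print(round(z, 3))
```

If the population standard deviations $\sigma_1$ and $\sigma_2$ were known, they would be passed in place of `s1` and `s2`.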
reject $H_0$. In the context of the problem our conclusion is:
The data provide sufficient evidence, at the 1% level of significance, to conclude that the mean
customer satisfaction for Company 1 is higher than that for Company 2.
EXAMPLE 3
Perform the test of Note 9.6 "Example 2" using the p-value approach.
Solution:
The first three steps are identical to those in Note 9.6 "Example 2".
• Step 4. The observed significance or p-value of the test is the area of the right tail of the standard normal
distribution that is cut off by the test statistic Z = 5.684. The number 5.684 is too large to appear in Figure
12.2 "Cumulative Normal Probability", which means that the area of the left tail that it cuts off is 1.0000 to
four decimal places. The area that we seek, the area of the right tail, is therefore 1 − 1.0000 = 0.0000 to four
decimal places. See Figure 9.3. That is, p-value = 0.0000 to four decimal places. (The actual value is
approximately 0.000 000 007.)
Figure 9.3 P-Value for Note 9.7 "Example 3"
• Step 5. Since 0.0000 < 0.01, p-value < α so the decision is to reject the null hypothesis:
The data provide sufficient evidence, at the 1% level of significance, to conclude that the mean
customer satisfaction for Company 1 is higher than that for Company 2.
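Because Z = 5.684 lies beyond the range of the printed table, software is a convenient way to see the size of the p-value. The short sketch below uses the complementary error function, which evaluates small tail areas directly; the variable names are illustrative.

```python
from math import erfc, sqrt

z = 5.684
# Right-tail area of the standard normal distribution beyond z
p_value = 0.5 * erfc(z / sqrt(2))

print(f"{p_value:.1e}")   # a few billionths
print(p_value < 0.01)     # True, so the decision is to reject H0
```

The result is a few billionths, consistent with the approximate value 0.000 000 007 given in Step 4.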
KEY TAKEAWAYS
• A point estimate for the difference in two population means is simply the difference in the corresponding
sample means.
• In the context of estimating or testing hypotheses concerning two population means, “large” samples
means that both samples are large.
• A confidence interval for the difference in two population means is computed using a formula in the same
fashion as was done for a single population mean.
• The same five-step procedure used to test hypotheses concerning a single population mean is used to test
hypotheses concerning the difference between two population means. The only difference is in the
formula for the standardized test statistic.
APPLICATIONS
13. In order to investigate the relationship between mean job tenure in years among workers who have a
bachelor’s degree or higher and those who do not, random samples of each type of worker were taken, with
the following results.
                               n      $\bar{x}$    s
Bachelor’s degree or higher    155    5.2          1.3
No degree                      210    5.0          1.5
a. Construct the 99% confidence interval for the difference in the population means based on these
data.
b. Test, at the 1% level of significance, the claim that mean job tenure among those with higher
education is greater than among those without, against the default that there is no difference in
the means.
c. Compute the observed significance of the test.
14. Records of 40 used passenger cars and 40 used pickup trucks (none used commercially) were randomly
selected to investigate whether there was any difference in the mean time in years that they were kept by
the original owner before being sold. For cars the mean was 5.3 years with standard deviation 2.2 years. For
pickup trucks the mean was 7.1 years with standard deviation 3.0 years.
a. Construct the 95% confidence interval for the difference in the means based on these data.
b. Test the hypothesis that there is a difference in the means against the null hypothesis that there
is no difference. Use the 1% level of significance.
c. Compute the observed significance of the test in part (b).
15. In previous years the average number of patients per hour at a hospital emergency room on weekends
exceeded the average on weekdays by 6.3 visits per hour. A hospital administrator believes that the current
weekend mean exceeds the weekday mean by fewer than 6.3 visits per hour.
a. Construct the 99% confidence interval for the difference in the population means based on the
following data, derived from a study in which 30 weekend and 30 weekday one-hour periods
were randomly selected and the number of new patients in each recorded.
             n     $\bar{x}$    s
Weekends     30    13.8         3.1
Weekdays     30    8.6          2.7
b. Test at the 5% level of significance whether the current weekend mean exceeds the weekday
mean by fewer than 6.3 patients per hour.
c. Compute the observed significance of the test.
16. A sociologist surveys 50 randomly selected citizens in each of two countries to compare the mean number
of hours of volunteer work done by adults in each. Among the 50 inhabitants of Lilliput, the mean hours
of volunteer work per year was 52, with standard deviation 11.8. Among the 50 inhabitants of Blefuscu,
the mean number of hours of volunteer work per year was 37, with standard deviation 7.2.
a. Construct the 99% confidence interval for the difference in mean number of hours volunteered
by all residents of Lilliput and the mean number of hours volunteered by all residents of Blefuscu.
b. Test, at the 1% level of significance, the claim that the mean number of hours volunteered by all
residents of Lilliput is more than ten hours greater than the mean number of hours volunteered
by all residents of Blefuscu.
c. Compute the observed significance of the test in part (b).
17. A university administrator asserted that upperclassmen spend more time studying than underclassmen.
a. Test this claim against the default that the average number of hours of study per week by the
two groups is the same, using the following information based on random samples from each
group of students. Test at the 1% level of significance.
                  n     $\bar{x}$    s
Upperclassmen     35    15.6         2.9
Underclassmen     35    12.3         4.1
b. Compute the observed significance of the test.
18. A kinesiologist claims that the resting heart rate of men aged 18 to 25 who exercise regularly is more
than five beats per minute less than that of men who do not exercise regularly. Men in each category
were selected at random and their resting heart rates were measured, with the results shown.
                       n     $\bar{x}$    s
Regular exercise       40    63           1.0
No regular exercise    30    71           1.2
a. Perform the relevant test of hypotheses at the 1% level of significance.
b. Compute the observed significance of the test.
19. Children in two elementary school classrooms were given two versions of the same test, but with the
order of questions arranged from easier to more difficult in Version A and in reverse order in Version B.
Randomly selected students from each class were given Version A and the rest Version B. The results are
shown in the table.
              n     $\bar{x}$    s
Version A     31    83           4.6
Version B     32    78           4.3
a. Construct the 90% confidence interval for the difference in the means of the populations of all
children taking Version A of such a test and of all children taking Version B of such a test.
b. Test at the 1% level of significance the hypothesis that the A version of the test is easier than
the B version (even though the questions are the same).
c. Compute the observed significance of the test.
20. The Municipal Transit Authority wants to know if, on weekdays, more passengers ride the northbound
blue line train towards the city center that departs at 8:15 a.m. or the one that departs at 8:30 a.m. The
following sample statistics are assembled by the Transit Authority.
                   n     $\bar{x}$    s
8:15 a.m. train    30    323          41
8:30 a.m. train    45    356          45
a. Construct the 90% confidence interval for the difference in the mean number of daily travellers
on the 8:15 train and the mean number of daily travellers on the 8:30 train.
b. Test at the 5% level of significance whether the data provide sufficient evidence to conclude that
more passengers ride the 8:30 train.
c. Compute the observed significance of the test.
21. In comparing the academic performance of college students who are affiliated with fraternities and those
male students who are unaffiliated, a random sample of students was drawn from each of the two
populations on a university campus. Summary statistics on the student GPAs are given below.
                n      $\bar{x}$    s
Fraternity      645    2.90         0.47
Unaffiliated    450    2.88         0.42
Test, at the 5% level of significance, whether the data provide sufficient evidence to conclude that there is
a difference in average GPA between the population of fraternity students and the population of
unaffiliated male students on this university campus.
22. In comparing the academic performance of college students who are affiliated with sororities and those
female students who are unaffiliated, a random sample of students was drawn from each of the two
populations on a university campus. Summary statistics on the student GPAs are given below.
                n      $\bar{x}$    s
Sorority        330    3.18         0.37
Unaffiliated    550    3.12         0.41
Test, at the 5% level of significance, whether the data provide sufficient evidence to conclude that there is
a difference in average GPA between the population of sorority students and the population of
unaffiliated female students on this university campus.
23. The owner of a professional football team believes that the league has become more offense oriented
since five years ago. To check his belief, 32 randomly selected games from one year’s schedule were
compared to 32 randomly selected games from the schedule five years later. Since more offense produces
more points per game, the owner analyzed the following information on points per game (ppg).
                  n     $\bar{x}$    s
ppg previously    32    20.62        4.17
ppg recently      32    22.05        4.01
Test, at the 10% level of significance, whether the data on points per game provide sufficient evidence to
conclude that the game has become more offense oriented.
24. The owner of a professional football team believes that the league has become more offense oriented
since five years ago. To check his belief, 32 randomly selected games from one year’s schedule were
compared to 32 randomly selected games from the schedule five years later. Since more offense produces
more offensive yards per game, the owner analyzed the following information on offensive yards per
game (oypg).
                   n     $\bar{x}$    s
oypg previously    32    316          40
oypg recently      32    336          35
Test, at the 10% level of significance, whether the data on offensive yards per game provide sufficient
evidence to conclude that the game has become more offense oriented.
LARGE DATA SET EXERCISES
25. Large Data Sets 1A and 1B list the SAT scores for 1,000 randomly selected students. Denote the population of
all male students as Population 1 and the population of all female students as Population 2.
http://www.flatworldknowledge.com/sites/all/files/data1A.xls
http://www.flatworldknowledge.com/sites/all/files/data1B.xls
a. Restricting attention to just the males, find $n_1$, $\bar{x}_1$, and $s_1$. Restricting attention to just the
females, find $n_2$, $\bar{x}_2$, and $s_2$.
b. Let $\mu_1$ denote the mean SAT score for all males and $\mu_2$ the mean SAT score for all females. Use the
results of part (a) to construct a 90% confidence interval for the difference $\mu_1 - \mu_2$.
c. Test, at the 5% level of significance, the hypothesis that the mean SAT score among males
exceeds that among females.
26. Large Data Sets 1A and 1B list the GPAs for 1,000 randomly selected students. Denote the population of all
male students as Population 1 and the population of all female students as Population 2.
http://www.flatworldknowledge.com/sites/all/files/data1A.xls
http://www.flatworldknowledge.com/sites/all/files/data1B.xls
a. Restricting attention to just the males, find $n_1$, $\bar{x}_1$, and $s_1$. Restricting attention to just the
females, find $n_2$, $\bar{x}_2$, and $s_2$.
b. Let $\mu_1$ denote the mean GPA for all males and $\mu_2$ the mean GPA for all females. Use the results of
part (a) to construct a 95% confidence interval for the difference $\mu_1 - \mu_2$.
c. Test, at the 10% level of significance, the hypothesis that the mean GPAs among males and
females differ.
27. Large Data Sets 7A and 7B list the survival times for 65 male and 75 female laboratory mice with thymic
leukemia. Denote the population of all such male mice as Population 1 and the population of all such female
mice as Population 2.
http://www.flatworldknowledge.com/sites/all/files/data7A.xls
http://www.flatworldknowledge.com/sites/all/files/data7B.xls
a. Restricting attention to just the males, find $n_1$, $\bar{x}_1$, and $s_1$. Restricting attention to just the
females, find $n_2$, $\bar{x}_2$, and $s_2$.
b. Let $\mu_1$ denote the mean survival time for all males and $\mu_2$ the mean survival time for all females. Use
the results of part (a) to construct a 99% confidence interval for the difference $\mu_1 - \mu_2$.
c. Test, at the 1% level of significance, the hypothesis that the mean survival time for males exceeds
that for females by more than 182 days (half a year).
d. Compute the observed significance of the test in part (c).
9.2 Comparison of Two Population Means: Small, Independent Samples
LEARNING OBJECTIVES
1. To learn how to construct a confidence interval for the difference in the means of two distinct populations
using small, independent samples.
2. To learn how to perform a test of hypotheses concerning the difference between the means of two
distinct populations using small, independent samples.
When one or the other of the sample sizes is small, as is often the case in practice, the Central Limit
Theorem does not apply. We must then impose conditions on the population to give statistical
validity to the test procedure. We will assume that both populations from which the samples are
taken have a normal probability distribution and that their standard deviations are equal.
Confidence Intervals
When the two populations are normally distributed and have equal standard deviations, the following
formula for a confidence interval for µ1− µ2 is valid.
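For reference, the standard pooled-variance interval that applies under these two assumptions can be written as follows. This is a sketch in conventional notation; the symbol $s_p^2$ (the pooled sample variance) is introduced here for convenience. The degrees of freedom agree with the 15 used in Example 6 for samples of sizes 11 and 6.

```latex
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2}\,
\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
df = n_1 + n_2 - 2 .
\]
```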
EXAMPLE 4
A software company markets a new computer game with two experimental packaging designs.
Design 1 is sent to 11 stores; their average sales the first month is 52 units with sample standard
deviation 12 units. Design 2 is sent to 6 stores; their average sales the first month is 46 units with
sample standard deviation 10 units. Construct a point estimate and a 95% confidence interval for the
difference in average monthly sales between the two package designs.
Solution:
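The computation can be sketched in a few lines of Python. This is a minimal illustration, assuming the pooled-variance interval $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2(1/n_1 + 1/n_2)}$ with $n_1 + n_2 - 2$ degrees of freedom; the variable names are hypothetical, and the critical value 2.131 is copied from a standard t-table (row df = 15, column $t_{0.025}$) rather than computed.

```python
import math

# Summary statistics from Example 4 (units sold in the first month).
n1, xbar1, s1 = 11, 52.0, 12.0   # Design 1
n2, xbar2, s2 = 6, 46.0, 10.0    # Design 2

# Pooled sample variance; meaningful only under the equal-standard-deviation assumption.
df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df

# Critical value t_{0.025} for df = 15, taken from a standard t-table (assumed lookup).
t_crit = 2.131

point_estimate = xbar1 - xbar2
margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"point estimate: {point_estimate:.1f}")
print(f"95% confidence interval: {point_estimate:.1f} ± {margin:.1f}")
```

The interval contains zero, which is consistent with the later test failing to find a difference between the two designs.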
The data do not provide sufficient evidence, at the 1% level of significance, to conclude that the mean sales
per month of the two designs are different.
EXAMPLE 6
Perform the test of Note 9.13 "Example 5" using the p-value approach.
Solution:
The first three steps are identical to those in Note 9.13 "Example 5".
• Step 4. Because the test is two-tailed the observed significance or p-value of the test is the double
of the area of the right tail of Student’s t-distribution, with 15 degrees of freedom, that is cut off
by the test statistic T = 1.040. We can only approximate this number. Looking in the row of Figure
12.3 "Critical Values of " headed df = 15, the number 1.040 is between the numbers 0.866 and 1.341,
corresponding to $t_{0.200}$ and $t_{0.100}$.
The area cut off by t = 0.866 is 0.200 and the area cut off by t = 1.341 is 0.100. Since 1.040 is
between 0.866 and 1.341 the area it cuts off is between 0.200 and 0.100. Thus the p-value (since
the area must be doubled) is between 0.400 and 0.200.
• Step 5. Since p > 0.200 > 0.01, p > α, so the decision is not to reject the null hypothesis:
The data do not provide sufficient evidence, at the 1% level of significance, to conclude that the
mean sales per month of the two designs are different.
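The table lookup in Step 4 can be mimicked in code. The sketch below is illustrative only: the dictionary holds the df = 15 row of a standard t-table (right-tail area mapped to critical value), and the helper name bracket_right_tail is hypothetical. It brackets the tail area between adjacent table entries and then doubles the bounds for the two-tailed test.

```python
# Row df = 15 of a standard t-table: right-tail area -> critical value.
T_ROW_DF15 = {0.200: 0.866, 0.100: 1.341, 0.050: 1.753,
              0.025: 2.131, 0.010: 2.602, 0.005: 2.947}


def bracket_right_tail(t_stat, row):
    """Bound the right-tail area cut off by a nonnegative t_stat using one table row."""
    lower, upper = 0.0, 0.5                  # the tail beyond any t >= 0 is at most 0.5
    for area in sorted(row, reverse=True):   # largest area (smallest critical value) first
        if t_stat >= row[area]:
            upper = area                     # t_stat is at or beyond this critical value
        else:
            lower = area                     # first critical value t_stat falls short of
            break
    return lower, upper


lo, hi = bracket_right_tail(1.040, T_ROW_DF15)
p_low, p_high = 2 * lo, 2 * hi               # two-tailed test: double the tail area
print(f"{p_low:.3f} < p-value < {p_high:.3f}")
```

Because both bounds exceed α = 0.01, the decision not to reject the null hypothesis does not depend on knowing the p-value exactly.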
KEY TAKEAWAYS
• In the context of estimating or testing hypotheses concerning two population means, “small” samples
means that at least one sample is small. In particular, even if one sample is of size 30 or more, if the other
is of size less than 30 the formulas of this section must be used.
• A confidence interval for the difference in two population means is computed using a formula in the same
fashion as was done for a single population mean.
9.3 Comparison of Two Population Means: Paired Samples
LEARNING OBJECTIVES
1. To learn the distinction between independent samples and paired samples.
2. To learn how to construct a confidence interval for the difference in the means of two distinct populations
using paired samples.
3. To learn how to perform a test of hypotheses concerning the difference in the means of two distinct
populations using paired samples.
Suppose chemical engineers wish to compare the fuel economy obtained by two different
formulations of gasoline. Since fuel economy varies widely from car to car, if the mean fuel economy
of two independent samples of vehicles run on the two types of fuel were compared, even if one
formulation were better than the other the large variability from vehicle to vehicle might make any
difference arising from difference in fuel difficult to detect. Just imagine one random sample having
many more large vehicles than the other. Instead of independent random samples, it would make
more sense to select pairs of cars of the same make and model and driven under similar
circumstances, and compare the fuel economy of the two cars in each pair. Thus the data would look
something like Table 9.1 "Fuel Economy of Pairs of Vehicles", where the first car in each pair is
operated on one formulation of the fuel (call it Type 1 gasoline) and the second car is operated on the
second (call it Type 2 gasoline).
Table 9.1 Fuel Economy of Pairs of Vehicles
Make and Model    Car 1   Car 2
Buick LaCrosse    17.0    17.0
Dodge Viper       13.2    12.9
Honda CR-Z        35.3    35.4
Hummer H3         13.6    13.2
Lexus RX          32.7    32.5
Mazda CX-9        18.4    18.1
Saab 9-3          22.5    22.5
Toyota Corolla    26.8    26.7
Volvo XC90        15.1    15.0
The first column of numbers forms a sample from Population 1, the population of all cars operated on
Type 1 gasoline; the second column of numbers forms a sample from Population 2, the population of
all cars operated on Type 2 gasoline. It would be incorrect to analyze the data using the formulas
from the previous section, however, since the samples were not drawn independently.
What is correct is to compute the difference in the numbers in each pair (subtracting in the same
order each time) to obtain the third column of numbers as shown in Table 9.2 "Fuel Economy of
Pairs of Vehicles" and treat the differences as the data. At this point, the new sample of
differences $d_1 = 0.0, \ldots, d_9 = 0.1$ in the third column of Table 9.2 "Fuel Economy of Pairs of Vehicles" may
be considered as a random sample of size n = 9 selected from a population with mean $\mu_d = \mu_1 - \mu_2$. This
approach essentially transforms the paired two-sample problem into a one-sample problem as
discussed in the previous two chapters.
Table 9.2 Fuel Economy of Pairs of Vehicles
Make and Model    Car 1   Car 2   Difference
Buick LaCrosse    17.0    17.0     0.0
Dodge Viper       13.2    12.9     0.3
Honda CR-Z        35.3    35.4    −0.1
Hummer H3         13.6    13.2     0.4
Lexus RX          32.7    32.5     0.2
Mazda CX-9        18.4    18.1     0.3
Saab 9-3          22.5    22.5     0.0
Toyota Corolla    26.8    26.7     0.1
Volvo XC90        15.1    15.0     0.1
Note carefully that although it does not matter in which order the subtraction is done, it must be done in
the same order for all pairs. This is why there are both positive and negative quantities in the third
column of numbers in Table 9.2 "Fuel Economy of Pairs of Vehicles".
Confidence Intervals
When the population of differences is normally distributed the following formula for a confidence interval
for µd = µ1− µ2 is valid.
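In conventional notation, with $\bar{d}$ the mean and $s_d$ the sample standard deviation of the n differences, the usual one-sample t interval applied to the differences reads as follows. This is a sketch of the standard formula, stated here for reference.

```latex
\[
\bar{d} \;\pm\; t_{\alpha/2}\,\frac{s_d}{\sqrt{n}},
\qquad df = n - 1 .
\]
```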
EXAMPLE 7
Using the data in Table 9.1 "Fuel Economy of Pairs of Vehicles" construct a point estimate and a 95%
confidence interval for the difference in average fuel economy between cars operated on Type 1
gasoline and cars operated on Type 2 gasoline.
Solution:
We have referred to the data in Table 9.1 "Fuel Economy of Pairs of Vehicles" because that is the way
that the data are typically presented, but we emphasize that with paired sampling one immediately
computes the differences, as given in Table 9.2 "Fuel Economy of Pairs of Vehicles", and uses the
differences as the data.
The mean and standard deviation of the differences are $\bar{d} = 1.3/9 \approx 0.144$ and $s_d \approx 0.167$.
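As a check on the arithmetic, the summary statistics and the resulting interval can be computed directly. This is a minimal sketch that assumes the paired-sample interval $\bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n}$ with n − 1 = 8 degrees of freedom; the critical value 2.306 is copied from a standard t-table (row df = 8, column $t_{0.025}$), and the variable names are hypothetical.

```python
import math
import statistics

# Fuel economy from Table 9.1, one entry per make and model.
car1 = [17.0, 13.2, 35.3, 13.6, 32.7, 18.4, 22.5, 26.8, 15.1]
car2 = [17.0, 12.9, 35.4, 13.2, 32.5, 18.1, 22.5, 26.7, 15.0]

# Differences, always Car 1 minus Car 2; rounding strips floating-point noise.
d = [round(a - b, 1) for a, b in zip(car1, car2)]

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)          # sample standard deviation (divisor n - 1)

t_crit = 2.306                     # t_{0.025}, df = 8, from a standard t-table
margin = t_crit * s_d / math.sqrt(n)

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}")
print(f"95% confidence interval: {d_bar:.2f} ± {margin:.2f}")
```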
Hypothesis Testing
Testing hypotheses concerning the difference of two population means using paired difference
samples is done precisely as it is done for independent samples, although now the null and
alternative hypotheses are expressed in terms of $\mu_d$ instead of $\mu_1 - \mu_2$. Thus the null hypothesis will
always be written
$H_0: \mu_d = D_0$
The three forms of the alternative hypothesis, with the terminology for each case, are:
Form of $H_a$            Terminology
$H_a: \mu_d < D_0$       Left-tailed
$H_a: \mu_d > D_0$       Right-tailed
$H_a: \mu_d \neq D_0$    Two-tailed
The same conditions on the population of differences that were required for constructing a confidence
interval for the difference of the means must also be met when hypotheses are tested. Here is the
standardized test statistic that is used in the test.
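In conventional notation the statistic for paired differences can be written as follows. This is a sketch of the standard one-sample t statistic applied to the differences; it reproduces the value T = 2.600 quoted in Example 9 for the data of Table 9.2.

```latex
\[
T = \frac{\bar{d} - D_0}{s_d / \sqrt{n}},
\qquad df = n - 1 .
\]
```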
EXAMPLE 8
Using the data of Table 9.2 "Fuel Economy of Pairs of Vehicles" test the hypothesis that mean fuel
economy for Type 1 gasoline is greater than that for Type 2 gasoline against the null hypothesis that
the two formulations of gasoline yield the same mean fuel economy. Test at the 5% level of
significance using the critical value approach.
Solution:
The only part of the table that we use is the third column, the differences.
• Step 1. Since the differences were computed in the order Type 1 mpg − Type 2 mpg, better fuel
economy with Type 1 fuel corresponds to $\mu_d = \mu_1 - \mu_2 > 0$. Thus the test is
$H_0: \mu_d = 0$
vs. $H_a: \mu_d > 0$ @ $\alpha = 0.05$
(If the differences had been computed in the opposite order then the alternative hypothesis
would have been $H_a: \mu_d < 0$.)
Figure 9.5 Rejection Region and Test Statistic for Note 9.20 "Example 8"
The data provide sufficient evidence, at the 5% level of significance, to conclude that the mean fuel
economy provided by Type 1 gasoline is greater than that for Type 2 gasoline.
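The critical value approach for this example can be sketched as follows. The code assumes the paired-sample statistic $T = (\bar{d} - D_0)/(s_d/\sqrt{n})$ with 8 degrees of freedom; the critical value 1.860 is copied from a standard t-table (row df = 8, column $t_{0.05}$), and the variable names are hypothetical.

```python
import math
import statistics

# Differences (Type 1 mpg minus Type 2 mpg) from the third column of Table 9.2.
d = [0.0, 0.3, -0.1, 0.4, 0.2, 0.3, 0.0, 0.1, 0.1]

D0 = 0.0                           # hypothesized mean difference under H0
n = len(d)
T = (statistics.mean(d) - D0) / (statistics.stdev(d) / math.sqrt(n))

t_crit = 1.860                     # t_{0.05}, df = 8, from a standard t-table
reject = T >= t_crit               # right-tailed test: rejection region is [t_crit, infinity)

print(f"T = {T:.3f}, critical value = {t_crit}, reject H0: {reject}")
```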
EXAMPLE 9
Perform the test of Note 9.20 "Example 8" using the p-value approach.
Solution:
The first three steps are identical to those in Note 9.20 "Example 8".
• Step 4. Because the test is one-tailed the observed significance or p-value of the test is just the
area of the right tail of Student’s t-distribution, with 8 degrees of freedom, that is cut off by the
test statistic T = 2.600. We can only approximate this number. Looking in the row of Figure 12.3
"Critical Values of " headed df = 8, the number 2.600 is between the numbers 2.306 and 2.896,
corresponding to $t_{0.025}$ and $t_{0.010}$.
The area cut off by t = 2.306 is 0.025 and the area cut off by t = 2.896 is 0.010. Since 2.600 is
between 2.306 and 2.896 the area it cuts off is between 0.025 and 0.010. Thus the p-value is
between 0.025 and 0.010. In particular it is less than 0.025. See Figure 9.6.
Figure 9.6 P-Value for Note 9.21 "Example 9"
• Step 5. Since 0.025 < 0.05, p < α so the decision is to reject the null hypothesis:
The data provide sufficient evidence, at the 5% level of significance, to conclude that the mean fuel
economy provided by Type 1 gasoline is greater than that for Type 2 gasoline.
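With software the tail area need not be bracketed from a table. The sketch below is one possible approach using only the Python standard library, not the method of the text: it integrates the density of Student’s t-distribution numerically with Simpson’s rule. The function names t_pdf and t_right_tail are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t_stat, df, steps=2000):
    """P(T > t_stat) for t_stat >= 0, via Simpson's rule on [0, t_stat] (steps must be even)."""
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry the area to the right of 0 is 0.5; subtract the area on [0, t_stat].
    return 0.5 - total * h / 3


p_value = t_right_tail(2.600, 8)
print(f"p-value = {p_value:.4f}")
```

The computed value falls between 0.010 and 0.025, in agreement with the table-based bounds in Step 4.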
The paired two-sample experiment is a very powerful study design. It bypasses many unwanted
sources of “statistical noise” that might otherwise influence the outcome of the experiment, and
focuses on the possible difference that might arise from the one factor of interest.
If the sample is large (meaning that n ≥ 30) then in the formula for the confidence interval we may
replace $t_{\alpha/2}$ by $z_{\alpha/2}$. For hypothesis testing when the number of pairs is at least 30, we may use the same
test statistic as for small samples, except that now it approximately follows a standard normal
distribution, so we use the last line of Figure 12.3 "Critical Values of " to compute critical values,
and p-values can be computed exactly with Figure 12.2 "Cumulative Normal Probability", not merely
estimated using Figure 12.3 "Critical Values of ".
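For the large-sample case, the standard normal critical values and exact p-values can also be obtained in software. The following sketch uses NormalDist from Python's standard statistics module (available in Python 3.8 and later); the test statistic 2.10 is a hypothetical value chosen purely for illustration.

```python
from statistics import NormalDist

z = NormalDist()                      # standard normal: mean 0, standard deviation 1

alpha = 0.05
z_half = z.inv_cdf(1 - alpha / 2)     # z_{alpha/2}, used in a 95% confidence interval

# Exact right-tailed p-value for an illustrative (hypothetical) test statistic.
z_stat = 2.10
p_value = 1 - z.cdf(z_stat)

print(f"z_(alpha/2) = {z_half:.3f}")
print(f"right-tailed p-value for Z = {z_stat}: {p_value:.4f}")
```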
KEY TAKEAWAYS
• When the data are collected in pairs, the differences computed for each pair are the data that are used in the formulas.
• A confidence interval for the difference in two population means using paired sampling is computed using
a formula in the same fashion as was done for a single population mean.
• The same five-step procedure used to test hypotheses concerning a single population mean is used to test
hypotheses concerning the difference between two population means using paired sampling. The only
difference is in the formula for the standardized test statistic.
House   County Government   Private Company
1       217                 219
2       350                 338
3       296                 291
4       237                 237
5       237                 235
6       272                 269
7       257                 239
8       277                 275
9       312                 320
10      335                 335
a. Give a point estimate for the difference between the mean private appraisal of all such homes
and the mean government appraisal of all such homes.
b. Construct the 99% confidence interval based on these data for the difference.
c. Test, at the 1% level of significance, the hypothesis that appraised values by the county
government of all such houses are greater than the appraised values by the private appraisal
company.
8. In order to cut costs a wine producer is considering using duo or 1 + 1 corks in place of full natural wood
corks, but is concerned that it could affect buyers' perception of the quality of the wine. The wine
producer shipped eight pairs of bottles of its best young wines to eight wine experts. Each pair includes
one bottle with a natural wood cork and one with a duo cork. The experts are asked to rate the wines on a
one to ten scale, higher numbers corresponding to higher quality. The results are:
Wine Expert   Duo Cork   Wood Cork
1             8.5        8.5
2             8.0        8.5
3             6.5        8.0
4             7.5        8.5
5             8.0        7.5
6             8.0        8.0
7             9.0        9.0
8             7.0        7.5
a. Give a point estimate for the difference between the mean ratings of the wine when bottles are
sealed with different kinds of corks.
b. Construct the 90% confidence interval based on these data for the difference.
c. Test, at the 10% level of significance, the hypothesis that on the average duo corks decrease the
rating of the wine.
9. Engineers at a tire manufacturing corporation wish to test a new tire material for increased durability. To test
the tires under realistic road conditions, new front tires are mounted on each of 11 company cars, one tire
made with a production material and the other with the experimental material. After a fixed period the 11
pairs were measured for wear. The amount of wear for each tire (in mm) is shown in the table:
Car   Production   Experimental
1     5.1          5.0
2     6.5          6.5
3     3.6          3.1
4     3.5          3.7
5     5.7          4.5
6     5.0          4.1
7     6.4          5.3
8     4.7          2.6
9     3.2          3.0
10    3.5          3.5
11    6.4          5.1
a. Give a point estimate for the difference in mean wear.
b. Construct the 99% confidence interval for the difference based on these data.
c. Test, at the 1% level of significance, the hypothesis that the mean wear with the experimental
material is less than that for the production material.
10. A marriage counselor administered a test designed to measure overall contentment to 30 randomly selected
married couples. The scores for each couple are given below. A higher number corresponds to greater
contentment or happiness.
Couple   Husband   Wife
1        47        44
2        44        46
3        49        44
4        53        44
5        42        43
6        45        45
7        48        47
8        45        44
9        52        44
10       47        42
11       40        34
12       45        42
13       40        43
14       46        41
15       47        45
16       46        45
17       46        41
18       46        41
19       44        45
20       45        43
21       48        38
22       42        46
23       50        44
24       46        51
25       43        45
26       50        40
27       46        46
28       42        41
29       51        41
30       46        47
a. Test, at the 1% level of significance, the hypothesis that on average men and women are not
equally happy in marriage.
b. Test, at the 1% level of significance, the hypothesis that on average men are happier than
women in marriage.
LARGE DATA SET EXERCISES
11. Large Data Set 5 lists the scores for 25 randomly selected students on practice SAT reading tests before and
after taking a two-week SAT preparation course. Denote the population of all students who have taken the
course as Population 1 and the population of all students who have not taken the course as Population 2.
http://www.flatworldknowledge.com/sites/all/files/data5.xls
a. Compute the 25 differences in the order after − before, their mean $\bar{d}$, and their sample standard
deviation $s_d$.
b. Give a point estimate for $\mu_d = \mu_1 - \mu_2$, the difference in the mean score of all students who have taken
the course and the mean score of all who have not.
c. Construct a 98% confidence interval for $\mu_d$.
d. Test, at the 1% level of significance, the hypothesis that the mean SAT score increases by at least
ten points by taking the two-week preparation course.
12. Large Data Set 12 lists the scores on one round for 75 randomly selected members at a golf course, first using
their own original clubs, then two months later after using new clubs with an experimental design. Denote
the population of all golfers using their own original clubs as Population 1 and the population of all golfers
using the new style clubs as Population 2.
http://www.flatworldknowledge.com/sites/all/files/data12.xls
a. Compute the 75 differences in the order original clubs − new clubs, their mean $\bar{d}$, and their sample
standard deviation $s_d$.
b. Give a point estimate for $\mu_d = \mu_1 - \mu_2$, the difference in the mean score of all golfers using their
original clubs and the mean score of all golfers using the new kind of clubs.
c. Construct a 90% confidence interval for $\mu_d$.
d. Test, at the 1% level of significance, the hypothesis that the mean golf score decreases by at least
one stroke by using the new kind of clubs.
13. Consider the previous problem again. Since the data set is so large, it is reasonable to use the standard
normal distribution instead of Student’s t-distribution with 74 degrees of freedom.
a. Construct a 90% confidence interval for $\mu_d$ using the standard normal distribution, meaning that
the formula is $\bar{d} \pm z_{\alpha/2}\, s_d/\sqrt{n}$. (The computations done in part (a) of the previous problem still apply
and need not be redone.) How does the result obtained here compare to the result obtained in
part (c) of the previous problem?
b. Test, at the 1% level of significance, the hypothesis that the mean golf score decreases by at least
one stroke by using the new kind of clubs, using the standard normal distribution. (All the work
done in part (d) of the previous problem applies, except the critical value is now $z_\alpha$ instead
of $t_\alpha$ (or the p-value can be computed exactly instead of only approximated, if you used the
p-value approach).) How does the result obtained here compare to the result obtained in part (d) of
the previous problem?
c. Construct the 99% confidence intervals for $\mu_d$ using both the t- and z-distributions. How much
difference is there in the results now?
9.4 Comparison of Two Population Proportions
LEARNING OBJECTIVES
1. To learn how to construct a confidence interval for the difference in the proportions of two distinct
populations that have a particular characteristic of interest.
2. To learn how to perform a test of hypotheses concerning the difference in the proportions of two distinct
populations that have a particular characteristic of interest.
Suppose we wish to compare the proportions of two populations that have a specific characteristic,
such as the proportion of men who are left-handed compared to the proportion of women who are
left-handed. Figure 9.7 "Independent Sampling from Two Populations In Order to Compare
Proportions" illustrates the conceptual framework of our investigation. Each population is divided
into two groups, the group of elements that have the characteristic of interest (for example, being
left-handed) and the group of elements that do not. We arbitrarily label one population as
Population 1 and the other as Population 2, and subscript the proportion of each population that
possesses the characteristic with the number 1 or 2 to tell them apart. We draw a random sample
from Population 1 and label the sample statistic it yields with the subscript 1. Without reference to
the first sample we draw a sample from Population 2 and label its sample statistic with the subscript
2.
Figure 9.7 Independent Sampling from Two Populations In Order to Compare Proportions
Our goal is to use the information in the samples to estimate the difference p1− p2 in the
two population proportions and to make statistically valid inferences about it.
Confidence Intervals
Since the sample proportion $\hat{p}_1$ computed using the sample drawn from Population 1 is a good estimator
of population proportion $p_1$ of Population 1 and the sample proportion $\hat{p}_2$ computed using the sample
drawn from Population 2 is a good estimator of population proportion $p_2$ of Population 2, a reasonable
point estimate of the difference $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$. In order to widen this point estimate into a confidence
interval we suppose that both samples are large, as described in Section 7.3 "Large Sample Estimation of a
Population Proportion" in Chapter 7 "Estimation" and repeated below. If so, then the following formula
for a confidence interval for $p_1 - p_2$ is valid.
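In conventional notation the standard large-sample interval for a difference of two proportions can be written as follows. This is a sketch of the usual formula; the accompanying large-sample check (that each three-standard-deviation interval lies within [0, 1]) is stated here as an assumption about the criterion of Section 7.3.

```latex
\[
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
\]
% Assumed large-sample condition, checked separately for i = 1, 2:
\[
\left[\, \hat{p}_i - 3\sqrt{\frac{\hat{p}_i(1 - \hat{p}_i)}{n_i}},\;\;
         \hat{p}_i + 3\sqrt{\frac{\hat{p}_i(1 - \hat{p}_i)}{n_i}} \,\right]
\subseteq [0, 1]
\]
```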
The three forms of the alternative hypothesis, with the terminology for each case, are:
Form of $H_a$                 Terminology
$H_a: p_1 - p_2 < D_0$        Left-tailed
$H_a: p_1 - p_2 > D_0$        Right-tailed
$H_a: p_1 - p_2 \neq D_0$     Two-tailed
As long as the samples are independent and both are large the following formula for the standardized
test statistic is valid, and it has the standard normal distribution.
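In conventional notation the standardized statistic for a difference of two proportions can be written as follows. This is a sketch of the usual unpooled form, which accommodates a hypothesized difference $D_0$ other than zero.

```latex
\[
Z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}
         {\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}
\]
```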
EXAMPLE 11
Using the data of Note 9.25 "Example 10", test whether there is sufficient evidence to conclude that
public web access to the inspection records has increased the proportion of projects that passed on
the first inspection by more than 5 percentage points. Use the critical value approach at the 10% level
of significance.
Solution:
• Step 1. Taking into account the labeling of the populations, an increase in passing rate at the first
inspection by more than 5 percentage points after public access on the web may be expressed
as $p_2 > p_1 + 0.05$, which by algebra is the same as $p_1 - p_2 < -0.05$. This is the alternative hypothesis. Since the
null hypothesis is always expressed as an equality, with the same number on the right as is in the
alternative hypothesis, the test is
$H_0: p_1 - p_2 = -0.05$
vs. $H_a: p_1 - p_2 < -0.05$ @ $\alpha = 0.10$
• The data provide sufficient evidence, at the 10% level of significance, to conclude that the rate of
passing on the first inspection has increased by more than 5 percentage points since records were
publicly posted on the web.
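The critical value computation behind this conclusion can be sketched as follows. The sample sizes and sample proportions below are assumed for illustration only, since the data of Example 10 are not reproduced here; they were chosen to be consistent with the test statistic Z = −1.770 quoted in Example 12. The code uses the unpooled statistic $Z = ((\hat{p}_1 - \hat{p}_2) - D_0)/\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$.

```python
import math
from statistics import NormalDist

# ASSUMED sample figures (hypothetical; not taken from this section).
n1, p1_hat = 500, 0.67       # Population 1: before public web access
n2, p2_hat = 100, 0.80       # Population 2: after public web access

D0 = -0.05                   # hypothesized difference p1 - p2 under H0
alpha = 0.10

# Standard error of p1_hat - p2_hat for independent large samples.
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
Z = ((p1_hat - p2_hat) - D0) / se

z_crit = NormalDist().inv_cdf(alpha)   # left-tailed critical value, -z_{0.10}
reject = Z <= z_crit                   # rejection region is (-infinity, z_crit]

print(f"Z = {Z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```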
Figure 9.8 Rejection Region and Test Statistic for Note 9.27 "Example 11"
EXAMPLE 12
Perform the test of Note 9.27 "Example 11" using the p-value approach.
Solution:
The first three steps are identical to those in Note 9.27 "Example 11".
• Step 4. Because the test is left-tailed the observed significance or p-value of the test is just the area of the
left tail of the standard normal distribution that is cut off by the test statistic Z = −1.770. From Figure 12.2
"Cumulative Normal Probability" the area of the left tail determined by −1.77 is 0.0384. The p-value is
0.0384.
• Step 5. Since the p-value 0.0384 is less than α = 0.10, the decision is to reject the null hypothesis: The data
provide sufficient evidence, at the 10% level of significance, to conclude that the rate of passing on the
first inspection has increased by more than 5 percentage points since records were publicly posted on the
web.
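The table lookup in Step 4 corresponds to evaluating the standard normal cumulative distribution function at the test statistic. A brief sketch using NormalDist from Python's standard statistics module (Python 3.8 and later), with hypothetical variable names:

```python
from statistics import NormalDist

Z = -1.770                        # test statistic from Example 11
alpha = 0.10

p_value = NormalDist().cdf(Z)     # left-tailed test: area to the left of Z
reject = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```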
Finally a common misuse of the formulas given in this section must be mentioned. Suppose a large
pre-election survey of potential voters is conducted. Each person surveyed is asked to express a
preference between, say, Candidate A and Candidate B. (Perhaps “no preference” or “other” are also
choices, but that is not important.) In such a survey, estimators p ̂A and p ̂B of p Aand p B can be
calculated. It is important to realize, however, that these two estimators were not calculated from two
independent samples: both come from the same group of respondents. While p̂A − p̂B may be a
reasonable estimator of pA − pB, the formulas for confidence intervals and for the standardized test
statistic given in this section are not valid for data obtained in this manner.
KEY TAKEAWAYS
• A confidence interval for the difference in two population proportions is computed using a formula in the
same fashion as was done for a single population mean.
• The same five-step procedure used to test hypotheses concerning a single population proportion is used
to test hypotheses concerning the difference between two population proportions. The only difference is
in the formula for the standardized test statistic.
b. Test H0: p1 − p2 = 0.30 vs. Ha: p1 − p2 ≠ 0.30 @ α = 0.10,
n1 = 7500, p̂1 = 0.664
n2 = 1000, p̂2 = 0.319
APPLICATIONS
In all the remaining exercises the samples are sufficiently large (so this need not be checked).
13. Voters in a particular city who identify themselves with one or the other of two political parties were
randomly selected and asked if they favor a proposal to allow citizens with a proper license to carry a
concealed handgun in city parks. The results are:
                       Party A    Party B
Sample size, n         150        200
Number in favor, x     90         140
a. Give a point estimate for the difference in the proportion of all members of Party A and all
members of Party B who favor the proposal.
b. Construct the 95% confidence interval for the difference, based on these data.
c. Test, at the 5% level of significance, the hypothesis that the proportion of all members of Party A
who favor the proposal is less than the proportion of all members of Party B who do.
d. Compute the p-value of the test.
14. To investigate a possible relation between gender and handedness, a random sample of 320 adults was
taken, with the following results:
                             Men    Women
Sample size, n               168    152
Number of left-handed, x     24     9
a. Give a point estimate for the difference in the proportion of all men who are left-handed and the
proportion of all women who are left-handed.
b. Construct the 95% confidence interval for the difference, based on these data.
c. Test, at the 5% level of significance, the hypothesis that the proportion of men who are
left-handed is greater than the proportion of women who are.
d. Compute the p-value of the test.
15. A local school board member randomly sampled private and public high school teachers in his district to
compare the proportions of National Board Certified (NBC) teachers in the faculty. The results were:
                                  Private Schools    Public Schools
Sample size, n                    80                 520
Proportion of NBC teachers, p̂    0.175              0.150
a. Give a point estimate for the difference in the proportion of all teachers in area public schools
and the proportion of all teachers in private schools who are National Board Certified.
b. Construct the 90% confidence interval for the difference, based on these data.
c. Test, at the 10% level of significance, the hypothesis that the proportion of all public school
teachers who are National Board Certified is less than the proportion of private school teachers
who are.
d. Compute the p-value of the test.
16. In professional basketball games, the fans of the home team always try to distract free throw shooters on the
visiting team. To investigate whether this tactic is actually effective, the free throw statistics of a professional
basketball player with a high free throw percentage were examined. During the entire last season, this player
had 656 free throws, 420 in home games and 236 in away games. The results are summarized below.
                          Home     Away
Sample size, n            420      236
Free throw percent, p̂    81.5%    78.8%
a. Give a point estimate for the difference in the proportion of free throws made at home and away.
b. Construct the 90% confidence interval for the difference, based on these data.
c. Test, at the 10% level of significance, the hypothesis that there exists a home advantage in free
throws.
d. Compute the p-value of the test.
17. Randomly selected middle-aged people in both China and the United States were asked if they believed that
adults have an obligation to financially support their aged parents. The results are summarized below.
                    China    USA
Sample size, n      1300     150
Number of yes, x    1170     110
Test, at the 1% level of significance, whether the data provide sufficient evidence to conclude that there
exists a cultural difference in attitude regarding this question.
18. A manufacturer of walk-behind push mowers receives refurbished small engines from two new
suppliers, A and B. It is not uncommon that some of the refurbished engines need to be lightly serviced
before they can be fitted into mowers. The mower manufacturer recently received 100 engines from each
supplier. In the shipment from A, 13 needed further service. In the shipment from B, 10 needed further
service. Test, at the 10% level of significance, whether the data provide sufficient evidence to conclude that
there exists a difference in the proportions of engines from the two suppliers needing service.
LARGE DATA SET EXERCISES
19. Large Data Sets 6A and 6B record results of a random survey of 200 voters in each of two regions, in which
they were asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other
candidate. Let the population of all voters in region 1 be denoted Population 1 and the population of all
voters in region 2 be denoted Population 2. Let p1 be the proportion of voters in Population 1 who prefer
Candidate A, and p2 the proportion in Population 2 who do.
http://www.flatworldknowledge.com/sites/all/files/data6A.xls
http://www.flatworldknowledge.com/sites/all/files/data6B.xls
a. Find the relevant sample proportions p̂1 and p̂2.
b. Construct a point estimate for p1 − p2.
c. Construct a 95% confidence interval for p1 − p2.
d. Test, at the 5% level of significance, the hypothesis that the same proportion of voters in the two
regions favor Candidate A, against the alternative that a larger proportion in Population 2 do.
20. Large Data Set 11 records the results of samples of real estate sales in a certain region in the year 2008 (lines
2 through 536) and in the year 2010 (lines 537 through 1106). Foreclosure sales are identified with a 1 in the
second column. Let all real estate sales in the region in 2008 be Population 1 and all real estate sales in the
region in 2010 be Population 2.
http://www.flatworldknowledge.com/sites/all/files/data11.xls
a. Use the sample data to construct point estimates p̂1 and p̂2 of the proportions p1 and p2 of all real
estate sales in this region in 2008 and 2010 that were foreclosure sales. Construct a point
estimate of p1 − p2.
b. Use the sample data to construct a 90% confidence interval for p1 − p2.
c. Test, at the 10% level of significance, the hypothesis that the proportion of real estate sales in
the region in 2010 that were foreclosure sales was greater than the proportion of real estate
sales in the region in 2008 that were foreclosure sales. (The default is that the proportions were
the same.)
9.5 Sample Size Considerations
LEARNING OBJECTIVE
1. To learn how to apply formulas for estimating the size samples that will be needed in order to construct a
confidence interval for the difference in two population means or proportions that meets given criteria.
As was pointed out at the beginning of Section 7.4 "Sample Size Considerations" in Chapter 7
"Estimation", sampling is typically done with definite objectives in mind. For example, a physician
might wish to estimate the difference between the average amount of sleep gotten by patients
suffering a certain condition and the average amount of sleep gotten by healthy adults, at 90%
confidence and to within half an hour. Since sampling costs time, effort, and money, it would be
useful to be able to estimate the smallest size samples that are likely to meet these criteria.
KEY TAKEAWAYS
• If the population standard deviations σ1 and σ2 are known or can be estimated, then the minimum equal
sizes of independent samples needed to obtain a confidence interval for the difference µ1 − µ2 in two
population means with a given maximum error of the estimate E and a given level of confidence can be
estimated.
• If the standard deviation σd of the population of differences in pairs drawn from two populations is known
or can be estimated, then the minimum number of sample pairs needed under paired difference sampling
to obtain a confidence interval for the difference µd = µ1 − µ2 in two population means with a given maximum
error of the estimate E and a given level of confidence can be estimated.
• The minimum equal sample sizes needed to obtain a confidence interval for the difference in two
population proportions with a given maximum error of the estimate and a given level of confidence can
always be estimated. If there is prior knowledge of the population proportions p1 and p2 then the estimate
can be sharpened.
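The three situations summarized above can be sketched in code. The displayed formulas of this section did not survive in this copy, so the sketch below assumes the standard large-sample expressions n = z²(σ1² + σ2²)/E² for independent samples, n = z²σd²/E² for paired samples, and n = z²(p1(1 − p1) + p2(1 − p2))/E² for proportions (with p1 = p2 = 0.5 when nothing is known), each rounded up. The function names and the numbers in the usage lines are illustrative assumptions, not values from the text.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Critical value z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def n_independent_means(confidence, E, sigma1, sigma2):
    """Assumed formula: common size of two independent samples for estimating mu1 - mu2."""
    z = z_critical(confidence)
    return math.ceil(z**2 * (sigma1**2 + sigma2**2) / E**2)


def n_paired(confidence, E, sigma_d):
    """Assumed formula: number of pairs for estimating mu_d under paired sampling."""
    z = z_critical(confidence)
    return math.ceil(z**2 * sigma_d**2 / E**2)


def n_proportions(confidence, E, p1=0.5, p2=0.5):
    """Assumed formula: common sample size for estimating p1 - p2.

    The defaults p1 = p2 = 0.5 give the most conservative (largest) estimate,
    which is the case of no prior knowledge of the proportions.
    """
    z = z_critical(confidence)
    return math.ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / E**2)


# Illustrative inputs only (hypothetical, not taken from the exercises).
print(n_independent_means(0.95, E=2, sigma1=5, sigma2=8))
print(n_paired(0.95, E=1.5, sigma_d=6))
print(n_proportions(0.95, E=0.04))
print(n_proportions(0.95, E=0.04, p1=0.3, p2=0.4))
```

The last two lines illustrate the third takeaway: supplying prior estimates of p1 and p2 gives a smaller required sample size than the conservative default. Critical values computed by software are not rounded the way table values are, so a result can occasionally differ by one from a hand computation.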
EXERCISES
BASIC
1. Estimate the common sample size n of equally sized independent samples needed to estimate µ1 − µ2 as
specified when the population standard deviations are as shown.
a. 90% confidence, to within 3 units, σ1 = 10 and σ2 = 7
b. 99% confidence, to within 4 units, σ1 = 6.8 and σ2 = 9.3
c. 95% confidence, to within 5 units, σ1 = 22.6 and σ2 = 31.8
2. Estimate the common sample size n of equally sized independent samples needed to estimate µ1 − µ2 as
specified when the population standard deviations are as shown.
a. 80% confidence, to within 2 units, σ1 = 14 and σ2 = 23
b. 90% confidence, to within 0.3 units, σ1 = 1.3 and σ2 = 0.8
c. 99% confidence, to within 11 units, σ1 = 42 and σ2 = 37
3. Estimate the number n of pairs that must be sampled in order to estimate µd = µ1 − µ2 as specified when the
standard deviation σd of the population of differences is as shown.
a. 80% confidence, to within 6 units, σd = 26.5
b. 95% confidence, to within 4 units, σd = 12
c. 90% confidence, to within 5.2 units, σd = 11.3
4. Estimate the number n of pairs that must be sampled in order to estimate µd = µ1 − µ2 as specified when the
standard deviation σd of the population of differences is as shown.
a. 90% confidence, to within 20 units, σd = 75.5
b. 95% confidence, to within 11 units, σd = 31.4
c. 99% confidence, to within 1.8 units, σd = 4
5. Estimate the minimum equal sample sizes n1 = n2 necessary in order to estimate p1 − p2 as specified.
a. 80% confidence, to within 0.05 (five percentage points)
1. when no prior knowledge of p1 or p2 is available
2. when prior studies indicate that p1 ≈ 0.20 and p2 ≈ 0.65
b. 90% confidence, to within 0.02 (two percentage points)
1. when no prior knowledge of p1 or p2 is available
2. when prior studies indicate that p1 ≈ 0.75 and p2 ≈ 0.63
c. 95% confidence, to within 0.10 (ten percentage points)
1. when no prior knowledge of p1 or p2 is available
2. when prior studies indicate that p1 ≈ 0.11 and p2 ≈ 0.37
6. Estimate the minimum equal sample sizes n1 = n2 necessary in order to estimate p1 − p2 as specified.
a. 80% confidence, to within 0.02 (two percentage points)
a. when no prior knowledge of p1 or p2 is available
b. when prior studies indicate that p1 ≈ 0.78 and p2 ≈ 0.65
b. 90% confidence, to within 0.05 (five percentage points)
a. when no prior knowledge of p1 or p2 is available
b. when prior studies indicate that p1 ≈ 0.12 and p2 ≈ 0.24
c. 95% confidence, to within 0.10 (ten percentage points)
a. when no prior knowledge of p1 or p2 is available
b. when prior studies indicate that p1 ≈ 0.14 and p2 ≈ 0.21
APPLICATIONS
7. An educational researcher wishes to estimate the difference in average scores of elementary school children
on two versions of a 100-point standardized test, at 99% confidence and to within two points. Estimate the
minimum equal sample sizes necessary if it is known that the standard deviation of scores on different
versions of such tests is 4.9.
8. A university administrator wishes to estimate the difference in mean grade point averages among all men
affiliated with fraternities and all unaffiliated men, with 95% confidence and to within 0.15. It is known from
prior studies that the standard deviations of grade point averages in the two groups have common value 0.4.
Estimate the minimum equal sample sizes necessary to meet these criteria.
9. An automotive tire manufacturer wishes to estimate the difference in mean wear of tires manufactured with
an experimental material and ordinary production tires, with 90% confidence and to within 0.5 mm. To
eliminate extraneous factors arising from different driving conditions, the tires will be tested in pairs on the
same vehicles. It is known from prior studies that the standard deviation of the differences of wear of tires
constructed with the two kinds of materials is 1.75 mm. Estimate the minimum number of pairs in the sample
necessary to meet these criteria.
10. To assess the relative happiness of men and women in their marriages, a marriage counselor plans to
administer a test measuring happiness in marriage to n randomly selected married couples, record their
test scores, find the differences, and then draw inferences on the possible difference. Let µ1 and µ2 be the true
average levels of happiness in marriage for men and women respectively as measured by this test. Suppose it
is desired to find a 90% confidence interval for estimating µd = µ1 − µ2 to within two test points. Suppose further
that, from prior studies, it is known that the standard deviation of the differences in test scores is σd ≈ 10. What
is the minimum number of married couples that must be included in this study?
11. A journalist plans to interview an equal number of members of two political parties to compare the
proportions in each party who favor a proposal to allow citizens with a proper license to carry a concealed
handgun in public parks. Let p1 and p2 be the true proportions of members of the two parties who are in
favor of the proposal. Suppose it is desired to find a 95% confidence interval for estimating p1 − p2 to within
0.05. Estimate the minimum equal number of members of each party that must be sampled to meet these
criteria.
12. A member of the state board of education wants to compare the proportions of National Board Certified
(NBC) teachers in private high schools and in public high schools in the state. His study plan calls for an equal
number of private school teachers and public school teachers to be included in the study. Let p1 and p2 be
these proportions. Suppose it is desired to find a 99% confidence interval that estimates p1 − p2 to within 0.05.
a. Supposing that both proportions are known, from a prior study, to be approximately 0.15,
compute the minimum common sample size needed.
b. Compute the minimum common sample size needed on the supposition that nothing is known
about the values of p1 and p2.
Chapter 10
Correlation and Regression
Our interest in this chapter is in situations in which we can associate to each element of a population
or sample two measurements x and y, particularly in the case that it is of interest to use the value
of x to predict the value of y. For example, the population could be the air in automobile
garages, x could be the electrical current produced by an electrochemical reaction taking place in a
carbon monoxide meter, and y the concentration of carbon monoxide in the air. In this chapter we
will learn statistical methods for analyzing the relationship between variables x and y in this context.
A list of all the formulas that appear anywhere in this chapter is collected in the last section for ease
of reference.
10.1 Linear Relationships Between Variables
LEARNING OBJECTIVE
1. To learn what it means for two variables to exhibit a relationship that is close to linear but which contains
an element of randomness.
The following table gives examples of the kinds of pairs of variables which could be of interest from a
statistical point of view.
x                                                      y
Predictor or independent variable                      Response or dependent variable
Temperature in degrees Celsius                         Temperature in degrees Fahrenheit
Area of a house (sq. ft.)                              Value of the house
Age of a particular make and model car                 Resale value of the car
Amount spent by a business on advertising in a year    Revenue received that year
Height of a 25-year-old man                            Weight of the man
The first line in the table is different from all the rest because in that case and no other the
relationship between the variables is deterministic: once the value of x is known the value of y is
completely determined. In fact there is a formula for y in terms of x: y = (9/5)x + 32. Choosing several values
for x and computing the corresponding value for y for each one using the formula gives the table
Figure 10.1 Plot of Celsius and Fahrenheit Temperature Pairs
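The deterministic temperature relationship can be tabulated with a short sketch such as the following. The function name and the particular x-values are illustrative choices made here, not necessarily those used in the text's table.

```python
def celsius_to_fahrenheit(x):
    """Deterministic linear relationship y = (9/5)x + 32."""
    return x * 9 / 5 + 32


# Illustrative Celsius values chosen for this sketch.
celsius_values = [-40, -15, 0, 20, 50]

# Each x determines exactly one y, so the pairs lie exactly on a line.
pairs = [(x, celsius_to_fahrenheit(x)) for x in celsius_values]

for x, y in pairs:
    print(f"{x:>5} C  ->  {y:>6.1f} F")
```

Plotting the resulting pairs would place every point exactly on the line with slope 9/5 and y-intercept 32, with no scatter around it.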
The relationship between x and y in the temperature example is deterministic because once the value
of x is known, the value of y is completely determined. In contrast, all the other relationships listed in
the table above have an element of randomness in them. Consider the relationship described in the
last line of the table, the height x of a man aged 25 and his weight y. If we were to randomly select
several 25-year-old men and measure the height and weight of each one, we might obtain a collection
of ( x, y) pairs something like this:
(68,151) (69,146) (70,157) (70,164) (71,171) (72,160)
(72,163) (72,180) (73,170) (73,175) (74,178) (75,188)
A plot of these data is shown in Figure 10.2 "Plot of Height and Weight Pairs". Such a plot is called
a scatter diagram or scatter plot. Looking at the plot it is evident that there exists a linear
relationship between height x and weight y, but not a perfect one. The points appear to be following a
line, but not exactly. There is an element of randomness present.
Figure 10.2 Plot of Height and Weight Pairs
In this chapter we will analyze situations in which variables x and y exhibit such a linear relationship
with randomness. The level of randomness will vary from situation to situation. In the introductory
example connecting an electric current and the level of carbon monoxide in air, the relationship is
almost perfect. In other situations, such as the heights and weights of individuals, the connection
between the two variables involves a high degree of randomness. In the next section we will see how
to quantify the strength of the linear relationship between two variables.
KEY TAKEAWAYS
• Two variables x and y have a deterministic linear relationship if points plotted from (x, y) pairs lie exactly
along a single straight line.
• In practice it is common for two variables to exhibit a relationship that is close to linear but which contains
an element, possibly large, of randomness.
EXERCISES
BASIC
1. A line has equation y = 0.5x + 2.
a. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the
five points obtained.
b. Give the value of the slope of the line; give the value of the y-intercept.
2. A line has equation y = x − 0.5.
a. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the
five points obtained.
b. Give the value of the slope of the line; give the value of the y-intercept.
3. A line has equation y = −2x + 4.
a. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the
five points obtained.
b. Give the value of the slope of the line; give the value of the y-intercept.
4. A line has equation y = −1.5x + 1.
a. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the
five points obtained.
b. Give the value of the slope of the line; give the value of the y-intercept.
5. Based on the information given about a line, determine how y will change (increase, decrease, or stay the
same) when x is increased, and explain. In some cases it might be impossible to tell from the information
given.
a. The slope is positive.
b. The y-intercept is positive.
c. The slope is zero.
6. Based on the information given about a line, determine how y will change (increase, decrease, or stay the
same) when x is increased, and explain. In some cases it might be impossible to tell from the information
given.
a. The y-intercept is negative.
b. The y-intercept is zero.
c. The slope is negative.
7. A data set consists of eight (x, y) pairs of numbers:
(0,12) (2,15) (4,16) (5,14) (8,22) (13,24) (15,28) (20,30)
a. Plot the data in a scatter diagram.
b. Based on the plot, explain whether the relationship between x and y appears to be deterministic
or to involve randomness.
c. Based on the plot, explain whether the relationship between x and y appears to be linear or not
linear.
8. A data set consists of ten (x, y) pairs of numbers:
(3,20) (5,13) (6,9) (8,4) (11,0) (12,0) (14,1) (17,6) (18,9) (20,16)
a. Plot the data in a scatter diagram.
b. Based on the plot, explain whether the relationship between x and y appears to be deterministic
or to involve randomness.
c. Based on the plot, explain whether the relationship between x and y appears to be linear or not
linear.
9. A data set consists of nine (x, y) pairs of numbers:
(8,16) (9,9) (10,4) (11,1) (12,0) (13,1) (14,4) (15,9) (16,16)
a. Plot the data in a scatter diagram.
b. Based on the plot, explain whether the relationship between x and y appears to be deterministic
or to involve randomness.
c. Based on the plot, explain whether the relationship between x and y appears to be linear or not
linear.
10. A data set consists of five (x, y) pairs of numbers:
(0,1) (2,5) (3,7) (5,11) (8,17)
a. Plot the data in a scatter diagram.
b. Based on the plot, explain whether the relationship between x and y appears to be deterministic
or to involve randomness.
c. Based on the plot, explain whether the relationship between x and y appears to be linear or not
linear.
APPLICATIONS
11. At 60°F a particular blend of automotive gasoline weighs 6.17 lb/gal. The weight y of gasoline on a tank truck
that is loaded with x gallons of gasoline is given by the linear equation
y = 6.17x
a. Explain whether the relationship between the weight y and the amount x of gasoline is
deterministic or contains an element of randomness.
b. Predict the weight of gasoline on a tank truck that has just been loaded with 6,750 gallons of
gasoline.
12. The rate for renting a motor scooter for one day at a beach resort area is $25 plus 30 cents for each mile the
scooter is driven. The total cost y in dollars for renting a scooter and driving it x miles is
y = 0.30x + 25
a. Explain whether the relationship between the cost y of renting the scooter for a day and the
distance x that the scooter is driven that day is deterministic or contains an element of
randomness.
b. A person intends to rent a scooter one day for a trip to an attraction 17 miles away. Assuming
that the total distance the scooter is driven is 34 miles, predict the cost of the rental.
13. The pricing schedule for labor on a service call by an elevator repair company is $150 plus $50 per hour on
site.
a. Write down the linear equation that relates the labor cost y to the number of hours x that the
repairman is on site.
b. Calculate the labor cost for a service call that lasts 2.5 hours.
14. The cost of a telephone call made through a leased line service is 2.5 cents per minute.
a. Write down the linear equation that relates the cost y (in cents) of a call to its length x.
b. Calculate the cost of a call that lasts 23 minutes.
LARGE DATA SET EXERCISES
15. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Plot the scatter diagram with SAT score as
the independent variable (x) and GPA as the dependent variable (y). Comment on the appearance and
strength of any linear trend.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
16. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs,
then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Plot
the scatter diagram with golf score using the original clubs as the independent variable (x) and golf score
using the new clubs as the dependent variable (y). Comment on the appearance and strength of any linear
trend.
http://www.flatworldknowledge.com/sites/all/files/data12.xls
17. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather
clock at 60 auctions. Plot the scatter diagram with the number of bidders at the auction as the independent
variable (x) and the sales price as the dependent variable (y). Comment on the appearance and strength of
any linear trend.
http://www.flatworldknowledge.com/sites/all/files/data13.xls
10.2 The Linear Correlation Coefficient
LEARNING OBJECTIVE
1. To learn what the linear correlation coefficient is, how to compute it, and what it tells us about the
relationship between two variables x and y.
Figure 10.3 "Linear Relationships of Varying Strengths" illustrates linear relationships between two
variables x and y of varying strengths. It is visually apparent that in the situation in panel (a), x could
serve as a useful predictor of y, it would be less useful in the situation illustrated in panel (b), and in
the situation of panel (c) the linear relationship is so weak as to be practically nonexistent. The linear
correlation coefficient is a number computed directly from the data that measures the strength of the
linear relationship between the two variables x and y.
Figure 10.3 Linear Relationships of Varying Strengths
2. If |r| is near 0 (that is, if r is near 0 and of either sign) then the linear relationship
between x and y is weak.
Figure 10.4 Linear Correlation Coefficient R
Pay particular attention to panel (f) in Figure 10.4 "Linear Correlation Coefficient R". It shows a
perfectly deterministic relationship between x and y, but r =0 because the relationship is not linear.
(In this particular case the points lie on the top half of a circle.)
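The point about panel (f) can be checked numerically. The sketch below is illustrative only: it assumes the standard computational formula r = SSxy / sqrt(SSxx · SSyy), and it generates its own points on the top half of a unit circle rather than using the points from the figure. The function name is an assumption introduced here.

```python
import math


def linear_correlation(xs, ys):
    """Linear correlation coefficient via the assumed formula r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# x-values placed symmetrically about 0; y is completely determined by x,
# since each point lies on the top half of the unit circle.
xs = [k / 10 for k in range(-9, 10)]
ys = [math.sqrt(1 - x * x) for x in xs]

r = linear_correlation(xs, ys)
print(f"r = {r:.6f}")
```

Although y is a deterministic function of x here, the computed r is zero up to rounding error, because r measures only the strength of the linear part of a relationship.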
EXAMPLE 1
Compute the linear correlation coefficient for the height and weight pairs plotted in Figure 10.2 "Plot
of Height and Weight Pairs".
Solution:
Even for small data sets like this one, computations are too long to do completely by hand. In actual
practice the data are entered into a calculator or computer and a statistics program is used. In order
to clarify the meaning of the formulas we will display the data and related quantities in tabular form.
For each (x, y) pair we compute three numbers: x², xy, and y², as shown in the table provided. In the last
line of the table we have the sum of the numbers in each column. Using them we compute:
      x      y     x^2       xy      y^2
     68    151    4624    10268    22801
     69    146    4761    10074    21316
     70    157    4900    10990    24649
     70    164    4900    11480    26896
     71    171    5041    12141    29241
     72    160    5184    11520    25600
     72    163    5184    11736    26569
     72    180    5184    12960    32400
     73    170    5329    12410    28900
     73    175    5329    12775    30625
     74    178    5476    13172    31684
     75    188    5625    14100    35344
Σ   859   2003   61537   143626   336025
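The arithmetic that follows from the column totals can be sketched in a few lines. This assumes the usual shortcut formulas SSxx = Σx² − (Σx)²/n, SSxy = Σxy − (Σx)(Σy)/n, SSyy = Σy² − (Σy)²/n and r = SSxy / sqrt(SSxx · SSyy); the variable names are illustrative.

```python
import math

# Column totals from the last row of the table (n = 12 height/weight pairs).
n = 12
sum_x, sum_y = 859, 2003
sum_x2, sum_xy, sum_y2 = 61537, 143626, 336025

ss_xx = sum_x2 - sum_x ** 2 / n        # SSxx
ss_xy = sum_xy - sum_x * sum_y / n     # SSxy
ss_yy = sum_y2 - sum_y ** 2 / n        # SSyy

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"SSxx = {ss_xx:.4f}, SSxy = {ss_xy:.4f}, SSyy = {ss_yy:.4f}")
print(f"r = {r:.3f}")
```

Under these formulas the totals give r of about 0.868, a fairly strong positive linear relationship between height and weight.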
KEY TAKEAWAYS
• The linear correlation coefficient measures the strength and direction of the linear relationship between
two variables x and y.
• The sign of the linear correlation coefficient indicates the direction of the linear relationship
between x and y.
• When r is near 1 or −1 the linear relationship is strong; when it is near 0 the linear relationship is weak.
EXERCISES
BASIC
With the exception of the exercises at the end of Section 10.3 "Modelling Linear Relationships with
Randomness Present", the first Basic exercise in each of the following sections through Section 10.7
"Estimation and Prediction" uses the data from the first exercise here, the second Basic exercise uses the
data from the second exercise here, and so on, and similarly for the Application exercises. Save your
computations done on these exercises so that you do not need to repeat them later.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
30. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original
clubs, then using clubs of a new, experimental design (after two months of familiarization with the new
clubs). Compute the linear correlation coefficient r. Compare its value to your comments on the
appearance and strength of any linear trend in the scatter diagram that you constructed in the second
large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.flatworldknowledge.com/sites/all/files/data12.xls
31. Large Data Set 13 records the number of bidders and sales price of a particular type of antique
grandfather clock at 60 auctions. Compute the linear correlation coefficient r. Compare its value to your
comments on the appearance and strength of any linear trend in the scatter diagram that you constructed
in the third large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.flatworldknowledge.com/sites/all/files/data13.xls
10.3 Modelling Linear Relationships with Randomness Present
LEARNING OBJECTIVE
1. To learn the framework in which the statistical analysis of the linear relationship between two
variables x and y will be done.
In this chapter we are dealing with a population for which we can associate to each element two
measurements, x and y. We are interested in situations in which the value of x can be used to draw
conclusions about the value of y, such as predicting the resale value y of a residential house based on
its size x. Since the relationship between x and y is not deterministic, statistical procedures must be
applied. For any statistical procedures, given in this book or elsewhere, the associated formulas are
valid only under specific assumptions. The set of assumptions in simple linear regression is a
mathematical description of the relationship between x and y. Such a set of assumptions is known as
a model.
For each fixed value of x a sub-population of the full population is determined, such as the collection
of all houses with 2,100 square feet of living space. For each element of that sub-population there is a
measurement y, such as the value of any 2,100-square-foot house. Let E(y) denote the mean of all
the y-values for each particular value of x. E(y) can change from x-value to x-value, such as the mean
value of all 2,100-square-foot houses, the (different) mean value for all 2,500-square-foot houses,
and so on.
Our first assumption is that the relationship between x and the mean of the y-values in the
sub-population determined by x is linear. This means that there exist numbers $\beta_1$ and $\beta_0$ such that
$E(y) = \beta_1 x + \beta_0$
This linear relationship is the reason for the word “linear” in “simple linear regression” below. (The
word “simple” means that y depends on only one other variable and not two or more.)
Our next assumption is that for each value of x the y-values scatter about the mean E(y) according to
a normal distribution centered at E(y) and with a standard deviation $\sigma$ that is the same for every
value of x. This is the same as saying that there exists a normally distributed random variable $\varepsilon$ with
mean 0 and standard deviation $\sigma$ so that the relationship between x and y in the whole population is
$y = \beta_1 x + \beta_0 + \varepsilon$
Our last assumption is that the random deviations associated with different observations are
independent.
In summary, the model is:
Simple Linear Regression Model
For each point (x, y) in the data set the y-value is an independent observation of
$y = \beta_1 x + \beta_0 + \varepsilon$
where $\beta_1$ and $\beta_0$ are fixed parameters and $\varepsilon$ is a normally distributed random variable with mean 0 and
an unknown standard deviation $\sigma$.
The line with equation $y = \beta_1 x + \beta_0$ is called the population regression line.
Figure 10.5 "The Simple Linear Model Concept" illustrates the model. The symbols $N(\mu, \sigma^2)$ denote a
normal distribution with mean $\mu$ and variance $\sigma^2$, hence standard deviation $\sigma$.
Figure 10.5 The Simple Linear Model Concept
It is conceptually important to view the model as a sum of two parts:
$y = \beta_1 x + \beta_0 + \varepsilon$
1. Deterministic Part. The first part $\beta_1 x + \beta_0$ is the equation that describes the trend in y as x increases. The
line that we seem to see when we look at the scatter diagram is an approximation of the
line $y = \beta_1 x + \beta_0$. There is nothing random in this part, and therefore it is called the deterministic part of the
model.
2. Random Part. The second part $\varepsilon$ is a random variable, often called the error term or the noise. This
part explains why the actual observed values of y are not exactly on but fluctuate near a line. Information
about this term is important since only when one knows how much noise there is in the data can one
know how trustworthy the detected trend is.
There are three parameters in this model: $\beta_0$, $\beta_1$, and $\sigma$. Each has an important interpretation,
particularly $\beta_1$ and $\sigma$. The slope parameter $\beta_1$ represents the expected change in y brought about by a
unit increase in x. The standard deviation $\sigma$ represents the magnitude of the noise in the data.
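The split into a deterministic part and a random part can be made concrete with a small simulation. The sketch below is only an illustration: the parameter values, the seed, and the helper name `observe` are hypothetical and are not taken from the text.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
beta_1, beta_0, sigma = 2.0, 5.0, 1.5

random.seed(0)  # fixed seed so the run is repeatable

def observe(x):
    """One observation y = beta_1*x + beta_0 + epsilon at the given x."""
    deterministic = beta_1 * x + beta_0    # point on the population regression line
    epsilon = random.gauss(0.0, sigma)     # random part: normal noise, mean 0, sd sigma
    return deterministic + epsilon

# Many observations at the same x scatter about E(y) = beta_1*x + beta_0.
x = 10
ys = [observe(x) for _ in range(10_000)]
sample_mean = sum(ys) / len(ys)
print(f"E(y) at x = {x}: {beta_1 * x + beta_0}, simulated mean: {sample_mean:.3f}")
```

With many observations at one fixed x, the simulated mean settles near the value on the population regression line, while individual observations fluctuate about it by an amount governed by the standard deviation of the noise.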
There are procedures for checking the validity of the three assumptions, but for us it will be sufficient
to visually verify the linear trend in the data. If the data set is large then the points in the scatter
diagram will form a band about an apparent straight line. The normality of $\varepsilon$ with a constant
standard deviation corresponds graphically to the band being of roughly constant width, and with
most points concentrated near the middle of the band.
Fortunately, the three assumptions do not need to hold exactly in order for the procedures and
analysis developed in this chapter to be useful.
KEY TAKEAWAY
• Statistical procedures are valid only when certain assumptions are valid. The assumptions underlying the
analyses done in this chapter are graphically summarized in Figure 10.5 "The Simple Linear Model
Concept".
EXERCISES
1. State the three assumptions that are the basis for the Simple Linear Regression Model.
2. The Simple Linear Regression Model is summarized by the equation
$y = \beta_1 x + \beta_0 + \varepsilon$
Identify the deterministic part and the random part.
3. Is the number $\beta_1$ in the equation $y = \beta_1 x + \beta_0$ a statistic or a population parameter? Explain.
4. Is the number $\sigma$ in the Simple Linear Regression Model a statistic or a population parameter? Explain.
5. Describe what to look for in a scatter diagram in order to check that the assumptions of the Simple Linear
Regression Model are true.
6. True or false: the assumptions of the Simple Linear Regression Model must hold exactly in order for the
procedures and analysis developed in this chapter to be useful.
ANSWERS
1.
a. The mean of y is linearly related to x.
b. For each given x, y is a normal random variable with mean $\beta_1 x + \beta_0$ and standard deviation $\sigma$.
c. All the observations of y in the sample are independent.
3. $\beta_1$ is a population parameter.
5. A linear trend.
10.4 The Least Squares Regression Line
LEARNING OBJECTIVES
1. To learn how to measure how well a straight line fits a collection of data.
2. To learn how to construct the least squares regression line, the straight line that best fits a collection of
data.
3. To learn the meaning of the slope of the least squares regression line.
4. To learn how to use the least squares regression line to estimate the response variable y in terms of the
predictor variable x.
Goodness of Fit of a Straight Line to Data
Once the scatter diagram of the data has been drawn and the model assumptions described in the
previous sections at least visually verified (and perhaps the correlation coefficient r computed to
quantitatively verify the linear trend), the next step in the analysis is to find the straight line that best fits
the data. We will explain how to measure how well a straight line fits a collection of points by examining
how well the line $y = \frac{1}{2}x - 1$ fits the data set consisting of the five points (2, 0), (2, 1), (6, 2), (8, 3),
and (10, 3).
To each point in the data set there is associated an “error,” the positive or negative vertical distance
from the point to the line: positive if the point is above the line and negative if it is below the line.
The error can be computed as the actual y-value of the point minus the y-value $\hat{y}$ that is “predicted”
by inserting the x-value of the data point into the formula for the line:
$\text{error at data point } (x, y) = (\text{true } y) - (\text{predicted } y) = y - \hat{y}$
The computation of the error for each of the five points in the data set is shown in Table 10.1 "The
Errors in Fitting Data with a Straight Line".
Table 10.1 The Errors in Fitting Data with a Straight Line
 x    y    $\hat{y} = \frac{1}{2}x - 1$    $y - \hat{y}$    $(y - \hat{y})^2$
 2    0    0     0    0
 2    1    0     1    1
 6    2    2     0    0
 8    3    3     0    0
10    3    4    −1    1
 Σ    -    -     0    2
A first thought for a measure of the goodness of fit of the line to the data would be simply to add the
errors at every point, but the example shows that this cannot work well in general. The line does not
fit the data perfectly (no line can), yet because of cancellation of positive and negative errors the sum
of the errors (the fourth column of numbers) is zero. Instead goodness of fit is measured by the sum
of the squares of the errors. Squaring eliminates the minus signs, so no cancellation can occur. For
the data and line in Figure 10.6 "Plot of the Five-Point Data and the Line $\hat{y} = \frac{1}{2}x - 1$" the sum of the
squared errors (the last column of numbers) is 2. This number measures the goodness of fit of the line
to the data.
Definition
The goodness of fit of a line $\hat{y} = mx + b$ to a set of n pairs (x, y) of numbers in a sample is the sum of the
squared errors
$\sum (y - \hat{y})^2$
(n terms in the sum, one for each data pair).
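The entries of Table 10.1 can be reproduced directly from this definition. The following is a minimal sketch for the five-point data set and the line with slope 1/2 and intercept −1; the helper name `predict` is an illustrative choice.

```python
# Five-point data set from Table 10.1.
points = [(2, 0), (2, 1), (6, 2), (8, 3), (10, 3)]

def predict(x):
    """Predicted y-value on the line y-hat = (1/2)x - 1."""
    return 0.5 * x - 1

errors = [y - predict(x) for x, y in points]   # y - y_hat at each point
sum_of_errors = sum(errors)                    # positive and negative errors cancel
sse = sum(e ** 2 for e in errors)              # sum of squared errors

print("errors:", errors)
print("sum of errors:", sum_of_errors)
print("sum of squared errors:", sse)
```

The errors sum to zero even though the line misses two of the points, while the sum of the squared errors is 2, in agreement with the bottom row of the table.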
The Least Squares Regression Line
Given any collection of pairs of numbers (except when all the x-values are the same) and the
corresponding scatter diagram, there always exists exactly one straight line that fits the data better
than any other, in the sense of minimizing the sum of the squared errors. It is called the least squares
regression line. Moreover there are formulas for its slope and y-intercept.
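As an illustration, the slope and y-intercept can be computed for the five-point data set. The sketch assumes the standard least squares formulas, slope = SSxy / SSxx and intercept = (mean of y) − slope · (mean of x); the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    slope = ss_xy / ss_xx                            # SSxy / SSxx
    intercept = sum(ys) / n - slope * sum(xs) / n    # y-bar minus slope times x-bar
    return slope, intercept

# Five-point data set used in Tables 10.1 and 10.2.
xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]

slope, intercept = least_squares_line(xs, ys)
print(f"y-hat = {slope:.5f} x + ({intercept:.3f})")
```

The result agrees with the line whose errors are tabulated in Table 10.2, with slope 0.34375 and y-intercept −0.125.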
Table 10.2 The Errors in Fitting Data with the Least Squares Regression Line
 x    y    $\hat{y} = 0.34375x - 0.125$    $y - \hat{y}$    $(y - \hat{y})^2$
 2    0    0.5625    −0.5625    0.31640625
 2    1    0.5625     0.4375    0.19140625
 6    2    1.9375     0.0625    0.00390625
 8    3    2.6250     0.3750    0.14062500
10    3    3.3125    −0.3125    0.09765625
EXAMPLE 3
Table 10.3 "Data on Age and Value of Used Automobiles of a Specific Make and Model" shows the
age in years and the retail value in thousands of dollars of a random sample of ten automobiles of the
same make and model.
a. Construct the scatter diagram.
b. Compute the linear correlation coefficient r. Interpret its value in the context of the problem.
c. Compute the least squares regression line. Plot it on the scatter diagram.
d. Interpret the meaning of the slope of the least squares regression line in the context of the problem.
e. Suppose a four-year-old automobile of this make and model is selected at random. Use the regression
equation to predict its retail value.
f. Suppose a 20-year-old automobile of this make and model is selected at random. Use the regression
equation to predict its retail value. Interpret the result.
g. Comment on the validity of using the regression equation to predict the price of a brand new
automobile of this make and model.
Table 10.3 Data on Age and Value of Used Automobiles of a Specific Make and Model
x     2     3     3     3     4     4     5     5     5     6
y  28.7  24.8  26.0  30.5  23.8  24.6  23.8  20.4  21.6  22.1
Solution:
a. The scatter diagram is shown in Figure 10.7 "Scatter Diagram for Age and Value of Used
Automobiles".
Figure 10.7 Scatter Diagram for Age and Value of Used Automobiles
e. Since we know nothing about the automobile other than its age, we assume that it is of about
average value and use the average value of all four-year-old vehicles of this make and model as
our estimate. The average value is simply the value of $\hat{y}$ obtained when the number 4 is inserted
for x in the least squares regression equation:
$\hat{y} = -2.05(4) + 32.83 = 24.63$
which corresponds to $24,630.
f. Now we insert x = 20 into the least squares regression equation, to obtain
$\hat{y} = -2.05(20) + 32.83 = -8.17$
which corresponds to −$8,170. Something is wrong here, since a negative value makes no sense. The
error arose from applying the regression equation to a value of x not in the range of x-values in
the original data, from two to six years.
Applying the regression equation $\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$ to a value of x outside the range of x-values in the data
set is called extrapolation. It is an invalid use of the regression equation and should be avoided.
g. The price of a brand new vehicle of this make and model is the value of the automobile at age 0. If
the value x = 0 is inserted into the regression equation the result is always $\hat{\beta}_0$, the y-intercept, in this
case 32.83, which corresponds to $32,830. But this is a case of extrapolation, just as part (f) was,
hence this result is invalid, although not obviously so. In the context of the problem, since
automobiles tend to lose value much more quickly immediately after they are purchased than they
do after they are several years old, the number $32,830 is probably an underestimate of the price of
a new automobile of this make and model.
For emphasis we highlight the points raised by parts (f) and (g) of the example.
Definition
The process of using the least squares regression equation to estimate the value of y at a value of x that
does not lie in the range of the x-values in the data set that was used to form the regression line is
called extrapolation. It is an invalid use of the regression equation that can lead to errors, hence should
be avoided.
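One way to keep extrapolation from happening by accident is to have the prediction step check the x-value against the range of the data. The sketch below uses the data of Table 10.3 and assumes the standard least squares formulas for the slope and intercept; the helper `predict` and its convention of returning `None` outside the observed range are hypothetical design choices, not something prescribed by the text.

```python
# Age (years) and retail value (thousands of dollars) from Table 10.3.
ages = [2, 3, 3, 3, 4, 4, 5, 5, 5, 6]
values = [28.7, 24.8, 26.0, 30.5, 23.8, 24.6, 23.8, 20.4, 21.6, 22.1]

n = len(ages)
ss_xx = sum(x * x for x in ages) - sum(ages) ** 2 / n
ss_xy = sum(x * y for x, y in zip(ages, values)) - sum(ages) * sum(values) / n
slope = ss_xy / ss_xx
intercept = sum(values) / n - slope * sum(ages) / n

def predict(x):
    """Predict y at x; return None when x is outside the observed x-range."""
    if not min(ages) <= x <= max(ages):
        return None   # refusing to extrapolate
    return slope * x + intercept

for age in (4, 20, 0):
    estimate = predict(age)
    if estimate is None:
        print(f"age {age}: outside the observed range, no prediction")
    else:
        print(f"age {age}: predicted value {estimate:.2f} thousand dollars")
```

For a four-year-old automobile the sketch reproduces the estimate of 24.63 thousand dollars from part (e), while the ages 20 and 0 of parts (f) and (g) are rejected because they lie outside the two-to-six-year range of the data.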
The Sum of the Squared Errors SSE
In general, in order to measure the goodness of fit of a line to a set of data, we must compute the
predicted y-value $\hat{y}$ at every point in the data set, compute each error, square it, and then add up all the
squares. In the case of the least squares regression line, however, the line that best fits the data, the sum
of the squared errors can be computed directly from the data using the following formula.
The sum of the squared errors for the least squares regression line is denoted by SSE. It can be computed
using the formula
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 87.781 - (-2.05)(-28.7) = 28.946$
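The shortcut formula can be checked against the definition of SSE as the sum of the squared errors. The sketch below does both computations for the data of Table 10.3, assuming the standard formulas for SSxx, SSxy, SSyy and for the slope and intercept of the least squares line; the variable names are illustrative.

```python
# Age (years) and retail value (thousands of dollars) from Table 10.3.
ages = [2, 3, 3, 3, 4, 4, 5, 5, 5, 6]
values = [28.7, 24.8, 26.0, 30.5, 23.8, 24.6, 23.8, 20.4, 21.6, 22.1]

n = len(ages)
ss_xx = sum(x * x for x in ages) - sum(ages) ** 2 / n
ss_xy = sum(x * y for x, y in zip(ages, values)) - sum(ages) * sum(values) / n
ss_yy = sum(y * y for y in values) - sum(values) ** 2 / n

slope = ss_xy / ss_xx
intercept = sum(values) / n - slope * sum(ages) / n

# SSE from the definition: square each error y - y_hat and add.
sse_definition = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(ages, values))

# SSE from the shortcut formula SS_yy - slope * SS_xy.
sse_formula = ss_yy - slope * ss_xy

print(f"SSE by definition: {sse_definition:.3f}")
print(f"SSE by formula:    {sse_formula:.3f}")
```

The two values agree up to rounding error. The agreement holds only for the least squares regression line; for any other line the errors must be computed point by point.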
KEY TAKEAWAYS
• How well a straight line fits a data set is measured by the sum of the squared errors.
• The least squares regression line is the line that best fits the data. Its slope and y-intercept are computed
from the data using formulas.
• The slope $\hat{\beta}_1$ of the least squares regression line estimates the size and direction of the mean change in the
dependent variable y when the independent variable x is increased by one unit.
• The sum of the squared errors SSE of the least squares regression line can be computed using a formula,
without having to compute all the individual errors.
EXERCISES
BASIC
For the Basic and Application exercises in this section use the computations that were done for the
exercises with the same number in Section 10.2 "The Linear Correlation Coefficient".
1. Compute the least squares regression line for the data in Exercise 1 of Section 10.2 "The Linear Correlation
Coefficient".
2. Compute the least squares regression line for the data in Exercise 2 of Section 10.2 "The Linear Correlation
Coefficient".
3. Compute the least squares regression line for the data in Exercise 3 of Section 10.2 "The Linear Correlation
Coefficient".
4. Compute the least squares regression line for the data in Exercise 4 of Section 10.2 "The Linear Correlation
Coefficient".
5. For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Compute the sum of the squared errors SSE using the definition $\sum (y - \hat{y})^2$.
c. Compute the sum of the squared errors SSE using the formula $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.
6. For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Compute the sum of the squared errors SSE using the definition $\sum (y - \hat{y})^2$.
c. Compute the sum of the squared errors SSE using the formula $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.
7. Compute the least squares regression line for the data in Exercise 7 of Section 10.2 "The Linear Correlation
Coefficient".
8. Compute the least squares regression line for the data in Exercise 8 of Section 10.2 "The Linear Correlation
Coefficient".
9. For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Can you compute the sum of the squared errors SSE using the definition $\sum (y - \hat{y})^2$? Explain.
c. Compute the sum of the squared errors SSE using the formula $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.
10. For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Can you compute the sum of the squared errors SSE using the definition $\sum (y - \hat{y})^2$? Explain.
c. Compute the sum of the squared errors SSE using the formula $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.
APPLICATIONS
11. For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. On average, how many new words does a child from 13 to 18 months old learn each month?
Explain.
c. Estimate the average vocabulary of all 16-month-old children.
12. For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. On average, how many additional feet are added to the braking distance for each additional 100
pounds of weight? Explain.
c. Estimate the average braking distance of all cars weighing 3,000 pounds.
13. For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Estimate the average resting heart rate of all 40-year-old men.
c. Estimate the average resting heart rate of all newborn baby boys. Comment on the validity of the
estimate.
14. For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Estimate the average wave height when the wind is blowing at 10 miles per hour.
c. Estimate the average wave height when there is no wind blowing. Comment on the validity of
the estimate.
15. For the data in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. On average, for each additional thousand dollars spent on advertising, how does revenue
change? Explain.
c. Estimate the revenue if $2,500 is spent on advertising next year.
16. For the data in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. On average, for each additional inch of height of a two-year-old girl, what is the change in the adult
height? Explain.
c. Predict the adult height of a two-year-old girl who is 33 inches tall.
17. For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Compute SSE using the formula $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.
c. Estimate the average final exam score of all students whose course average just before the exam
is 85.
18. For the data in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Compute SSE using the formula $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy}$.
c. Estimate the number of acres that would be harvested if 90 million acres of corn were planted.
19. For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Interpret the value of the slope of the least squares regression line in the context of the problem.
c. Estimate the average concentration of the active ingredient in the blood in men after consuming
1 ounce of the medication.
20. For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. Interpret the value of the slope of the least squares regression line in the context of the problem.
c. Estimate the age of an oak tree whose girth five feet off the ground is 92 inches.
21. For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. The 28-day strength of concrete used on a certain job must be at least 3,200 psi. If the 3-day
strength is 1,300 psi, would we anticipate that the concrete will be sufficiently strong on the 28th
day? Explain fully.
22. For the data in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"
a. Compute the least squares regression line.
b. If the power facility is called upon to provide more than 95 million watt-hours tomorrow then
energy will have to be purchased from elsewhere at a premium. The forecast is for an average
temperature of 42 degrees. Should the company plan on purchasing power at a premium?
LARGE DATA SET EXERCISES
25. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Compute the least squares regression line with SAT score as the independent variable (x) and
GPA as the dependent variable (y).
b. Interpret the meaning of the slope $\hat{\beta}_1$ of the regression line in the context of the problem.
c. Compute SSE, the measure of the goodness of fit of the regression line to the sample data.
d. Estimate the GPA of a student whose SAT score is 1350.
26. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs,
then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
http://www.flatworldknowledge.com/sites/all/files/data12.xls
a. Compute the least squares regression line with scores using the original clubs as the independent
variable (x) and scores using the new clubs as the dependent variable (y).
b. Interpret the meaning of the slope $\hat{\beta}_1$ of the regression line in the context of the problem.
c. Compute SSE, the measure of the goodness of fit of the regression line to the sample data.
d. Estimate the score with the new clubs of a golfer whose score with the old clubs is 73.
27. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather
clock at 60 auctions.
http://www.flatworldknowledge.com/sites/all/files/data13.xls
a. Compute the least squares regression line with the number of bidders present at the auction as
the independent variable (x) and sales price as the dependent variable (y).
b. Interpret the meaning of the slope $\hat{\beta}_1$ of the regression line in the context of the problem.
c. Compute SSE, the measure of the goodness of fit of the regression line to the sample data.
d. Estimate the sales price of a clock at an auction at which the number of bidders is seven.
10.5 Statistical Inferences About $\beta_1$
LEARNING OBJECTIVES
1. To learn how to construct a confidence interval for $\beta_1$, the slope of the population regression line.
2. To learn how to test hypotheses regarding $\beta_1$.
The parameter $\beta_1$, the slope of the population regression line, is of primary importance in regression
analysis because it gives the true rate of change in the mean E(y) in response to a unit increase in the
predictor variable x. For every unit increase in x the mean of the response variable y changes
by $\beta_1$ units, increasing if $\beta_1 > 0$ and decreasing if $\beta_1 < 0$. We wish to construct confidence intervals
for $\beta_1$ and test hypotheses about it.
Confidence Intervals for $\beta_1$
The slope $\hat{\beta}_1$ of the least squares regression line is a point estimate of $\beta_1$. A confidence interval for $\beta_1$ is
given by the following formula.
years old we are 90% confident that for each additional year of age the average value of such a
vehicle decreases by between $1,100 and $3,000.
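The interval quoted here can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original text, that evaluates the point estimate plus or minus tα/2 · sε/√SSxx. It assumes the values β̂1 = −2.05, sε = 1.902169814, and SSxx = 14 quoted elsewhere in this chapter for the automobile data, a sample size of n = 10 (so that df = 8), and the table value t0.05 = 1.860. The function name is our own.

```python
import math


def slope_confidence_interval(beta1_hat, s_eps, ss_xx, t_crit):
    """Return (lower, upper) for beta1_hat +/- t_crit * s_eps / sqrt(SS_xx)."""
    margin = t_crit * s_eps / math.sqrt(ss_xx)
    return beta1_hat - margin, beta1_hat + margin


# Summary statistics quoted in this chapter for the automobile data
# (values are in thousands of dollars; age is in years).
beta1_hat = -2.05       # slope of the least squares regression line
s_eps = 1.902169814     # sample standard deviation of errors
ss_xx = 14.0

# Assumption: n = 10 data pairs, so df = 8, and the t-table gives t_0.05 = 1.860.
t_crit = 1.860

lower, upper = slope_confidence_interval(beta1_hat, s_eps, ss_xx, t_crit)
print(f"90% confidence interval for the slope: ({lower:.3f}, {upper:.3f})")
```

With these inputs the endpoints come out close to −3.0 and −1.1 (thousands of dollars per year), consistent with the range of $1,100 to $3,000 stated above.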
Testing Hypotheses About β1
Hypotheses regarding β 1 can be tested using the same five-step procedures, either the critical value
approach or the p-value approach, that were introduced in Section 8.1 "The Elements of Hypothesis
Testing" and Section 8.3 "The Observed Significance of a Test" of Chapter 8 "Testing Hypotheses". The
null hypothesis always has the form H0: β1 = B0 where B0 is a number determined from the statement of the
problem. The three forms of the alternative hypothesis, with the terminology for each case, are:
Form of Ha        Terminology
Ha: β1 < B0       Left-tailed
Ha: β1 > B0       Right-tailed
Ha: β1 ≠ B0       Two-tailed
The value zero for B0 is of particular importance since in that case the null hypothesis is H0: β1 = 0, which
corresponds to the situation in which x is not useful for predicting y. For if β1 = 0 then the population
regression line is horizontal, so the mean E(y) is the same for every value of x and we are just as well off in
ignoring x completely and approximating y by its average value. Given two variables x and y, the burden
of proof is that x is useful for predicting y, not that it is not. Thus the phrase “test whether x is useful for
prediction of y,” or words to that effect, means to perform the test
H0: β1 = 0 vs. Ha: β1 ≠ 0
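The three forms of the alternative hypothesis differ only in where the rejection region lies. As a rough illustration (a sketch of ours, with a hypothetical function name and made-up numbers rather than values from the text), the decision rule of the critical value approach can be written as follows. Here t_crit is the positive table value tα for a one-tailed test or tα/2 for a two-tailed test.

```python
def reject_null(t_stat, t_crit, tail):
    """Decide whether the test statistic falls in the rejection region.

    tail is "left", "right", or "two"; t_crit is the positive critical value
    (t_alpha for one-tailed tests, t_{alpha/2} for two-tailed tests).
    """
    if tail == "left":
        return t_stat <= -t_crit
    if tail == "right":
        return t_stat >= t_crit
    if tail == "two":
        return abs(t_stat) >= t_crit
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical test statistic, chosen only to illustrate the three cases.
t_stat = -1.87

# Assumed table values for df = 8: t_0.05 = 1.860 and t_0.025 = 2.306.
print("left-tailed: ", reject_null(t_stat, 1.860, "left"))
print("right-tailed:", reject_null(t_stat, 1.860, "right"))
print("two-tailed:  ", reject_null(t_stat, 2.306, "two"))
```

The same statistic can lead to rejection under one alternative and not under another, which is why the form of Ha must be fixed from the statement of the problem before the data are examined.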
• Step 5. As shown in Figure 10.9 "Rejection Region and Test Statistic for Note 10.33 "Example 8"" the test statistic falls
in the rejection region. The decision is to reject H0. In the context of the problem our
conclusion is:
The data provide sufficient evidence, at the 2% level of significance, to conclude that the
slope of the population regression line is nonzero, so that x is useful as a predictor of y.
Figure 10.9 Rejection Region and Test Statistic for Note 10.33 "Example 8"
E X A M P L E 9
A car salesman claims that automobiles between two and six years old of the make and model
discussed in Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line" lose more
than $1,100 in value each year. Test this claim at the 5% level of significance.
Solution:
We will perform the test using the critical value approach.
• Step 1. In terms of the variables x and y, the salesman’s claim is that if x is increased by 1 unit (one
additional year in age), then y decreases by more than 1.1 units (more than $1,100). Thus his
assertion is that the slope of the population regression line is negative, and that it is more
negative than −1.1. In symbols, β1 < −1.1. Since it contains an inequality, this has to be the alternative
hypothesis. The null hypothesis has to be an equality and have the same number on the right
hand side, so the relevant test is
H0: β1 = −1.1 vs. Ha: β1 < −1.1, at α = 0.05
The data provide sufficient evidence, at the 5% level of significance, to conclude that vehicles of
this make and model and in this age range lose more than $1,100 per year in value, on average.
Figure 10.10 Rejection Region and Test Statistic for Note 10.34 "Example 9"
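The arithmetic behind the salesman example can be sketched in Python. This is an illustration of ours, not the authors' worked solution. It assumes the usual standardized test statistic for the slope, T = (β̂1 − B0)/(sε/√SSxx) with df = n−2, the summary values β̂1 = −2.05, sε = 1.902169814, and SSxx = 14 quoted in this chapter, a sample size of n = 10, and the table value t0.05 = 1.860 for df = 8.

```python
import math

# Summary statistics quoted in this chapter for the automobile data.
beta1_hat = -2.05
s_eps = 1.902169814
ss_xx = 14.0

# Hypothesized slope from H0: beta1 = -1.1 (left-tailed alternative).
b0 = -1.1

# Assumption: n = 10, so df = 8, and the t-table gives t_0.05 = 1.860.
t_crit = 1.860

# Standardized test statistic for the slope.
t_stat = (beta1_hat - b0) / (s_eps / math.sqrt(ss_xx))

# Left-tailed test: reject H0 when T falls at or below -t_crit.
reject = t_stat <= -t_crit

print(f"T = {t_stat:.3f}, rejection region is (-inf, {-t_crit}]")
print("Reject H0" if reject else "Do not reject H0")
```

The statistic lands only slightly inside the rejection region, so the decision to reject H0 agrees with the conclusion stated in the example but would not survive a much smaller level of significance.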
K E Y T A K E A W A Y S
• The parameter β1, the slope of the population regression line, is of primary interest because it describes
the average change in y with respect to a unit increase in x.
• The statistic β̂1, the slope of the least squares regression line, is a point estimate of β1. Confidence intervals
for β1 can be computed using a formula.
• Hypotheses regarding β1 are tested using the same five-step procedures introduced in Chapter 8 "Testing
Hypotheses".
E X E R C I S E S
B A S I C
For the Basic and Application exercises in this section use the computations that were done for the
exercises with the same number in Section 10.2 "The Linear Correlation Coefficient" and Section 10.4 "The
Least Squares Regression Line".
1. Construct the 95% confidence interval for the slope β1 of the population regression line based on the sample
data set of Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".
2. Construct the 90% confidence interval for the slope β1 of the population regression line based on the sample
data set of Exercise 2 of Section 10.2 "The Linear Correlation Coefficient".
3. Construct the 90% confidence interval for the slope β1 of the population regression line based on the sample
data set of Exercise 3 of Section 10.2 "The Linear Correlation Coefficient".
4. Construct the 99% confidence interval for the slope β1 of the population regression line based on the sample
data set of Exercise 4 of Section 10.2 "The Linear Correlation Coefficient".
5. For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of
significance, whether x is useful for predicting y (that is, whether β1 ≠ 0).
6. For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient" test, at the 5% level of
significance, whether x is useful for predicting y (that is, whether β1 ≠ 0).
7. Construct the 90% confidence interval for the slope β1 of the population regression line based on the sample
data set of Exercise 7 of Section 10.2 "The Linear Correlation Coefficient".
8. Construct the 95% confidence interval for the slope β1 of the population regression line based on the sample
data set of Exercise 8 of Section 10.2 "The Linear Correlation Coefficient".
9. For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of
significance, whether x is useful for predicting y (that is, whether β1 ≠ 0).
10. For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of
significance, whether x is useful for predicting y (that is, whether β1 ≠ 0).
A P P L I C A T I O N S
11. For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence
interval for the mean number of new words acquired per month by children between 13 and 18 months of
age.
12. For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence
interval for the mean increased braking distance for each additional 100 pounds of vehicle weight.
13. For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of
significance, whether age is useful for predicting resting heart rate.
14. For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of
significance, whether wind speed is useful for predicting wave height.
15. For the situation described in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"
a. Construct the 95% confidence interval for the mean increase in revenue per additional thousand
dollars spent on advertising.
b. An advertising agency tells the business owner that for every additional thousand dollars spent
on advertising, revenue will increase by over $25,000. Test this claim (which is the alternative
hypothesis) at the 5% level of significance.
c. Perform the test of part (b) at the 10% level of significance.
d. Based on the results in (b) and (c), how believable is the ad agency’s claim? (This is a subjective
judgement.)
16. For the situation described in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"
a. Construct the 90% confidence interval for the mean increase in height per additional inch of
length at age two.
b. It is claimed that for girls each additional inch of length at age two means more than an
additional inch of height at maturity. Test this claim (which is the alternative hypothesis) at the
10% level of significance.
17. For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of
significance, whether course average before the final exam is useful for predicting the final exam grade.
18. For the situation described in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient", an agronomist
claims that each additional million acres planted results in more than 750,000 additional acres harvested.
Test this claim at the 1% level of significance.
19. For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1/10th of 1% level
of significance, whether, ignoring all other facts such as age and body mass, the amount of the medication
consumed is a useful predictor of blood concentration of the active ingredient.
20. For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of
significance, whether for each additional inch of girth the age of the tree increases by at least two and one-half
years.
21. For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"
a. Construct the 95% confidence interval for the mean increase in strength at 28 days for each
additional hundred psi increase in strength at 3 days.
b. Test, at the 1/10th of 1% level of significance, whether the 3-day strength is useful for predicting
28-day strength.
22. For the situation described in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"
a. Construct the 99% confidence interval for the mean decrease in energy demand for each one-degree
drop in temperature.
b. An engineer with the power company believes that for each one-degree increase in temperature,
daily energy demand will decrease by more than 3.6 million watt-hours. Test this claim at the 1%
level of significance.
L A R G E D A T A S E T E X E R C I S E S
23. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Compute the 90% confidence interval for the slope β1 of the population regression line with SAT
score as the independent variable (x) and GPA as the dependent variable (y).
b. Test, at the 10% level of significance, the hypothesis that the slope of the population regression
line is greater than 0.001, against the null hypothesis that it is exactly 0.001.
24. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs,
then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
http://www.flatworldknowledge.com/sites/all/files/data12.xls
a. Compute the 95% confidence interval for the slope β1 of the population regression line with
scores using the original clubs as the independent variable (x) and scores using the new clubs as
the dependent variable (y).
b. Test, at the 10% level of significance, the hypothesis that the slope of the population regression
line is different from 1, against the null hypothesis that it is exactly 1.
25. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather
clock at 60 auctions.
http://www.flatworldknowledge.com/sites/all/files/data13.xls
a. Compute the 95% confidence interval for the slope β1 of the population regression line with the
number of bidders present at the auction as the independent variable (x) and sales price as the
dependent variable (y).
b. Test, at the 10% level of significance, the hypothesis that the average sales price increases by
more than $90 for each additional bidder at an auction, against the default that it increases by
exactly $90.
10.6 The Coefficient of Determination
L E A R N I N G O B J E C T I V E
1. To learn what the coefficient of determination is, how to compute it, and what it tells us about the
relationship between two variables x and y.
If the scatter diagram of a set of (x, y) pairs shows neither an upward nor a downward trend, then the
horizontal line ŷ = ȳ fits it well, as illustrated in Figure 10.11. The lack of any upward or downward
trend means that when an element of the population is selected at random, knowing the value of the
measurement x for that element is not helpful in predicting the value of the measurement y.
Figure 10.11
ŷ = ȳ
If the scatter diagram shows a linear trend upward or downward then it is useful to compute the least
squares regression line ŷ = β̂1x + β̂0 and use it in predicting y. Figure 10.12 "Same Scatter Diagram with
Two Approximating Lines" illustrates this. In each panel we have plotted the height and weight data
of Section 10.1 "Linear Relationships Between Variables". This is the same scatter plot as Figure 10.2
"Plot of Height and Weight Pairs", with the average value line ŷ = ȳ superimposed on it in the left
panel and the least squares regression line imposed on it in the right panel. The errors are indicated
graphically by the vertical line segments.
Figure 10.12 Same Scatter Diagram with Two Approximating Lines
E X A M P L E 1 0
The value of used vehicles of the make and model discussed in Note 10.19 "Example 3" in Section 10.4
"The Least Squares Regression Line" varies widely. The most expensive automobile in the sample in Table
10.3 "Data on Age and Value of Used Automobiles of a Specific Make and Model" has value $30,500,
which is nearly half again as much as the least expensive one, which is worth $20,400. Find the proportion
of the variability in value that is accounted for by the linear relationship between age and value.
Solution:
The proportion of the variability in value y that is accounted for by the linear relationship between it and
age x is given by the coefficient of determination, r². Since the correlation coefficient r was already
computed in Note 10.19 "Example 3" as r = −0.819, r² = (−0.819)² = 0.671. About 67% of the variability in the value
of this vehicle can be explained by its age.
E X A M P L E 1 1
Use each of the three formulas for the coefficient of determination to compute its value for the example
of ages and values of vehicles.
Solution:
In Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line" we computed the exact
values
The coefficient of determination r² can always be computed by squaring the correlation
coefficient r if it is known. Any one of the defining formulas can also be used. Typically one would
make the choice based on which quantities have already been computed. What should be avoided is
trying to compute r by taking the square root of r², if it is already known, since it is easy to make a
sign error this way. To see what can go wrong, suppose r² = 0.64. Taking the square root of a positive
number with any calculating device will always return a positive result. The square root of 0.64 is
0.8. However, the actual value of r might be the negative number −0.8.
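The agreement among the formulas for r², and the sign issue just described, can be illustrated with a short Python sketch. This is our own illustration under stated assumptions: the formulas r² = (SSyy − SSE)/SSyy and r² = β̂1·SSxy/SSyy are the ones used in the exercises of this section, the third is simply the square of r = SSxy/√(SSxx·SSyy), and the values SSxy = −28.7, SSyy = 87.781, and SSE = 28.946 are assumed for the automobile data because they are not reproduced in this part of the text (SSxx = 14 and β̂1 = −2.05 are quoted in the chapter).

```python
import math

# Quoted in this chapter for the automobile data.
ss_xx = 14.0
beta1_hat = -2.05

# Assumed values for the same data (not reproduced in this section).
ss_xy = -28.7
ss_yy = 87.781
sse = 28.946

# Three equivalent ways of computing the coefficient of determination.
r2_from_sse = (ss_yy - sse) / ss_yy
r2_from_slope = beta1_hat * ss_xy / ss_yy
r2_from_r = ss_xy ** 2 / (ss_xx * ss_yy)

print(f"(SSyy - SSE)/SSyy   = {r2_from_sse:.4f}")
print(f"beta1_hat*SSxy/SSyy = {r2_from_slope:.4f}")
print(f"SSxy^2/(SSxx*SSyy)  = {r2_from_r:.4f}")

# A square root alone is never negative; if r must be recovered from r^2,
# its sign has to be taken from the slope (equivalently, from SSxy).
r = math.copysign(math.sqrt(r2_from_r), beta1_hat)
print(f"r = {r:.3f}")
```

Taking math.sqrt alone would report a positive correlation for data whose regression line slopes downward, which is exactly the sign error the text warns about.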
K E Y T A K E A W A Y S
• The coefficient of determination r² estimates the proportion of the variability in the variable y that is
explained by the linear relationship between y and the variable x.
• There are several formulas for computing r². The choice of which one to use can be based on which
quantities have already been computed so far.
E X E R C I S E S
B A S I C
For the Basic and Application exercises in this section use the computations that were done for the
exercises with the same number in Section 10.2 "The Linear Correlation Coefficient", Section 10.4
"The Least Squares Regression Line", and Section 10.5 "Statistical Inferences About β1".
1. For the sample data set of Exercise 1 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = β̂1 SSxy / SSyy. Confirm your answer by squaring r as computed in that
exercise.
2. For the sample data set of Exercise 2 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = β̂1 SSxy / SSyy. Confirm your answer by squaring r as computed in that
exercise.
3. For the sample data set of Exercise 3 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = β̂1 SSxy / SSyy. Confirm your answer by squaring r as computed in that
exercise.
4. For the sample data set of Exercise 4 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = β̂1 SSxy / SSyy. Confirm your answer by squaring r as computed in that
exercise.
5. For the sample data set of Exercise 5 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = β̂1 SSxy / SSyy. Confirm your answer by squaring r as computed in that
exercise.
6. For the sample data set of Exercise 6 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = β̂1 SSxy / SSyy. Confirm your answer by squaring r as computed in that
exercise.
7. For the sample data set of Exercise 7 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = (SSyy − SSE) / SSyy. Confirm your answer by squaring r as computed in that
exercise.
8. For the sample data set of Exercise 8 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = (SSyy − SSE) / SSyy. Confirm your answer by squaring r as computed in that
exercise.
9. For the sample data set of Exercise 9 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = (SSyy − SSE) / SSyy. Confirm your answer by squaring r as computed in that
exercise.
10. For the sample data set of Exercise 10 of Section 10.2 "The Linear Correlation Coefficient" find the coefficient
of determination using the formula r² = (SSyy − SSE) / SSyy. Confirm your answer by squaring r as computed in that
exercise.
A P P L I C A T I O N S
11. For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of age and vocabulary.
12. For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of vehicle weight and braking distance.
13. For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of age and resting heart rate. In the age range of the
data, does age seem to be a very important factor with regard to heart rate?
14. For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of wind speed and wave height. Does wind speed seem
to be a very important factor with regard to wave height?
15. For the data in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient" find the proportion of the
variability in revenue that is explained by level of advertising.
16. For the data in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient" find the proportion of the
variability in adult height that is explained by the variation in length at age two.
17. For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of course average before the final exam and score on the
final exam.
18. For the data in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of acres planted and acres harvested.
19. For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of the amount of the medication consumed and blood
concentration of the active ingredient.
20. For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient" compute the coefficient of
determination and interpret its value in the context of tree size and age.
21. For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient" find the proportion of the
variability in 28-day strength of concrete that is accounted for by variation in 3-day strength.
22. For the data in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient" find the proportion of the
variability in energy demand that is accounted for by variation in average temperature.
L A R G E D A T A S E T E X E R C I S E S
23. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Compute the coefficient of determination
and interpret its value in the context of SAT scores and GPAs.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
24. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs,
then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
Compute the coefficient of determination and interpret its value in the context of golf scores with the two
kinds of golf clubs.
http://www.flatworldknowledge.com/sites/all/files/data12.xls
25. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather
clock at 60 auctions. Compute the coefficient of determination and interpret its value in the context of the
number of bidders at an auction and the price of this type of antique grandfather clock.
http://www.flatworldknowledge.com/sites/all/files/data13.xls
10.7 Estimation and Prediction
L E A R N I N G O B J E C T I V E S
1. To learn the distinction between estimation and prediction.
2. To learn the distinction between a confidence interval and a prediction interval.
3. To learn how to implement formulas for computing confidence intervals and prediction intervals.
Consider the following pairs of problems, in the context of Note 10.19 "Example 3" in Section 10.4
"The Least Squares Regression Line", the automobile age and value example.
1. a. Estimate the average value of all four-year-old automobiles of this make and
model.
b. Construct a 95% confidence interval for the average value of all four-year-old
automobiles of this make and model.
2. a. Shylock intends to buy a four-year-old automobile of this make and model next
week. Predict the value of the first such automobile that he encounters.
b. Construct a 95% confidence interval for the value of the first such automobile
that he encounters.
The method of solution and answer to the first question in each pair, (1a) and (2a), are the
same. When we set x equal to 4 in the least squares regression equation ŷ = −2.05x + 32.83 that
was computed in part (c) of Note 10.19 "Example 3" in Section 10.4 "The Least Squares
Regression Line", the number returned,
ŷ = −2.05(4) + 32.83 = 24.63
which corresponds to value $24,630, is an estimate of precisely the number sought in
question (1a): the mean E(y) of all y values when x = 4. Since nothing is known about the first
four-year-old automobile of this make and model that Shylock will encounter, our best guess
as to its value is the mean value E(y) of all such automobiles, the number 24.63 or $24,630,
computed in the same way.
The answers to the second part of each question differ. In question (1b) we are trying to
estimate a population parameter: the mean of all the y-values in the sub-population
picked out by the value x = 4, that is, the average value of all four-year-old automobiles. In
question (2b), however, we are not trying to capture a fixed parameter, but the value of the
random variable y in one trial of an experiment: examine the first four-year-old car Shylock
encounters. In the first case we seek to construct a confidence interval in the same sense that
we have done before. In the second case the situation is different, and the interval
constructed has a different name, prediction interval. In the second case we are trying to
“predict” where the value of a random variable will fall.
a. xp is a particular value of x that lies in the range of x-values in the data set used to construct the
least squares regression line;
b. ŷp is the numerical value obtained when the least squares regression equation is evaluated at x = xp;
and
c. the number of degrees of freedom for tα/2 is df = n−2.
The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must
hold.
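The difference between the two kinds of interval is easiest to see when both are computed side by side. The following Python sketch is ours and is offered under stated assumptions: it uses the standard forms ŷp ± tα/2 · sε · √(1/n + (xp − x̄)²/SSxx) for the confidence interval for the mean and ŷp ± tα/2 · sε · √(1 + 1/n + (xp − x̄)²/SSxx) for the prediction interval, which we take to be the formulas intended here. The inputs are the automobile values quoted in this chapter (ŷ = −2.05x + 32.83, sε = 1.902169814, x̄ = 4, SSxx = 14), with n = 10 and the table value t0.025 = 2.306 for df = 8 assumed, and xp = 3.5 as in the three-and-one-half-year-old automobile example. The function name is hypothetical.

```python
import math


def interval_at(x_p, slope, intercept, s_eps, n, x_bar, ss_xx, t_crit,
                predict=False):
    """Return (y_hat, margin) for the interval y_hat +/- margin at x = x_p.

    predict=False gives the confidence interval for the mean of y;
    predict=True adds 1 under the square root, giving the prediction
    interval for a single new value of y.
    """
    y_hat = slope * x_p + intercept
    extra = 1.0 if predict else 0.0
    root = math.sqrt(extra + 1.0 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat, t_crit * s_eps * root


# Values quoted in this chapter; n and t_crit are assumptions (see lead-in).
stats = dict(slope=-2.05, intercept=32.83, s_eps=1.902169814,
             n=10, x_bar=4.0, ss_xx=14.0, t_crit=2.306)

y_hat, ci_margin = interval_at(3.5, **stats)
_, pi_margin = interval_at(3.5, predict=True, **stats)

print(f"point estimate:          {y_hat:.3f}")
print(f"95% confidence interval: {y_hat:.3f} +/- {ci_margin:.3f}")
print(f"95% prediction interval: {y_hat:.3f} +/- {pi_margin:.3f}")
```

Both intervals share the same center, but the prediction interval is considerably wider, because it must allow for the variability of one individual automobile in addition to the uncertainty in the estimated mean.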
E X A M P L E 1 2
Using the sample data of Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line",
recorded in Table 10.3 "Data on Age and Value of Used Automobiles of a Specific Make and Model",
construct a 95% confidence interval for the average value of all three-and-one-half-year-old automobiles
of this make and model.
Solution:
Solving this problem is merely a matter of finding the values of ŷp, α and tα/2, sε, x̄, and SSxx and inserting
them into the confidence interval formula given just above. Most of these quantities are already known.
From Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression
Line", SSxx = 14 and x̄ = 4. From Note 10.31 "Example 7" in Section 10.5 "Statistical Inferences About
β1", sε = 1.902169814.
K E Y T A K E A W A Y S
• A confidence interval is used to estimate the mean value of y in the sub-population determined by the
condition that x have some specific value xp.
• The prediction interval is used to predict the value that the random variable y will take when x has some
specific value xp.
E X E R C I S E S
B A S I C
For the Basic and Application exercises in this section use the computations that were done for the
exercises with the same number in previous sections.
1. For the sample data set of Exercise 1 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 4.
b. Construct the 90% confidence interval for that mean value.
2. For the sample data set of Exercise 2 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 4.
b. Construct the 90% confidence interval for that mean value.
3. For the sample data set of Exercise 3 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 7.
b. Construct the 95% confidence interval for that mean value.
4. For the sample data set of Exercise 4 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 2.
b. Construct the 80% confidence interval for that mean value.
5. For the sample data set of Exercise 5 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 1.
b. Construct the 80% confidence interval for that mean value.
6. For the sample data set of Exercise 6 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 5.
b. Construct the 95% confidence interval for that mean value.
7. For the sample data set of Exercise 7 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 6.
b. Construct the 99% confidence interval for that mean value.
c. Is it valid to make the same estimates for x = 12? Explain.
8. For the sample data set of Exercise 8 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 12.
b. Construct the 80% confidence interval for that mean value.
c. Is it valid to make the same estimates for x = 0? Explain.
9. For the sample data set of Exercise 9 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 0.
b. Construct the 90% confidence interval for that mean value.
c. Is it valid to make the same estimates for x = −1? Explain.
10. For the sample data set of Exercise 10 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the mean value of y in the sub-population determined by the
condition x = 8.
b. Construct the 95% confidence interval for that mean value.
c. Is it valid to make the same estimates for x = 0? Explain.
A P P L I C A T I O N S
11. For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the average number of words in the vocabulary of 18-month-old
children.
b. Construct the 95% confidence interval for that mean value.
c. Is it valid to make the same estimates for two-year-olds? Explain.
12. For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the average braking distance of automobiles that weigh 3,250 pounds.
b. Construct the 80% confidence interval for that mean value.
c. Is it valid to make the same estimates for 5,000-pound automobiles? Explain.
13. For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the resting heart rate of a man who is 35 years old.
b. One of the men in the sample is 35 years old, but his resting heart rate is not what you computed
in part (a). Explain why this is not a contradiction.
c. Construct the 90% confidence interval for the mean resting heart rate of all 35-year-old men.
14. For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the wave height when the wind speed is 13 miles per hour.
b. One of the wind speeds in the sample is 13 miles per hour, but the height of waves that day is
not what you computed in part (a). Explain why this is not a contradiction.
c. Construct the 90% confidence interval for the mean wave height on days when the wind speed is
13 miles per hour.
15. For the data in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"
a. The business owner intends to spend $2,500 on advertising next year. Give an estimate of next
year’s revenue based on this fact.
b. Construct the 90% prediction interval for next year’s revenue, based on the intent to spend
$2,500 on advertising.
16. For the data in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"
a. A two-year-old girl is 32.3 inches long. Predict her adult height.
b. Construct the 95% prediction interval for the girl’s adult height.
17. For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient"
a. Lodovico has a 78.6 average in his physics class just before the final. Give a point estimate of
what his final exam grade will be.
b. Explain whether an interval estimate for this problem is a confidence interval or a prediction
interval.
c. Based on your answer to (b), construct an interval estimate for Lodovico’s final exam grade at the
90% level of confidence.
18. For the data in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient"
a. This year 86.2 million acres of corn were planted. Give a point estimate of the number of acres
that will be harvested this year.
b. Explain whether an interval estimate for this problem is a confidence interval or a prediction
interval.
c. Based on your answer to (b), construct an interval estimate for the number of acres that will be
harvested this year, at the 99% level of confidence.
19. For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient"
a. Give a point estimate for the blood concentration of the active ingredient of this medication in a
man who has consumed 1.5 ounces of the medication just recently.
b. Gratiano just consumed 1.5 ounces of this medication 30 minutes ago. Construct a 95%
prediction interval for the concentration of the active ingredient in his blood right now.
20. For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient"
a. You measure the girth of a free-standing oak tree five feet off the ground and obtain the value
127 inches. How old do you estimate the tree to be?
b. Construct a 90% prediction interval for the age of this tree.
21. For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"
a. A test cylinder of concrete three days old fails at 1,750 psi. Predict what the 28-day strength of
the concrete will be.
b. Construct a 99% prediction interval for the 28-day strength of this concrete.
c. Based on your answer to (b), what would be the minimum 28-day strength you could expect this
concrete to exhibit?
22. For the data in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"
a. Tomorrow’s average temperature is forecast to be 53 degrees. Estimate the energy demand
tomorrow.
b. Construct a 99% prediction interval for the energy demand tomorrow.
c. Based on your answer to (b), what would be the minimum demand you could expect?
LARGE DATA SET EXERCISES
23. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
a. Give a point estimate of the mean GPA of all students who score 1350 on the SAT.
b. Construct a 90% confidence interval for the mean GPA of all students who score 1350 on the
SAT.
24. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own
original clubs, then using clubs of a new, experimental design (after two months of familiarization
with the new clubs).
http://www.flatworldknowledge.com/sites/all/files/data12.xls
a. Thurio averages 72 strokes per round with his own clubs. Give a point estimate for his score on
one round if he switches to the new clubs.
b. Explain whether an interval estimate for this problem is a confidence interval or a prediction
interval.
c. Based on your answer to (b), construct an interval estimate for Thurio’s score on one round if he
switches to the new clubs, at 90% confidence.
25. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather
clock at 60 auctions.
http://www.flatworldknowledge.com/sites/all/files/data13.xls
a. There are seven likely bidders at the Verona auction today. Give a point estimate for the price of
such a clock at today’s auction.
b. Explain whether an interval estimate for this problem is a confidence interval or a prediction
interval.
c. Based on your answer to (b), construct an interval estimate for the likely sale price of such a clock
at today’s sale, at 95% confidence.
10.8 A Complete Example
LEARNING OBJECTIVE
1. To see a complete linear correlation and regression analysis, in a practical setting, as a cohesive whole.
In the preceding sections numerous concepts were introduced and illustrated, but the analysis was
broken into disjoint pieces by sections. In this section we will go through a complete example of the
use of correlation and regression analysis of data from start to finish, touching on all the topics of
this chapter in sequence.
In general, educators are convinced that, all other factors being equal, class attendance has a
significant bearing on course performance. To investigate the relationship between attendance and
performance, an education researcher selects for study a multiple-section introductory statistics
course at a large university. Instructors in the course agree to keep an accurate record of attendance
throughout one semester. At the end of the semester 26 students are selected at random. For each
student in the sample two measurements are taken: x, the number of days the student was absent,
and y, the student’s score on the common final exam in the course. The data are summarized in Table
10.4 "Absence and Score Data".
Table 10.4 Absence and Score Data

Absences  Score    Absences  Score
    x       y          x       y
    2      76          4      41
    7      29          5      63
    2      96          4      88
    7      63          0      98
    2      79          1      99
    7      71          0      89
    0      88          1      96
    0      92          3      90
    6      55          1      90
    6      70          3      68
    2      80          1      84
    2      75          3      80
    1      63          1      78

A scatter plot of the data is given in Figure 10.13 "Plot of the Absence and Exam Score Pairs". There
is a downward trend in the plot which indicates that on average students with more absences tend to
do worse on the final examination.
Figure 10.13 Plot of the Absence and Exam Score Pairs
The trend observed in Figure 10.13 "Plot of the Absence and Exam Score Pairs" as well as the fairly
constant width of the apparent band of points in the plot makes it reasonable to assume a
relationship between x and y of the form
$$y=\beta_1 x+\beta_0+\varepsilon$$
where $\beta_1$ and $\beta_0$ are unknown parameters and $\varepsilon$ is a normal random variable with mean zero and
unknown standard deviation $\sigma$. Note carefully that this model is being proposed for the population
of all students taking this course, not just those taking it this semester, and certainly not just those in
the sample. The numbers $\beta_1$, $\beta_0$, and $\sigma$ are parameters relating to this large population.
First we perform preliminary computations that will be needed later. The data are processed in Table
10.5 "Processed Absence and Score Data".
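The preliminary computations can be sketched in a few lines of Python. This is only an illustration: the data are transcribed from Table 10.4, the variable names are chosen here for readability, and the quantities are the usual sums and the sums of squares $SS_{xx}$, $SS_{xy}$, $SS_{yy}$ used throughout this chapter.

```python
# Data transcribed from Table 10.4: (absences x, final exam score y).
data = [
    (2, 76), (7, 29), (2, 96), (7, 63), (2, 79), (7, 71), (0, 88),
    (0, 92), (6, 55), (6, 70), (2, 80), (2, 75), (1, 63),
    (4, 41), (5, 63), (4, 88), (0, 98), (1, 99), (0, 89), (1, 96),
    (3, 90), (1, 90), (3, 68), (1, 84), (3, 80), (1, 78),
]

n = len(data)

# Column sums of the kind tabulated in a "processed data" table.
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xx = sum(x * x for x, _ in data)
sum_xy = sum(x * y for x, y in data)
sum_yy = sum(y * y for _, y in data)

# Shortcut formulas for the sums of squares.
ss_xx = sum_xx - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_yy - sum_y ** 2 / n

print("n =", n)
print("sums:", sum_x, sum_y, sum_xx, sum_xy, sum_yy)
print("SSxx, SSxy, SSyy:", round(ss_xx, 4), round(ss_xy, 4), round(ss_yy, 4))
```

Every later quantity in the analysis (slope, intercept, SSE, the correlation coefficient) is built from these three sums of squares, which is why they are computed first.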
The statistic $s_\varepsilon$ estimates the standard deviation $\sigma$ of the normal random variable $\varepsilon$ in the model. Its
meaning is that among all students with the same number of absences, the standard deviation of
their scores on the final exam is about 12.6 points. Such a large value on a 100-point exam means
that the final exam scores of each sub-population of students, based on the number of absences, are
highly variable.
The size and sign of the slope $\hat{\beta}_1=-5.23$ indicate that, for every class missed, students tend to score
about 5.23 points lower on the final exam on average. Similarly, for every two classes missed
students tend to score on average 2×5.23=10.46 points lower on the final exam, or about a letter grade
worse on average.
Since 0 is in the range of x-values in the data set, the y-intercept also has meaning in this problem. It
is an estimate of the average grade on the final exam of all students who have perfect attendance. The
predicted average of such students is $\hat{\beta}_0=91.24$.
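As a rough check on the slope and intercept quoted above, the least squares line can be recomputed from Table 10.4. The sketch below assumes the standard formulas $\hat{\beta}_1=SS_{xy}/SS_{xx}$ and $\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$; the helper name `fit_line` is hypothetical and is introduced only for this illustration.

```python
# Data transcribed from Table 10.4: (absences x, final exam score y).
data = [
    (2, 76), (7, 29), (2, 96), (7, 63), (2, 79), (7, 71), (0, 88),
    (0, 92), (6, 55), (6, 70), (2, 80), (2, 75), (1, 63),
    (4, 41), (5, 63), (4, 88), (0, 98), (1, 99), (0, 89), (1, 96),
    (3, 90), (1, 90), (3, 68), (1, 84), (3, 80), (1, 78),
]


def fit_line(pairs):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    ss_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(data)
print("slope     =", round(slope, 2))      # -5.23
print("intercept =", round(intercept, 2))  # 91.24

# Estimated average score for students with a given number of absences.
for absences in (0, 1, 2):
    print(absences, "absences:", round(slope * absences + intercept, 2))
```

The last loop makes the interpretation concrete: each additional absence lowers the estimated average score by the same amount, the absolute value of the slope.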
Before we use the regression equation further, or perform other analyses, it would be a good idea to
examine the utility of the linear regression model. We can do this in two ways: 1) by computing the
correlation coefficient r to see how strongly the number of absences x and the score y on the final
exam are correlated, and 2) by testing the null hypothesis $H_0:\beta_1=0$ (the slope of
the population regression line is zero, so x is not a good predictor of y) against the natural
alternative $H_a:\beta_1<0$ (the slope of the population regression line is negative, so final exam scores y go
down as absences x go up).
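The slope test just described can be sketched as follows. Several things here are assumptions rather than statements from the surviving text: the test statistic is taken to be the usual $T=\hat{\beta}_1/(s_\varepsilon/\sqrt{SS_{xx}})$ with $n-2$ degrees of freedom, the 5% level of significance is chosen only for illustration, and the critical value 1.711 is the standard tabulated $t_{0.05}$ for 24 degrees of freedom.

```python
import math

# Data transcribed from Table 10.4: (absences x, final exam score y).
data = [
    (2, 76), (7, 29), (2, 96), (7, 63), (2, 79), (7, 71), (0, 88),
    (0, 92), (6, 55), (6, 70), (2, 80), (2, 75), (1, 63),
    (4, 41), (5, 63), (4, 88), (0, 98), (1, 99), (0, 89), (1, 96),
    (3, 90), (1, 90), (3, 68), (1, 84), (3, 80), (1, 78),
]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
ss_xx = sum((x - mean_x) ** 2 for x, _ in data)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
ss_yy = sum((y - mean_y) ** 2 for _, y in data)

slope = ss_xy / ss_xx
sse = ss_yy - slope * ss_xy          # sum of squared errors
s_eps = math.sqrt(sse / (n - 2))     # sample standard deviation of errors

# Standardized test statistic for H0: beta_1 = 0.
t_stat = slope / (s_eps / math.sqrt(ss_xx))

# Assumed for illustration: alpha = 0.05, df = n - 2 = 24, table value 1.711.
T_CRITICAL = 1.711
reject_h0 = t_stat <= -T_CRITICAL    # left-tailed test, Ha: beta_1 < 0

print("T =", round(t_stat, 3))
print("reject H0 at the 5% level:", reject_h0)
```

Because the alternative is $\beta_1<0$, the rejection region lies entirely in the left tail, which is why the statistic is compared with the negative of the tabulated value.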
or about 49%. Thus although there is a significant correlation between attendance and performance
on the final exam, and we can estimate with fair accuracy the average score of students who miss a
certain number of classes, nevertheless less than half the total variation of the exam scores in the
sample is explained by the number of absences. This should not come as a surprise, since there are
many factors besides attendance that bear on student performance on exams.
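The figure of about 49% can be reproduced from Table 10.4. The following is a minimal sketch that assumes the coefficient of determination is computed as $r^2=SS_{xy}^2/(SS_{xx}\,SS_{yy})$, the square of the linear correlation coefficient.

```python
import math

# Data transcribed from Table 10.4: (absences x, final exam score y).
data = [
    (2, 76), (7, 29), (2, 96), (7, 63), (2, 79), (7, 71), (0, 88),
    (0, 92), (6, 55), (6, 70), (2, 80), (2, 75), (1, 63),
    (4, 41), (5, 63), (4, 88), (0, 98), (1, 99), (0, 89), (1, 96),
    (3, 90), (1, 90), (3, 68), (1, 84), (3, 80), (1, 78),
]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
ss_xx = sum((x - mean_x) ** 2 for x, _ in data)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
ss_yy = sum((y - mean_y) ** 2 for _, y in data)

# Linear correlation coefficient and coefficient of determination.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

print("r   =", round(r, 3))
print("r^2 =", round(r_squared, 2))  # 0.49
```

The negative sign of r matches the downward trend in the scatter plot, while $r^2$ is the proportion of the variability in exam scores accounted for by the number of absences.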
KEY TAKEAWAY
• It is a good idea to attend class.
EXERCISES
The exercises in this section are unrelated to those in previous sections.
1. The data give the amount x of silicofluoride in the water (mg/L) and the amount y of lead in the bloodstream
(μg/dL) of ten children in various communities with and without municipal water. Perform a complete
analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary
computations, find the least squares regression line, find SSE, $s_\varepsilon$, and r, and so on). In the hypothesis test use
as the alternative hypothesis $\beta_1>0$, and test at the 5% level of significance. Use confidence level 95% for the
confidence interval for $\beta_1$. Construct 95% confidence and prediction intervals at $x_p=2$ at the end.
http://www.flatworldknowledge.com/sites/all/files/data3.xls
http://www.flatworldknowledge.com/sites/all/files/data3A.xls
4. Separate out from Large Data Set 3A just the data on men and do a complete analysis, with shoe size as
the independent variable (x) and height as the dependent variable (y). Use α = 0.05 and $x_p=10$ whenever
appropriate.
http://www.flatworldknowledge.com/sites/all/files/data3A.xls
5. Separate out from Large Data Set 3A just the data on women and do a complete analysis, with shoe size
as the independent variable (x) and height as the dependent variable (y). Use α = 0.05 and $x_p=10$ whenever
appropriate.
http://www.flatworldknowledge.com/sites/all/files/data3A.xls
Chapter 11
Chi-Square Tests and F-Tests
In previous chapters you saw how to test hypotheses concerning population means and population
proportions. The idea of testing hypotheses can be extended to many other situations that involve
different parameters and use different test statistics. Whereas the standardized test statistics that
appeared in earlier chapters followed either a normal or Student t-distribution, in this chapter the
tests will involve two other very common and useful distributions, the chi-square and the
F-distributions. The chi-square distribution arises in tests of hypotheses concerning the
independence of two random variables and concerning whether a discrete random variable follows a
specified distribution. The F-distribution arises in tests of hypotheses concerning whether or not
two population variances are equal and concerning whether or not three or more population means
are equal.
11.1 Chi-Square Tests for Independence
LEARNING OBJECTIVES
1. To understand what chi-square distributions are.
2. To understand how to use a chi-square test to judge whether two factors are independent.
Chi-Square Distributions
As you know, there is a whole family of t-distributions, each one specified by a parameter called
the degrees of freedom, denoted df. Similarly, all the chi-square distributions form a family, and each of
its members is also specified by a parameter df, the number of degrees of freedom. Chi is a Greek letter
denoted by the symbol $\chi$, and chi-square is often denoted by $\chi^2$. Figure 11.1 "Many $\chi^2$ Distributions"
shows several chi-square distributions for different degrees of freedom. A chi-square random variable is
a random variable that assumes only positive values and follows a chi-square distribution.
Figure 11.1 Many $\chi^2$ Distributions
Definition
The value of the chi-square random variable $\chi^2$ with df = k that cuts off a right tail of area c is
denoted $\chi^2_c$ and is called a critical value. See Figure 11.2.
Figure 11.2 $\chi^2_c$ Illustrated
Figure 12.4 "Critical Values of Chi-Square Distributions" gives values of $\chi^2_c$ for various values of c
under several chi-square distributions with various degrees of freedom.
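For one and two degrees of freedom the right-tail area of a chi-square distribution has a simple closed form, so tabulated critical values can be spot-checked without special software. The sketch below relies on two standard facts that are not stated in this text: a chi-square variable with df = 1 is the square of a standard normal variable, and a chi-square variable with df = 2 is exponential with mean 2. The function names are hypothetical. The two values checked, 2.706 and 9.21, are the critical values used in the examples of this section.

```python
import math


def chi2_right_tail_df1(x):
    """Right-tail area P(X > x) for a chi-square variable with df = 1.

    Uses X = Z^2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_right_tail_df2(x):
    """Right-tail area P(X > x) for a chi-square variable with df = 2."""
    return math.exp(-x / 2.0)


# The critical value cutting off a right tail of area 0.10 when df = 1.
print(round(chi2_right_tail_df1(2.706), 3))  # 0.1

# The critical value cutting off a right tail of area 0.01 when df = 2.
print(round(chi2_right_tail_df2(9.21), 3))   # 0.01
```

For other degrees of freedom no such elementary formula is available, and a table or statistical software is the practical route.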
Tests for Independence
Hypothesis tests encountered earlier in the book had to do with how the numerical values of two
population parameters compared. In this subsection we will investigate hypotheses that have to do with
whether or not two random variables take their values independently, or whether the value of one has a
relation to the value of the other. Thus the hypotheses will be expressed in words, not mathematical
symbols. We build the discussion around the following example.
There is a theory that the gender of a baby in the womb is related to the baby’s heart rate: baby girls tend
to have higher heart rates. Suppose we wish to test this theory. We examine the heart rate records of 40
babies taken during their mothers’ last prenatal checkups before delivery, and for each of these 40
randomly selected records we record the values of two random measures: 1) gender and 2) heart rate. In
this context these two random measures are often called factors. Since the burden of proof is that heart
rate and gender are related, not that they are unrelated, the problem of testing the theory on baby gender
and heart rate can be formulated as a test of the following hypotheses:
$H_0$: Baby gender and baby heart rate are independent
vs. $H_a$: Baby gender and baby heart rate are not independent
The factor gender has two natural categories or levels: boy and girl. We divide the second factor,
heart rate, into two levels, low and high, by choosing some heart rate, say 145 beats per minute, as
the cutoff between them. A heart rate below 145 beats per minute will be considered low and 145 and
above considered high. The 40 records give rise to a 2 × 2 contingency table. By adjoining row totals,
column totals, and a grand total we obtain the table shown as Table 11.1 "Baby Gender and Heart
Rate". The four entries in boldface type are counts of observations from the sample of n = 40. There
were 11 girls with low heart rate, 17 boys with low heart rate, and so on. They form the core of the
expanded table.
Table 11.1 Baby Gender and Heart Rate

                      Heart Rate
                  Low    High    Row Total
Gender  Girl       11       7           18
        Boy        17       5           22
Column Total       28      12    Total = 40

In analogy with the fact that the probability of independent events is the product of the probabilities
of each event, if heart rate and gender were independent then we would expect the number in each
core cell to be close to the product of the row total R and column total C of the row and column
containing it, divided by the sample size n. Denoting such an expected number of observations E ,
these four expected values are:
• 1st row and 1st column: E = (R × C)/n = (18 × 28)/40 = 12.6
• 1st row and 2nd column: E = (R × C)/n = (18 × 12)/40 = 5.4
• 2nd row and 1st column: E = (R × C)/n = (22 × 28)/40 = 15.4
• 2nd row and 2nd column: E = (R × C)/n = (22 × 12)/40 = 6.6
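The four expected counts above follow one rule, E = (R × C)/n, applied cell by cell. A short sketch of that rule, using the observed counts of Table 11.1 (the function name `expected_counts` is introduced here only for illustration):

```python
def expected_counts(observed):
    """Expected cell counts under independence: E = (R * C) / n for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Table 11.1: rows are Girl, Boy; columns are Low, High.
baby_table = [
    [11, 7],
    [17, 5],
]

for row in expected_counts(baby_table):
    print([round(e, 1) for e in row])
# [12.6, 5.4]
# [15.4, 6.6]
```

The same function works unchanged for a table with any number of rows and columns, since it only uses the row totals, the column totals, and the grand total.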
We update Table 11.1 "Baby Gender and Heart Rate" by placing each expected value in its
corresponding core cell, right under the observed value in the cell. This gives the updated table Table
11.2 "Updated Baby Gender and Heart Rate".
Table 11.2 Updated Baby Gender and Heart Rate

                            Heart Rate
                  Low                 High              Row Total
Gender  Girl      O = 11, E = 12.6    O = 7, E = 5.4    R = 18
        Boy       O = 17, E = 15.4    O = 5, E = 6.6    R = 22
Column Total      C = 28              C = 12            n = 40

A measure of how much the data deviate from what we would expect to see if the factors really were
independent is the sum of the squares of the differences between the observed and expected numbers in
each core cell, or, standardizing by dividing each square by the expected number in the cell, the sum
$\sum (O-E)^2/E$. We would reject the null hypothesis that the factors are independent only if this number
is large, so the test is right-tailed. In this example the random variable $\sum (O-E)^2/E$ has the chi-square
distribution with one degree of freedom. If we had decided at the outset to test at the 10% level of
significance, the critical value defining the rejection region would be, reading from Figure 12.4 "Critical
Values of Chi-Square Distributions", $\chi^2_\alpha=\chi^2_{0.10}=2.706$, so that the rejection region would be the
interval $[2.706,\infty)$. When we compute the value of the standardized test statistic we obtain
$$\chi^2=\sum\frac{(O-E)^2}{E}=\frac{(11-12.6)^2}{12.6}+\frac{(7-5.4)^2}{5.4}+\frac{(17-15.4)^2}{15.4}+\frac{(5-6.6)^2}{6.6}\approx 1.231$$
Since 1.231 < 2.706, the test statistic does not fall in the rejection region, so the null hypothesis is not
rejected at the 10% level of significance.
As in the example each factor is divided into a number of categories or levels. These could arise
naturally, as in the boy-girl division of gender, or somewhat arbitrarily, as in the high-low division of
heart rate. Suppose Factor 1 has I levels and Factor 2 has J levels. Then the information from a
random sample gives rise to a general I × J contingency table, which with row totals, column totals,
and a grand total would appear as shown in Table 11.3 "General Contingency Table". Each cell may
be labeled by a pair of indices (i, j). $O_{ij}$ stands for the observed count of observations in the cell in
row i and column j, $R_i$ for the ith row total and $C_j$ for the jth column total. To simplify the notation we
will drop the indices so Table 11.3 "General Contingency Table" becomes Table 11.4 "Simplified
General Contingency Table". Nevertheless it is important to keep in mind that the Os, the Rs, and
the Cs, though denoted by the same symbols, are in fact different numbers.
Table 11.3 General Contingency Table

                            Factor 2 Levels
                    1      ⋯    j      ⋯    J      Row Total
Factor 1     1      O_11   ⋯    O_1j   ⋯    O_1J   R_1
Levels       ⋮      ⋮           ⋮           ⋮      ⋮
             i      O_i1   ⋯    O_ij   ⋯    O_iJ   R_i
             ⋮      ⋮           ⋮           ⋮      ⋮
             I      O_I1   ⋯    O_Ij   ⋯    O_IJ   R_I
Column Total        C_1    ⋯    C_j    ⋯    C_J    n

Table 11.4 Simplified General Contingency Table

                            Factor 2 Levels
                    1      ⋯    j      ⋯    J      Row Total
Factor 1     1      O      ⋯    O      ⋯    O      R
Levels       ⋮      ⋮           ⋮           ⋮      ⋮
             i      O      ⋯    O      ⋯    O      R
             ⋮      ⋮           ⋮           ⋮      ⋮
             I      O      ⋯    O      ⋯    O      R
Column Total        C      ⋯    C      ⋯    C      n

As in the example, for each core cell in the table we compute what would be the expected number E of
observations if the two factors were independent. E is computed for each core cell (each cell with
an O in it) of Table 11.4 "Simplified General Contingency Table" by the rule applied in the example:
$$E=\frac{R\times C}{n}$$
where R is the row total and C is the column total corresponding to the cell, and n is the sample size.
EXAMPLE 1
A researcher wishes to investigate whether students’ scores on a college entrance examination (CEE)
have any indicative power for future college performance as measured by GPA. In other words, he
wishes to investigate whether the factors CEE and GPA are independent or not. He randomly
selects n = 100 students in a college and notes each student’s score on the entrance examination and
his grade point average at the end of the sophomore year. He divides entrance exam scores into two
levels and grade point averages into three levels. Sorting the data according to these divisions, he
forms the contingency table shown as Table 11.6 "CEE versus GPA Contingency Table", in which the
row and column totals have already been computed.
Table 11.6 CEE versus GPA Contingency Table

                            GPA
                <2.7   2.7 to 3.2   >3.2    Row Total
CEE   <1800       35           12      5           52
      ≥1800        6           24     18           48
Column Total      41           36     23   Total = 100

Test, at the 1% level of significance, whether these data provide sufficient evidence to conclude that
CEE scores indicate future performance levels of incoming college freshmen as measured by GPA.
Solution:
We perform the test using the critical value approach, following the usual five-step method outlined
at the end of Section 8.1 "The Elements of Hypothesis Testing" in Chapter 8 "Testing Hypotheses".
• Step 1. The hypotheses are
$H_0$: CEE and GPA are independent factors
vs. $H_a$: CEE and GPA are not independent factors
• Step 2. The distribution is chi-square.
• Step 3. To compute the value of the test statistic we must first compute the expected number for
each of the six core cells (the ones whose entries are boldface):
o 1st row and 1st column: E = (R × C)/n = (52 × 41)/100 = 21.32
o 1st row and 2nd column: E = (R × C)/n = (52 × 36)/100 = 18.72
o 1st row and 3rd column: E = (R × C)/n = (52 × 23)/100 = 11.96
o 2nd row and 1st column: E = (R × C)/n = (48 × 41)/100 = 19.68
o 2nd row and 2nd column: E = (R × C)/n = (48 × 36)/100 = 17.28
o 2nd row and 3rd column: E = (R × C)/n = (48 × 23)/100 = 11.04
Table 11.6 "CEE versus GPA Contingency Table" is updated to Table 11.7 "Updated CEE versus GPA
Contingency Table".
• Step 5. Since 31.75 > 9.21 the decision is to reject the null hypothesis. See Figure 11.4. The data provide
sufficient evidence, at the 1% level of significance, to conclude that CEE score and GPA are not
independent: the entrance exam score has predictive power.
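The numbers in this example can be reproduced with a short script. This is a sketch under stated assumptions: the observed counts are those of Table 11.6, the statistic is the sum of $(O-E)^2/E$ over the core cells, the degrees of freedom are taken to be (rows − 1) × (columns − 1), which is the standard rule for a test of independence, and the critical value 9.21 is the one quoted in Step 5. The function name is hypothetical.

```python
def chi_square_independence(observed):
    """Return (test statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count E = (R * C) / n
            statistic += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Observed counts from Table 11.6: rows are CEE < 1800 and CEE >= 1800;
# columns are GPA < 2.7, 2.7 to 3.2, and > 3.2.
cee_gpa = [
    [35, 12, 5],
    [6, 24, 18],
]

statistic, df = chi_square_independence(cee_gpa)
CRITICAL_VALUE = 9.21   # chi-square critical value quoted in Step 5 (alpha = 0.01)

print("chi-square =", round(statistic, 2))  # 31.75
print("df =", df)                           # 2
print("reject H0:", statistic >= CRITICAL_VALUE)
```

Applied to the 2 × 2 baby gender table earlier in the section, the same function returns one degree of freedom, in agreement with the discussion there.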
Figure 11.4 Note 11.9 "Example 1"
KEY TAKEAWAYS
• Critical values of a chi-square distribution with degrees of freedom df are found in Figure 12.4 "Critical
Values of Chi-Square Distributions".
• A chi-square test can be used to evaluate the hypothesis that two random variables or factors are
independent.
                     Factor 1
                Level 1   Level 2   Row Total
Factor 2
  Level 1            20        10   R
  Level 2            15         5   R
  Level 3            10        20   R
Column Total          C         C   n

a. Find the column totals, the row totals, and the grand total, n, of the table.
b. Find the expected number E of observations for each cell based on the assumption that the two factors are
independent (that is, just use the formula E = (R × C)/n).
c. Find the value of the chi-square test statistic $\chi^2$.
d. Find the number of degrees of freedom of the chi-square test statistic.
APPLICATIONS
9. A child psychologist believes that children perform better on tests when they are given perceived freedom of
choice. To test this belief, the psychologist carried out an experiment in which 200 third graders were
randomly assigned to two groups, A and B. Each child was given the same simple logic test. However in
group B, each child was given the freedom to choose a text booklet from many with various drawings on the
covers. The performance of each child was rated as Very Good, Good, and Fair. The results are summarized in
the table provided. Test, at the 5% level of significance, whether there is sufficient evidence in the data to
support the psychologist’s belief.

                 Group
                 A     B
Performance
  Very Good     32    29
  Good          55    61
  Fair          10    13

10. In regard to wine tasting competitions, many experts claim that the first glass of wine served sets a reference
taste and that a different reference wine may alter the relative ranking of the other wines in competition. To
test this claim, three wines, A, B and C, were served at a wine tasting event. Each person was served a single
glass of each wine, but in different orders for different guests. At the close, each person was asked to name
the best of the three. One hundred seventy-two people were at the event and their top picks are given in the
table provided. Test, at the 1% level of significance, whether there is sufficient evidence in the data to
support the claim that wine experts’ preference is dependent on the first served wine.

                 Top Pick
                 A     B     C
First Glass
  A             12    31    27
  B             15    40    21
  C             10     9     7

11. Is being left-handed hereditary? To answer this question, 250 adults are randomly selected and their
handedness and their parents’ handedness are noted. The results are summarized in the table provided. Test,
at the 1% level of significance, whether there is sufficient evidence in the data to conclude that there is a
hereditary element in handedness.

              Number of Parents Left-Handed
                 0     1     2
Handedness
  Left           8    10    12
  Right        178    21    21

12. Some geneticists claim that the genes that determine left-handedness also govern development of the
language centers of the brain. If this claim is true, then it would be reasonable to expect that left-handed
people tend to have stronger language abilities. A study designed to test this claim randomly selected 807
students who took the Graduate Record Examination (GRE). Their scores on the language portion of the
examination were classified into three categories: low, average, and high, and their handedness was also
noted. The results are given in the table provided. Test, at the 5% level of significance, whether there is
sufficient evidence in the data to conclude that left-handed people tend to have stronger language abilities.

                 GRE English Scores
                 Low   Average   High
Handedness
  Left            18        40     22
  Right          201       360    166

13. It is generally believed that children brought up in stable families tend to do well in school. To verify such a
belief, a social scientist examined 290 randomly selected students’ records in a public high school and noted
each student’s family structure and academic status four years after entering high school. The data were
then sorted into a 2 × 3 contingency table with two factors. Factor 1 has two levels: graduated and did not
graduate. Factor 2 has three levels: no parent, one parent, and two parents. The results are given in the table
provided. Test, at the 1% level of significance, whether there is sufficient evidence in the data to conclude
that family structure matters in school performance of the students.

                 Academic Status
                 Graduated   Did Not Graduate
Family
  No parent             18                 31
  One parent           101                 44
  Two parents           70                 26

14. An administrator at a large middle school wishes to use celebrity influence to encourage students to make
healthier choices in the school cafeteria. The cafeteria is situated at the center of an open space. Every day at
lunch time students get their lunch and a drink in three separate lines leading to three separate serving
stations. As an experiment, the school administrator displayed a poster of a popular teen pop star drinking
milk at each of the three areas where drinks are provided, except the milk in the poster is different at each
location: one shows white milk, one shows strawberry-flavored pink milk, and one shows chocolate milk.
After the first day of the experiment the administrator noted the students’ milk choices separately for the
three lines. The data are given in the table provided. Test, at the 1% level of significance, whether there is
sufficient evidence in the data to conclude that the posters had some impact on the students’ drink choices.
                Student Choice
Poster Choice   Regular    Strawberry    Chocolate
Regular            38          28            40
Strawberry         18          51            24
Chocolate          32          32            53
LARGE DATA SET EXERCISE
15. Large Data Set 8 records the result of a survey of 300 randomly selected adults who go to movie theaters
regularly. For each person the gender and preferred type of movie were recorded. Test, at the 5% level of
significance, whether there is sufficient evidence in the data to conclude that the factors “gender” and
“preferred type of movie” are dependent.
http://www.flatworldknowledge.com/sites/all/files/data8.xls
11.2 Chi-Square One-Sample Goodness-of-Fit Tests
LEARNING OBJECTIVE
1. To understand how to use a chi-square test to judge whether a sample fits a particular population well.
Suppose we wish to determine if an ordinary-looking six-sided die is fair, or balanced, meaning that
every face has probability 1/6 of landing on top when the die is tossed. We could toss the die dozens,
maybe hundreds, of times and compare the actual number of times each face landed on top to the
expected number, which would be 1/6 of the total number of tosses. We wouldn’t expect each
number to be exactly 1/6 of the total, but it should be close. To be specific, suppose the die is
tossed n = 60 times with the results summarized in Table 11.8 "Die Contingency Table". For ease of
reference we add a column of expected frequencies, which in this example is just a column
of 10s. The result is shown as Table 11.9 "Updated Die Contingency Table". In analogy with the
previous section we call this an “updated” table. A measure of how much the data deviate from what
we would expect to see if the die really were fair is the sum of the squares of the differences between
the observed frequency O and the expected frequency E in each row, or, standardizing by dividing
each square by the expected number, the sum $\sum (O-E)^2/E$. If we formulate the investigation as a test of
hypotheses, the test is
$H_0$: The die is fair
vs. $H_a$: The die is not fair
Table 11.8 Die Contingency Table
Die Value Assumed Distribution Observed Frequency
1 1/6 9
2 1/6 15
3 1/6 9
4 1/6 8
5 1/6 6
6 1/6 13
Table 11.9 Updated Die Contingency Table
Die Value Assumed Distribution Observed Freq. Expected Freq.
1 1/6 9 10
2 1/6 15 10
3 1/6 9 10
4 1/6 8 10
5 1/6 6 10
6 1/6 13 10
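We would reject the null hypothesis that the die is fair only if the number $\sum (O-E)^2/E$ is large, so the test is
right-tailed. In this example the random variable $\sum (O-E)^2/E$ has approximately the chi-square distribution with five
degrees of freedom. If we had decided at the outset to test at the 10% level of significance, the critical
value defining the rejection region would be, reading from Figure 12.4 "Critical Values of Chi-Square
Distributions", $\chi^2_{\alpha} = \chi^2_{0.10} = 9.236$, so that the rejection region would be the interval $[9.236, \infty)$. When we
compute the value of the standardized test statistic using the numbers in the last two columns of Table
11.9 "Updated Die Contingency Table", we obtain
$\chi^2 = \sum \dfrac{(O-E)^2}{E} = \dfrac{(-1)^2}{10} + \dfrac{5^2}{10} + \dfrac{(-1)^2}{10} + \dfrac{(-2)^2}{10} + \dfrac{(-4)^2}{10} + \dfrac{3^2}{10} = 5.6$
Since 5.6 < 9.236, the test statistic does not fall in the rejection region, so the data do not provide
sufficient evidence, at the 10% level of significance, to conclude that the die is not fair.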
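The arithmetic for the die example can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the critical value 9.236 is simply copied from the text rather than computed.

```python
# Observed counts for faces 1..6 from Table 11.9 (n = 60 tosses).
observed = [9, 15, 9, 8, 6, 13]
n = sum(observed)

# Under H0 (fair die) each face is expected n * 1/6 times.
expected = [n / 6 for _ in observed]

# Standardized sum of squared deviations: sum of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for alpha = 0.10 with 5 degrees of freedom, as read in the text.
critical_value = 9.236
reject_h0 = chi_square >= critical_value

print(round(chi_square, 3), reject_h0)  # prints: 5.6 False
```

Because the statistic is below the critical value, the sketch reaches the decision not to reject the hypothesis that the die is fair.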
Table 11.10 General Contingency Table
Factor Levels    Assumed Distribution    Observed Frequency
1                $p_1$                   $O_1$
2                $p_2$                   $O_2$
⋮                ⋮                       ⋮
I                $p_I$                   $O_I$
Table 11.10 "General Contingency Table" is updated to Table 11.11 "Updated General Contingency
Table" by adding the expected frequency for each value of X . To simplify the notation we drop indices
for the observed and expected frequencies and represent Table 11.11 "Updated General Contingency
Table" by Table 11.12 "Simplified Updated General Contingency Table".
Table 11.11 Updated General Contingency Table
Factor Levels    Assumed Distribution    Observed Freq.    Expected Freq.
1                $p_1$                   $O_1$             $E_1$
2                $p_2$                   $O_2$             $E_2$
⋮                ⋮                       ⋮                 ⋮
I                $p_I$                   $O_I$             $E_I$
Table 11.12 Simplified Updated General Contingency Table
Factor Levels    Assumed Distribution    Observed Freq.    Expected Freq.
1                $p_1$                   O                 E
2                $p_2$                   O                 E
⋮                ⋮                       ⋮                 ⋮
I                $p_I$                   O                 E
Here is the test statistic for the general hypothesis based on Table 11.12 "Simplified Updated General
Contingency Table", together with the conditions under which it follows a chi-square distribution.
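In symbols, and assuming the standard form of the goodness-of-fit statistic used in the die example above, the statement can be sketched as follows, where the sum runs over the I rows of the table and n is the sample size:

```latex
\chi^2 = \sum \frac{(O - E)^2}{E}, \qquad E_i = n \times p_i, \qquad df = I - 1
```

The chi-square approximation is usually regarded as reasonable when every expected frequency E is at least 5. That threshold is the common rule of thumb and is an assumption here, not a quotation from the text.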
EXAMPLE 2
Table 11.13 "Ethnic Groups in the Census Year" shows the distribution of various ethnic groups in the
population of a particular state based on a decennial U.S. census. Five years later a random sample of
2,500 residents of the state was taken, with the results given in Table 11.14 "Sample Data Five Years
After the Census Year" (along with the probability distribution from the census year). Test, at the 1%
level of significance, whether there is sufficient evidence in the sample to conclude that the
distribution of ethnic groups in this state five years after the census had changed from that in the
census year.
Table 11.13 Ethnic Groups in the Census Year
Ethnicity     White    Black    Amer.-Indian    Hispanic    Asian    Others
Proportion    0.743    0.216    0.012           0.012       0.008    0.009
Table 11.14 Sample Data Five Years After the Census Year
Ethnicity          Assumed Distribution    Observed Frequency
White              0.743                   1732
Black              0.216                   538
American-Indian    0.012                   32
Hispanic           0.012                   42
Asian              0.008                   133
Others             0.009                   23
Solution:
We test using the critical value approach.
• Step 1. The hypotheses of interest in this case can be expressed as
$H_0$: The distribution of ethnic groups has not changed
vs. $H_a$: The distribution of ethnic groups has changed
• Step 2. The distribution is chi-square.
• Step 3. To compute the value of the test statistic we must first compute the expected number for
each row of Table 11.14 "Sample Data Five Years After the Census Year". Since n = 2500, using the
formula $E_i = n \times p_i$ and the values of $p_i$ from either Table 11.13 "Ethnic Groups in the Census
Year" or Table 11.14 "Sample Data Five Years After the Census Year", the expected frequencies are
1857.5, 540, 30, 30, 20, and 22.5.
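The remaining arithmetic for this example can be sketched in Python. The data come from Tables 11.13 and 11.14; the variable names are illustrative, and the critical value 15.086 (chi-square, 5 degrees of freedom, right-tail area 0.01) is taken from a standard chi-square table and is an assumption here, since it is not quoted in the surviving text.

```python
# Census-year proportions (Table 11.13) and sample counts (Table 11.14).
proportions = {
    "White": 0.743, "Black": 0.216, "American-Indian": 0.012,
    "Hispanic": 0.012, "Asian": 0.008, "Others": 0.009,
}
observed = {
    "White": 1732, "Black": 538, "American-Indian": 32,
    "Hispanic": 42, "Asian": 133, "Others": 23,
}

n = sum(observed.values())  # sample size, 2500

# Expected frequency for each row: E_i = n * p_i.
expected = {group: n * p for group, p in proportions.items()}

# Test statistic: sum over rows of (O - E)^2 / E.
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)

df = len(observed) - 1     # I - 1 = 5 degrees of freedom
critical_value = 15.086    # assumed table value for alpha = 0.01, df = 5
reject_h0 = chi_square >= critical_value

print(round(chi_square, 3), df, reject_h0)
```

Almost all of the statistic comes from the Asian row, where 133 residents were observed against an expected 20, so under these assumptions the sketch rejects the hypothesis that the distribution is unchanged.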
KEY TAKEAWAY
• The chi-square goodness-of-fit test can be used to evaluate the hypothesis that a sample is taken from a
population with an assumed specific probability distribution.
EXERCISES
BASIC
1. A data sample is sorted into four categories with an assumed probability distribution.
Factor Levels    Assumed Distribution    Observed Frequency
1                $p_1 = 0.1$             10
2                $p_2 = 0.4$             35
3                $p_3 = 0.4$             45
4                $p_4 = 0.1$             10
a. Find the size n of the sample.
b. Find the expected number E of observations for each level, if the sampled population has a
probability distribution as assumed (that is, just use the formula $E_i = n \times p_i$).
c. Find the chi-square test statistic $\chi^2$.
d. Find the number of degrees of freedom of the chi-square test statistic.
2. A data sample is sorted into five categories with an assumed probability distribution.
Factor Levels    Assumed Distribution    Observed Frequency
1                $p_1 = 0.3$             23
2                $p_2 = 0.3$             30
3                $p_3 = 0.2$             19
4                $p_4 = 0.1$             8
5                $p_5 = 0.1$             10
a. Find the size n of the sample.
b. Find the expected number E of observations for each level, if the sampled population has a
probability distribution as assumed (that is, just use the formula $E_i = n \times p_i$).
c. Find the chi-square test statistic $\chi^2$.
d. Find the number of degrees of freedom of the chi-square test statistic.
APPLICATIONS
3. Retailers of collectible postage stamps often buy their stamps in large quantities by weight at auctions. The
prices the retailers are willing to pay depend on how old the postage stamps are. Many collectible postage
stamps at auctions are described by the proportions of stamps issued at various periods in the past. Generally
the older the stamps the higher the value. At one particular auction, a lot of collectible stamps is advertised
to have the age distribution given in the table provided. A retail buyer took a sample of 73 stamps from the
lot and sorted them by age. The results are given in the table provided. Test, at the 5% level of significance,
whether there is sufficient evidence in the data to conclude that the age distribution of the lot is different
from what was claimed by the seller.
Year            Claimed Distribution    Observed Frequency
Before 1940     0.10                    6
1940 to 1959    0.25                    15
1960 to 1979    0.45                    30
After 1979      0.20                    22
4. The litter size of Bengal tigers is typically two or three cubs, but it can vary between one and four. Based on
long-term observations, the litter size of Bengal tigers in the wild has the distribution given in the table
provided. A zoologist believes that Bengal tigers in captivity tend to have different (possibly smaller) litter
sizes from those in the wild. To verify this belief, the zoologist searched all data sources and found 316 litter
size records of Bengal tigers in captivity. The results are given in the table provided. Test, at the 5% level of
significance, whether there is sufficient evidence in the data to conclude that the distribution of litter sizes in
captivity differs from that in the wild.
Litter Size    Wild Litter Distribution    Observed Frequency
1              0.11                        41
2              0.69                        243
3              0.18                        27
4              0.02                        5
5. An online shoe retailer sells men’s shoes in sizes 8 to 13. In the past orders for the different shoe sizes
have followed the distribution given in the table provided. The management believes that recent
marketing efforts may have expanded their customer base and, as a result, there may be a shift in the size
distribution for future orders. To have a better understanding of its future sales, the shoe seller examined
1,040 sales records of recent orders and noted the sizes of the shoes ordered. The results are given in the
table provided. Test, at the 1% level of significance, whether there is sufficient evidence in the data to
conclude that the shoe size distribution of future sales will differ from the historic one.
Shoe Size    Past Size Distribution    Recent Size Frequency
8.0          0.03                      25
8.5          0.06                      43
9.0          0.09                      88
9.5          0.19                      221
10.0         0.23                      272
10.5         0.14                      150
11.0         0.10                      107
11.5         0.06                      51
12.0         0.05                      37
12.5         0.03                      35
13.0         0.02                      11
6. An online shoe retailer sells women’s shoes in sizes 5 to 10. In the past orders for the different shoe sizes
have followed the distribution given in the table provided. The management believes that recent marketing
efforts may have expanded their customer base and, as a result, there may be a shift in the size distribution
for future orders. To have a better understanding of its future sales, the shoe seller examined 1,174 sales
records of recent orders and noted the sizes of the shoes ordered. The results are given in the table provided.
Test, at the 1% level of significance, whether there is sufficient evidence in the data to conclude that the shoe
size distribution of future sales will differ from the historic one.
Shoe Size    Past Size Distribution    Recent Size Frequency
5.0          0.02                      20
5.5          0.03                      23
6.0          0.07                      88
6.5          0.08                      90
7.0          0.20                      222
7.5          0.20                      258
8.0          0.15                      177
8.5          0.11                      121
9.0          0.08                      91
9.5          0.04                      53
10.0         0.02                      31
7. A chess opening is a sequence of moves at the beginning of a chess game. There are many well-studied
named openings in chess literature. French Defense is one of the most popular openings for black, although it
is considered a relatively weak opening since it gives black probability 0.344 of winning, probability 0.405 of
losing, and probability 0.251 of drawing. A chess master believes that he has discovered a new variation of
French Defense that may alter the probability distribution of the outcome of the game. In his many Internet
chess games in the last two years, he was able to apply the new variation in 77 games. The wins, losses, and
draws in the 77 games are given in the table provided. Test, at the 5% level of significance, whether there is
sufficient evidence in the data to conclude that the newly discovered variation of French Defense alters the
probability distribution of the result of the game.
Result for Black    Probability Distribution    New Variation Wins
Win                 0.344                       31
Loss                0.405                       25
Draw                0.251                       21
8. The Department of Parks and Wildlife stocks a large lake with fish every six years. It is determined that a
healthy diversity of fish in the lake should consist of 10% largemouth bass, 15% smallmouth bass, 10% striped
bass, 10% trout, and 20% catfish. Therefore each time the lake is stocked, the fish population in the lake is
restored to maintain that particular distribution. Every three years, the department conducts a study to see
whether the distribution of the fish in the lake has shifted away from the target proportions. In one particular
year, a research group from the department observed a sample of 292 fish from the lake with the results
given in the table provided. Test, at the 5% level of significance, whether there is sufficient evidence in the
data to conclude that the fish population distribution has shifted since the last stocking.
Fish               Target Distribution    Fish in Sample
Largemouth Bass    0.10                   14
Smallmouth Bass    0.15                   49
Striped Bass       0.10                   21
Trout              0.10                   22
Catfish            0.20                   75
Other              0.35                   111
LARGE DATA SET EXERCISE
9. Large Data Set 4 records the result of 500 tosses of a six-sided die. Test, at the 10% level of significance,
whether there is sufficient evidence in the data to conclude that the die is not “fair” (or “balanced”), that is,
that the probability distribution differs from probability 1/6 for each of the six faces on the die.
http://www.flatworldknowledge.com/sites/all/files/data4.xls
11.3 F-tests for Equality of Two Variances
LEARNING OBJECTIVES
1. To understand what F-distributions are.
2. To understand how to use an F-test to judge whether two population variances are equal.
F-Distributions
Another important and useful family of distributions in statistics is the family of F-distributions. Each
member of the F-distribution family is specified by a pair of parameters called degrees of freedom and
denoted $df_1$ and $df_2$. Figure 11.7 "Many F-Distributions" shows several F-distributions for different pairs of
degrees of freedom. An F random variable is a random variable that assumes only positive values and follows
an F-distribution.
Figure 11.7 Many F-Distributions
The parameter $df_1$ is often referred to as the numerator degrees of freedom and the parameter $df_2$ as
the denominator degrees of freedom. It is important to keep in mind that they are not interchangeable.
For example, the F-distribution with degrees of freedom $df_1=3$ and $df_2=8$ is a different distribution from
the F-distribution with degrees of freedom $df_1=8$ and $df_2=3$.
Definition
The value of the F random variable F with degrees of freedom $df_1$ and $df_2$ that cuts off a right tail of
area c is denoted $F_c$ and is called a critical value. See Figure 11.8.
Figure 11.8 $F_c$ Illustrated
Tables containing the values of $F_c$ are given in Chapter 11 "Chi-Square Tests and F-Tests". Each of the tables
is for a fixed collection of values of c, either 0.900, 0.950, 0.975, 0.990, and 0.995 (yielding what are
called “lower” critical values), or 0.005, 0.010, 0.025, 0.050, and 0.100 (yielding what are called
“upper” critical values). In each table critical values are given for various pairs $(df_1, df_2)$. We illustrate
the use of the tables with several examples.
EXAMPLE 3
Suppose F is an F random variable with degrees of freedom $df_1=5$ and $df_2=4$. Use the tables to find
a. $F_{0.10}$
b. $F_{0.95}$
Solution:
a. The column headings of all the tables contain $df_1=5$. Look for the table for which 0.10 is one of
the entries on the extreme left (a table of upper critical values) and that has a row heading $df_2=4$ in the
left margin of the table. A portion of the relevant table is provided. The entry in the intersection of the
column with heading $df_1=5$ and the row with the headings 0.10 and $df_2=4$, which is shaded in the table
provided, is the answer, $F_{0.10}=4.05$.
                       df1
F Tail Area   df2      1      2     ···     5     ···
⋮             ⋮        ⋮      ⋮      ⋮      ⋮      ⋮
0.005         4       ···    ···    ···    22.5   ···
0.01          4       ···    ···    ···    15.5   ···
0.025         4       ···    ···    ···    9.36   ···
0.05          4       ···    ···    ···    6.26   ···
0.10          4       ···    ···    ···    4.05   ···
⋮             ⋮        ⋮      ⋮      ⋮      ⋮      ⋮
b. Look for the table for which 0.95 is one of the entries on the extreme left (a table of lower
critical values) and that has a row heading $df_2=4$ in the left margin of the table. A portion of the relevant
table is provided. The entry in the intersection of the column with heading $df_1=5$ and the row with the
headings 0.95 and $df_2=4$, which is shaded in the table provided, is the answer, $F_{0.95}=0.19$.
                       df1
F Tail Area   df2      1      2     ···     5     ···
⋮             ⋮        ⋮      ⋮      ⋮      ⋮      ⋮
0.90          4       ···    ···    ···    0.28   ···
0.95          4       ···    ···    ···    0.19   ···
0.975         4       ···    ···    ···    0.14   ···
0.99          4       ···    ···    ···    0.09   ···
0.995         4       ···    ···    ···    0.06   ···
⋮             ⋮        ⋮      ⋮      ⋮      ⋮      ⋮
EXAMPLE 4
Suppose F is an F random variable with degrees of freedom $df_1=2$ and $df_2=20$. Let $\alpha=0.05$. Use the
tables to find
a. $F_{\alpha}$
b. $F_{\alpha/2}$
c. $F_{1-\alpha}$
d. $F_{1-\alpha/2}$
Solution:
a. The column headings of all the tables contain $df_1=2$. Look for the table for which $\alpha=0.05$ is one
of the entries on the extreme left (a table of upper critical values) and that has a row heading $df_2=20$ in
the left margin of the table. A portion of the relevant table is provided. The shaded entry, in the
intersection of the column with heading $df_1=2$ and the row with the headings 0.05 and $df_2=20$, is the
answer, $F_{0.05}=3.49$.
                       df1
F Tail Area   df2      1      2      ···
⋮             ⋮        ⋮      ⋮       ⋮
0.005         20      ···    6.99    ···
0.01          20      ···    5.85    ···
0.025         20      ···    4.46    ···
0.05          20      ···    3.49    ···
0.10          20      ···    2.59    ···
⋮             ⋮        ⋮      ⋮       ⋮
b. Look for the table for which $\alpha/2=0.025$ is one of the entries on the extreme left (a table of upper
critical values) and that has a row heading $df_2=20$ in the left margin of the table. A portion of the relevant
table is provided. The shaded entry, in the intersection of the column with heading $df_1=2$ and the row
with the headings 0.025 and $df_2=20$, is the answer, $F_{0.025}=4.46$.
                       df1
F Tail Area   df2      1      2      ···
⋮             ⋮        ⋮      ⋮       ⋮
0.005         20      ···    6.99    ···
0.01          20      ···    5.85    ···
0.025         20      ···    4.46    ···
0.05          20      ···    3.49    ···
0.10          20      ···    2.59    ···
⋮             ⋮        ⋮      ⋮       ⋮
c. Look for the table for which $1-\alpha=0.95$ is one of the entries on the extreme left (a table
of lower critical values) and that has a row heading $df_2=20$ in the left margin of the
table. A portion of the relevant table is provided. The shaded entry, in the intersection
of the column with heading $df_1=2$ and the row with the headings 0.95 and $df_2=20$, is the
answer, $F_{0.95}=0.05$.
                       df1
F Tail Area   df2      1      2      ···
⋮             ⋮        ⋮      ⋮       ⋮
0.90          20      ···    0.11    ···
0.95          20      ···    0.05    ···
0.975         20      ···    0.03    ···
0.99          20      ···    0.01    ···
0.995         20      ···    0.01    ···
⋮             ⋮        ⋮      ⋮       ⋮
d. Look for the table for which $1-\alpha/2=0.975$ is one of the entries on the extreme left (a table of
lower critical values) and that has a row heading $df_2=20$ in the left margin of the table. A portion of the
relevant table is provided. The shaded entry, in the intersection of the column with heading $df_1=2$ and
the row with the headings 0.975 and $df_2=20$, is the answer, $F_{0.975}=0.03$.
                       df1
F Tail Area   df2      1      2      ···
⋮             ⋮        ⋮      ⋮       ⋮
0.90          20      ···    0.11    ···
0.95          20      ···    0.05    ···
0.975         20      ···    0.03    ···
0.99          20      ···    0.01    ···
0.995         20      ···    0.01    ···
⋮             ⋮        ⋮      ⋮       ⋮
A fact that sometimes allows us to find a critical value from a table that we could not read otherwise
is:
If $F_u(r,s)$ denotes the value of the F-distribution with degrees of freedom $df_1=r$ and $df_2=s$ that cuts off a
right tail of area u, then
$F_c(k,\ell) = \dfrac{1}{F_{1-c}(\ell,k)}$
EXAMPLE 5
Use the tables to find
a. $F_{0.01}$ for an F random variable with $df_1=13$ and $df_2=8$
b. $F_{0.975}$ for an F random variable with $df_1=40$ and $df_2=10$
Solution:
a. There is no table with $df_1=13$, but there is one with $df_1=8$. Thus we use the fact that
$F_{0.01}(13,8) = \dfrac{1}{F_{0.99}(8,13)}$
Using the relevant table we find that $F_{0.99}(8,13)=0.18$, hence $F_{0.01}(13,8)=0.18^{-1}=5.556$.
b. There is no table with $df_1=40$, but there is one with $df_1=10$. Thus we use the fact that
$F_{0.975}(40,10) = \dfrac{1}{F_{0.025}(10,40)}$
Using the relevant table we find that $F_{0.025}(10,40)=3.31$, hence $F_{0.975}(40,10)=3.31^{-1}=0.302$.
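The reciprocal rule used in this example amounts to one division per table lookup. The sketch below takes the two table readings quoted in the solution as given and does not compute F critical values itself; the function name is illustrative and hypothetical.

```python
def critical_value_from_reciprocal(table_value: float) -> float:
    """Apply F_c(k, l) = 1 / F_{1-c}(l, k) to a value read from a table.

    table_value is F_{1-c}(l, k), found in the table with the degrees of
    freedom swapped and the complementary tail area.
    """
    return 1.0 / table_value


# Part (a): F_0.01(13, 8) from the table reading F_0.99(8, 13) = 0.18.
f_a = critical_value_from_reciprocal(0.18)

# Part (b): F_0.975(40, 10) from the table reading F_0.025(10, 40) = 3.31.
f_b = critical_value_from_reciprocal(3.31)

print(round(f_a, 3), round(f_b, 3))  # prints: 5.556 0.302
```

Because the table readings are rounded to two or three digits, the results carry the same rounding error as the hand computation.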
F-Tests for Equality of Two Variances
In Chapter 9 "Two-Sample Problems" we saw how to test hypotheses about the difference between
two population means $\mu_1$ and $\mu_2$. In some practical situations the difference between the
population standard deviations $\sigma_1$ and $\sigma_2$ is also of interest. Standard deviation measures the
variability of a random variable. For example, if the random variable measures the size of a
machined part in a manufacturing process, the size of the standard deviation is one indicator of product
quality. A smaller standard deviation among items produced in the manufacturing process is
desirable since it indicates consistency in product quality.
For theoretical reasons it is easier to compare the squares of the population standard deviations, the
population variances $\sigma_1^2$ and $\sigma_2^2$. This is not a problem, since $\sigma_1=\sigma_2$ precisely
when $\sigma_1^2=\sigma_2^2$, $\sigma_1<\sigma_2$ precisely when $\sigma_1^2<\sigma_2^2$, and $\sigma_1>\sigma_2$ precisely when $\sigma_1^2>\sigma_2^2$.
The null hypothesis always has the form $H_0: \sigma_1^2=\sigma_2^2$. The three forms of the alternative
hypothesis, with the terminology for each case, are:
Form of $H_a$                       Terminology
$H_a: \sigma_1^2>\sigma_2^2$        Right-tailed
$H_a: \sigma_1^2<\sigma_2^2$        Left-tailed
$H_a: \sigma_1^2\neq\sigma_2^2$     Two-tailed
Just as when we test hypotheses concerning two population means, we take a random sample from
each population, of sizes $n_1$ and $n_2$, and compute the sample standard deviations $s_1$ and $s_2$. In this
context the samples are always independent. The populations themselves must be normally
distributed.
Test Statistic for Hypothesis Tests Concerning the Difference Between Two
Population Variances
$F = \dfrac{s_1^2}{s_2^2}$
If the two populations are normally distributed and if $H_0: \sigma_1^2=\sigma_2^2$ is true then under independent
sampling F approximately follows an F-distribution with degrees of freedom $df_1=n_1-1$ and $df_2=n_2-1$.
A test based on the test statistic F is called an F-test.
A most important point is that while the rejection region for a right-tailed test is exactly as in every
other situation that we have encountered, because of the asymmetry in the F-distribution the critical
value for a left-tailed test and the lower critical value for a two-tailed test have the special forms
shown in the following table:
Terminology     Alternative Hypothesis              Rejection Region
Right-tailed    $H_a: \sigma_1^2>\sigma_2^2$        $F \geq F_{\alpha}$
Left-tailed     $H_a: \sigma_1^2<\sigma_2^2$        $F \leq F_{1-\alpha}$
Two-tailed      $H_a: \sigma_1^2\neq\sigma_2^2$     $F \leq F_{1-\alpha/2}$ or $F \geq F_{\alpha/2}$
Figure 11.9 "Rejection Regions: (a) Right-Tailed; (b) Left-Tailed; (c) Two-Tailed" illustrates these
rejection regions.
Figure 11.9 Rejection Regions: (a) Right-Tailed; (b) Left-Tailed; (c) Two-Tailed
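The test statistic and the three rejection regions can be sketched in Python as follows. The function names and the sample figures (variances 8.0 and 2.0, sample sizes 3 and 21) are hypothetical, chosen so that the degrees of freedom are 2 and 20 and the critical values $F_{0.95}=0.05$ and $F_{0.05}=3.49$ from Example 4 can be reused. Critical values are supplied by the reader from the tables; nothing here computes them.

```python
def f_statistic(s1_sq, s2_sq, n1, n2):
    """Return (F, df1, df2) where F = s1^2 / s2^2, df1 = n1 - 1, df2 = n2 - 1."""
    return s1_sq / s2_sq, n1 - 1, n2 - 1


def in_rejection_region(f, tail, lower_crit=None, upper_crit=None):
    """Apply the rejection regions from the table above.

    right: F >= F_alpha           (upper_crit = F_alpha)
    left:  F <= F_{1-alpha}       (lower_crit = F_{1-alpha})
    two:   F <= F_{1-alpha/2} or F >= F_{alpha/2}
    """
    if tail == "right":
        return f >= upper_crit
    if tail == "left":
        return f <= lower_crit
    if tail == "two":
        return f <= lower_crit or f >= upper_crit
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Hypothetical samples: s1^2 = 8.0 with n1 = 3, s2^2 = 2.0 with n2 = 21.
f, df1, df2 = f_statistic(8.0, 2.0, 3, 21)

# Two-tailed test at alpha = 0.10 with df1 = 2, df2 = 20, using the
# Example 4 readings F_0.95 = 0.05 and F_0.05 = 3.49.
decision = in_rejection_region(f, "two", lower_crit=0.05, upper_crit=3.49)

print(f, (df1, df2), decision)  # prints: 4.0 (2, 20) True
```

Keeping the lower and upper critical values as separate arguments mirrors the asymmetry noted above: the lower value cannot be obtained by negating the upper one, so both must be read from the tables.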
The test is performed using the usual five-step procedure described at the end of Section 8.1 "The
Elements of Hypothesis Testing" in Chapter 8 "Testing Hypotheses".
EXAMPLE 6
One of the quality measures of blood glucose meter strips is the consistency of the test results on the
same sample of blood. The consistency is measured by the variance of the readings in repeated
testing. Suppose two types of strips, A and B, are compared for their respective consistencies. We
arbitrarily label the population of Type A strips Population 1 and the population of Type B strips
Population 2. Suppose 15 Type A strips were tested with blood drops from a well-shaken vial and 20
Type B strips were tested with the blood from the same vial. The results are summarized in Table
11.16 "Two Types of Test Strips". Assume the glucose readings using Type A strips follow a normal
distribution with variance $\sigma_1^2$ and those using Type B strips follow a normal distribution with variance
$\sigma_2^2$. Test, at the 10% level of significance, whether the data provide sufficient evidence to
conclude that the consistencies of the two types of strips are different.
KEY TAKEAWAYS
• Critical values of an F-distribution with degrees of freedom $df_1$ and $df_2$ are found in tables in Chapter 12
"Appendix".
• An F-test can be used to evaluate the hypothesis of two identical normal population variances.
APPLICATIONS
15. Japanese sturgeon is a subspecies of the sturgeon family indigenous to Japan and the Northwest Pacific. In a
particular fish hatchery newly hatched baby Japanese sturgeon are kept in tanks for several weeks before
being transferred to larger ponds. Dissolved oxygen in tank water is very tightly monitored by an electronic
system and rigorously maintained at a target level of 6.5 milligrams per liter (mg/l). The fish hatchery looks to
upgrade their water monitoring systems for tighter control of dissolved oxygen. A new system is evaluated
against the old one currently being used in terms of the variance in measured dissolved oxygen. Thirty-one
water samples from a tank operated with the new system were collected and 16 water samples from a tank
operated with the old system were collected, all during the course of a day. The samples yield the following
information:
New    Sample 1:    $n_1=31$    $s_1^2=0.0121$
Old    Sample 2:    $n_2=16$    $s_2^2=0.0319$
Test, at the 10% level of significance, whether the data provide sufficient evidence to conclude that the new
system will provide a tighter control of dissolved oxygen in the tanks.
16. The risk of investing in a stock is measured by the volatility, or the variance, in changes in the price of that
stock. Mutual funds are baskets of stocks and offer generally lower risk to investors. Different mutual funds
have different focuses and offer different levels of risk. Hippolyta is deciding between two mutual
funds, A and B, with similar expected returns. To make a final decision, she examined the annual returns of
the two funds during the last ten years and obtained the following information:
Test, at the 10% level of significance, whether the data provide sufficient evidence to conclude that the new
playlist has expanded the range of listener ages.
19. A laptop computer maker uses battery packs supplied by two companies, A and B. While both brands have
the same average battery life between charges (LBC), the computer maker seems to receive more
complaints about shorter LBC than expected for battery packs supplied by company B. The computer
maker suspects that this could be caused by higher variance in LBC for Brand B. To check that, ten new
battery packs from each brand are selected, installed on the same models of laptops, and the laptops are
allowed to run until the battery packs are completely discharged. The following are the observed LBCs in
hours.
L A R G E D A T A S E T E X E R C I S E S
21. Large Data Sets 1A and 1B record SAT scores for 419 male and 581 female students. Test, at the 1% level of
significance, whether the data provide sufficient evidence to conclude that the variances of scores of male
and female students differ.
http://www.flatworldknowledge.com/sites/all/files/data1A.xls
http://www.flatworldknowledge.com/sites/all/files/data1B.xls
22. Large Data Sets 7, 7A, and 7B record the survival times of 140 laboratory mice with thymic leukemia. Test, at
the 10% level of significance, whether the data provide sufficient evidence to conclude that the variances of
survival times of male mice and female mice differ.
http://www.flatworldknowledge.com/sites/all/files/data7.xls
http://www.flatworldknowledge.com/sites/all/files/data7A.xls
http://www.flatworldknowledge.com/sites/all/files/data7B.xls
11.4 F-Tests in One-Way ANOVA
L E A R N I N G O B J E C T I V E
1. To understand how to use an F-test to judge whether several population means are all equal.
In Chapter 9 "Two-Sample Problems" we saw how to compare two population means $\mu_1$ and $\mu_2$. In
this section we will learn to compare three or more population means at the same time, which is
often of interest in practical applications. For example, an administrator at a university may be
interested in knowing whether student grade point averages are the same for different majors. In
another example, an oncologist may be interested in knowing whether patients with the same type of
cancer have the same average survival times under several different competing cancer treatments.
In general, suppose there are $K$ normal populations with possibly different means, $\mu_1, \mu_2, \ldots, \mu_K$, but all
with the same variance $\sigma^2$. The study question is whether all the $K$ population means are the same.
We formulate this question as the test of hypotheses
$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$
vs. $H_a$: not all $K$ population means are equal
To perform the test $K$ independent random samples are taken from the $K$ normal populations.
The $K$ sample means, the $K$ sample variances, and the $K$ sample sizes are summarized in the table:
Population   Sample Size   Sample Mean   Sample Variance
1            $n_1$         $\bar{x}_1$   $s_1^2$
2            $n_2$         $\bar{x}_2$   $s_2^2$
⋮            ⋮             ⋮             ⋮
K            $n_K$         $\bar{x}_K$   $s_K^2$
Define the following quantities:
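In the usual notation for one-way ANOVA, and consistent with the quantities $n$, $\bar{x}$, $MST$, $MSE$, $F = MST/MSE$, $df_1 = K - 1$, and $df_2 = n - K$ used in the exercises of this section, the definitions can be written as below. This is a sketch that assumes the standard textbook forms; the symbols $SST$ and $SSE$ for the two sums of squares are introduced here only for convenience.

```latex
\begin{aligned}
n &= n_1 + n_2 + \cdots + n_K
  && \text{(combined sample size)} \\
\bar{x} &= \frac{\sum x}{n}
         = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_K\bar{x}_K}{n}
  && \text{(combined sample mean)} \\
MST &= \frac{SST}{K-1}
     = \frac{n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 + \cdots + n_K(\bar{x}_K-\bar{x})^2}{K-1}
  && \text{(mean square treatment)} \\
MSE &= \frac{SSE}{n-K}
     = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \cdots + (n_K-1)s_K^2}{n-K}
  && \text{(mean square error)} \\
F &= \frac{MST}{MSE},
  \qquad df_1 = K-1, \quad df_2 = n-K
\end{aligned}
```

Under $H_0$ the statistic $F$ follows an F-distribution with these degrees of freedom, and large values of $F$ count as evidence against $H_0$, so the test is right-tailed.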
E X A M P L E 8
The average of grade point averages (GPAs) of college courses in a specific major is a measure of
difficulty of the major. An educator wishes to conduct a study to find out whether the difficulty levels
of different majors are the same. For such a study, a random sample of major grade point averages
(GPA) of 11 graduating seniors at a large university is selected for each of the four majors
mathematics, English, education, and biology. The data are given in Table 11.17 "Difficulty Levels of
College Majors". Test, at the 5% level of significance, whether the data contain sufficient evidence to
conclude that there are differences among the average major GPAs of these four majors.
T A B L E 1 1 . 1 7 D I F F I C U L T Y L E V E L S O F C O L L E G E M A J O R S
Mathematics English Education Biology
2.59 3.64 4.00 2.78
3.13 3.19 3.59 3.51
2.97 3.15 2.80 2.65
2.50 3.78 2.39 3.16
2.53 3.03 3.47 2.94
3.29 2.61 3.59 2.32
2.53 3.20 3.74 2.58
3.17 3.30 3.77 3.21
2.70 3.54 3.13 3.23
3.88 3.25 3.00 3.57
2.64 4.00 3.47 3.22
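The arithmetic for this example can be sketched in a few lines of Python. The block below is an illustration, not the textbook's worked solution: the function name `one_way_anova` and the dictionary `samples` are hypothetical names chosen here, and the formulas assume the standard definitions $MST = SST/(K-1)$ and $MSE = SSE/(n-K)$.

```python
from statistics import mean, variance

# Table 11.17: major GPAs of 11 graduating seniors in each of four majors.
samples = {
    "Mathematics": [2.59, 3.13, 2.97, 2.50, 2.53, 3.29, 2.53, 3.17, 2.70, 3.88, 2.64],
    "English":     [3.64, 3.19, 3.15, 3.78, 3.03, 2.61, 3.20, 3.30, 3.54, 3.25, 4.00],
    "Education":   [4.00, 3.59, 2.80, 2.39, 3.47, 3.59, 3.74, 3.77, 3.13, 3.00, 3.47],
    "Biology":     [2.78, 3.51, 2.65, 3.16, 2.94, 2.32, 2.58, 3.21, 3.23, 3.57, 3.22],
}


def one_way_anova(groups):
    """Return (MST, MSE, F, df1, df2) for a list of independent samples."""
    K = len(groups)                       # number of populations
    n = sum(len(g) for g in groups)       # combined sample size
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-sample sum of squares: n_i * (sample mean - combined mean)^2
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-sample sum of squares: (n_i - 1) * sample variance
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    df1, df2 = K - 1, n - K
    mst, mse = sst / df1, sse / df2
    return mst, mse, mst / mse, df1, df2


mst, mse, f_stat, df1, df2 = one_way_anova(list(samples.values()))
print(f"MST = {mst:.4f}, MSE = {mse:.4f}, F = {f_stat:.3f}, df1 = {df1}, df2 = {df2}")
```

The printed $F$ is then compared with the critical value $F_{0.05}$ for $df_1 = 3$ and $df_2 = 40$ read from Figure 12.5 "Upper Critical Values of F-Distributions"; $H_0$ is rejected when $F$ is at least as large as that value.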
E X A M P L E 9
A research laboratory developed two treatments which are believed to have the potential of
prolonging the survival times of patients with an acute form of thymic leukemia. To evaluate the
potential treatment effects 33 laboratory mice with thymic leukemia were randomly divided into
three groups. One group received Treatment 1, one received Treatment 2, and the third was
observed as a control group. The survival times of these mice are given in Table 11.18 "Mice Survival
Times in Days". Test, at the 1% level of significance, whether these data provide sufficient evidence to
confirm the belief that at least one of the two treatments affects the average survival time of mice
with thymic leukemia.
T A B L E 1 1 . 1 8 M I C E S U R V I V A L T I M E S I N D A Y S
Treatment1 Treatment2 Control
71 75 77 81
72 73 67 79
75 72 79 73
80 65 78 71
60 63 81 75
65 69 72 84
63 64 71 77
78 71 84 67
91
K E Y T A K E A W A Y
• An F-test can be used to evaluate the hypothesis that the means of several normal populations, all with
the same standard deviation, are identical.
E X E R C I S E S
B A S I C
1. The following three random samples are taken from three normal populations with respective means $\mu_1$, $\mu_2$,
and $\mu_3$, and the same variance $\sigma^2$.
Sample 1 Sample 2 Sample 3
2 3 0
2 5 1
3 7 2
5 1
3
a. Find the combined sample size $n$.
b. Find the combined sample mean $\bar{x}$.
c. Find the sample mean for each of the three samples.
d. Find the sample variance for each of the three samples.
e. Find $MST$.
f. Find $MSE$.
g. Find $F = MST/MSE$.
2. The following three random samples are taken from three normal populations with respective
means $\mu_1$, $\mu_2$, and $\mu_3$, and the same variance $\sigma^2$.
Sample 1 Sample 2 Sample 3
0.0 1.3 0.2
0.1 1.5 0.2
0.2 1.7 0.3
0.1 0.5
0.0
a. Find the combined sample size $n$.
b. Find the combined sample mean $\bar{x}$.
c. Find the sample mean for each of the three samples.
d. Find the sample variance for each of the three samples.
e. Find $MST$.
f. Find $MSE$.
g. Find $F = MST/MSE$.
3. Refer to Exercise 1.
a. Find the number of populations under consideration $K$.
b. Find the degrees of freedom $df_1 = K - 1$ and $df_2 = n - K$.
c. For $\alpha = 0.05$, find $F_\alpha$ with the degrees of freedom computed above.
d. At $\alpha = 0.05$, test hypotheses
A P P L I C A T I O N S
5. The Mozart effect refers to a boost of average performance on tests for elementary school students if the
students listen to Mozart's chamber music for a period of time immediately before the test. In order to
attempt to test whether the Mozart effect actually exists, an elementary school teacher conducted an
experiment by dividing her third-grade class of 15 students into three groups of 5. The first group was given
an end-of-grade test without music; the second group listened to Mozart's chamber music for 10 minutes;
and the third group listened to Mozart's chamber music for 20 minutes before the test. The scores of the 15
students are given below:
Group 1 Group 2 Group 3
80 79 73
63 73 82
74 74 79
71 77 82
70 81 84
Using the ANOVA F-test at $\alpha = 0.10$, is there sufficient evidence in the data to suggest that the Mozart effect
exists?
6. The Mozart effect refers to a boost of average performance on tests for elementary school students if the
students listen to Mozart's chamber music for a period of time immediately before the test. Many educators
believe that such an effect is not necessarily due to Mozart's music per se but rather a relaxation period
before the test. To support this belief, an elementary school teacher conducted an experiment by dividing
her third-grade class of 15 students into three groups of 5. Students in the first group were asked to give
themselves a self-administered facial massage; students in the second group listened to Mozart's chamber
music for 15 minutes; students in the third group listened to Schubert's chamber music for 15 minutes before
the test. The scores of the 15 students are given below:
Group 1 Group 2 Group 3
79 82 80
81 84 81
80 86 71
89 91 90
86 82 86
Test, using the ANOVA F-test at the 10% level of significance, whether the data provide sufficient evidence to
conclude that any of the three relaxation methods does better than the others.
7. Precision weighing devices are sensitive to environmental conditions. Temperature and humidity in a
laboratory room where such a device is installed are tightly controlled to ensure high precision in weighing. A
newly designed weighing device is claimed to be more robust against small variations of temperature and
humidity. To verify such a claim, a laboratory tests the new device under four settings of temperature-
humidity conditions. First, two levels of high and low temperature and two levels of high and low humidity
are identified. Let T stand for temperature and H for humidity. The four experimental settings are defined
and noted as (T, H): (high, high), (high, low), (low, high), and (low, low). A pre-calibrated standard weight of 1
kg was weighed by the new device four times in each setting. The results in terms of error (in micrograms,
mcg) are given below:
(high, high) (high, low) (low, high) (low, low)
−1.50 11.47 −14.29 5.54
−6.73 9.28 −18.11 10.34
11.69 5.58 −11.16 15.23
−5.72 10.80 −10.41 −5.69
Test, using the ANOVA F-test at the 1% level of significance, whether the data provide sufficient evidence to
conclude that the mean weight readings by the newly designed device vary among the four settings.
8. To investigate the real cost of owning different makes and models of new automobiles, a consumer
protection agency followed 16 owners of new vehicles of four popular makes and models, call
them TC, HA, NA, and FT, and kept a record of each owner's real cost in dollars for the first five
years. The five-year costs of the 16 car owners are given below:
TC HA NA FT
8423 7776 8907 10333
7889 7211 9077 9217
8665 6870 8732 10540
7129 9747
7359 8677
Test, using the ANOVA F-test at the 5% level of significance, whether the data provide sufficient evidence to
conclude that there are differences among the mean real costs of ownership for these four models.
9. Helping people to lose weight has become a huge industry in the United States, with annual revenue in
the hundreds of billions of dollars. Recently each of the three market-leading weight reducing programs
claimed to be the most effective. A consumer research company recruited 33 people who wished to lose
weight and sent them to the three leading programs. After six months their weight losses were recorded.
The results are summarized below:
Statistic         Prog. 1               Prog. 2              Prog. 3
Sample Mean       $\bar{x}_1 = 10.65$   $\bar{x}_2 = 8.90$   $\bar{x}_3 = 9.33$
Sample Variance   $s_1^2 = 27.20$       $s_2^2 = 16.86$      $s_3^2 = 32.40$
Sample Size       $n_1 = 11$            $n_2 = 11$           $n_3 = 11$
The mean weight loss of the combined sample of all 33 people was $\bar{x} = 9.63$. Test, using the ANOVA F-test at
the 5% level of significance, whether the data provide sufficient evidence to conclude that some program is
more effective than the others.
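When only summary statistics are given, as in this exercise, the test statistic can still be computed because $MST$ and $MSE$ depend on the data only through the sample sizes, means, and variances. The following Python sketch shows one way to organize that arithmetic. The function name `anova_from_summary` is a hypothetical helper, and the formulas assume the standard definitions $MST = \sum n_i(\bar{x}_i - \bar{x})^2/(K-1)$ and $MSE = \sum (n_i - 1)s_i^2/(n-K)$.

```python
def anova_from_summary(sizes, means, variances):
    """Return (combined mean, MST, MSE, F, df1, df2) from per-sample summaries."""
    K = len(sizes)                        # number of populations
    n = sum(sizes)                        # combined sample size
    # Combined mean is the size-weighted average of the sample means.
    grand_mean = sum(ni * xi for ni, xi in zip(sizes, means)) / n
    df1, df2 = K - 1, n - K
    mst = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(sizes, means)) / df1
    mse = sum((ni - 1) * vi for ni, vi in zip(sizes, variances)) / df2
    return grand_mean, mst, mse, mst / mse, df1, df2


# Summary statistics for the three weight-loss programs.
grand_mean, mst, mse, f_stat, df1, df2 = anova_from_summary(
    sizes=[11, 11, 11],
    means=[10.65, 8.90, 9.33],
    variances=[27.20, 16.86, 32.40],
)
print(f"combined mean = {grand_mean:.2f}, MST = {mst:.3f}, MSE = {mse:.3f}, "
      f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2}")
```

The combined mean computed this way agrees with the stated $\bar{x} = 9.63$ after rounding. The decision still requires comparing $F$ with the critical value $F_{0.05}$ for the printed degrees of freedom from Figure 12.5.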
10. A leading pharmaceutical company in the disposable contact lenses market has always taken for granted that
the sales of certain peripheral products such as contact lens solutions would automatically go with the
established brands. The long-standing culture in the company has been that lens solutions would not make a
significant difference in user experience. Recent market research surveys, however, suggest otherwise. To
gain a better understanding of the effects of contact lens solutions on user experience, the company
conducted a comparative study in which 63 contact lens users were randomly divided into three groups, each
of which received one of three top selling lens solutions on the market, including one of the company's own.
After using the assigned solution for two weeks, each participant was asked to rate the solution on the scale
of 1 to 5 for satisfaction, with 5 being the highest level of satisfaction. The results of the study are
summarized below:
Statistic         Sol. 1               Sol. 2               Sol. 3
Sample Mean       $\bar{x}_1 = 3.28$   $\bar{x}_2 = 3.96$   $\bar{x}_3 = 4.10$
Sample Variance   $s_1^2 = 0.15$       $s_2^2 = 0.32$       $s_3^2 = 0.36$
Sample Size       $n_1 = 18$           $n_2 = 23$           $n_3 = 22$
The mean satisfaction level of the combined sample of all 63 participants was $\bar{x} = 3.81$. Test, using the
ANOVA F-test at the 5% level of significance, whether the data provide sufficient evidence to conclude that
not all three average satisfaction levels are the same.
L A R G E D A T A S E T E X E R C I S E
11. Large Data Set 9 records the costs of materials (textbook, solution manual, laboratory fees, and so on) in each
of ten different courses in each of three different subjects, chemistry, computer science, and mathematics.
Test, at the 1% level of significance, whether the data provide sufficient evidence to conclude that the mean
costs in the three disciplines are not all the same.
http://www.flatworldknowledge.com/sites/all/files/data9.xls
Chapter 12
Appendix
Figure 12.1 Cumulative Binomial Probability
Figure 12.2 Cumulative Normal Probability
Figure 12.3 Critical Values of t
Figure 12.4 Critical Values of Chi-Square Distributions
Figure 12.5 Upper Critical Values of F-Distributions
Figure 12.6 Lower Critical Values of F-Distributions