Exploring Data: Part I Review (virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch7-167-185.pdf)

Date post: 24-Aug-2020

P1: PBU/OVY P2: PBU/OVY QC: PBU/OVY T1: PBU

GTBL011-07 GTBL011-Moore-v16.cls May 22, 2006 9:53

CHAPTER 7

In this chapter we cover...

Part I Summary

Review Exercises

Supplementary Exercises

EESEE Case Studies

[Photo: Gallo Images–Anthony Bannister/Getty Images]

Exploring Data: Part I Review

Data analysis is the art of describing data using graphs and numerical summaries. The purpose of data analysis is to help us see and understand the most important features of a set of data. Chapter 1 presented graphs for displaying distributions: pie charts and bar graphs for categorical variables, histograms and stemplots for quantitative variables. In addition, time plots show how a quantitative variable changes over time. Chapter 2 presented numerical tools for describing the center and spread of the distribution of one variable. Chapter 3 discussed density curves for describing the overall pattern of a distribution, with emphasis on the Normal distributions.

The first STATISTICS IN SUMMARY figure on the next page organizes the big ideas for exploring a quantitative variable. Plot your data, then describe their center and spread using either the mean and standard deviation or the five-number summary. The last step, which makes sense only for some data, is to summarize the data in compact form by using a Normal curve as a description of the overall pattern. The question marks at the last two stages remind us that the usefulness of numerical summaries and Normal distributions depends on what we find when we examine graphs of our data. No short summary does justice to irregular shapes or to data with several distinct clusters.


STATISTICS IN SUMMARY: Analyzing Data for One Variable

[Figure: Plot your data (stemplot, histogram) → Interpret what you see (shape, center, spread, outliers) → Numerical summary? (x̄ and s, five-number summary?) → Density curve? (Normal distribution?)]

Chapters 4 and 5 applied the same ideas to relationships between two quantitative variables. The second STATISTICS IN SUMMARY figure retraces the big ideas, with details that fit the new setting. Always begin by making graphs of your data. In the case of a scatterplot, we have learned a numerical summary only for data that show a roughly linear pattern on the scatterplot. The summary is then the means and standard deviations of the two variables and their correlation. A regression line drawn on the plot gives a compact description of the overall pattern that we can use for prediction. Once again there are question marks at the last two stages to remind us that correlation and regression describe only straight-line relationships. Chapter 6 shows how to understand relationships between two categorical variables; comparing well-chosen percents is the key.

You can organize your work in any open-ended data analysis setting by following the four-step State, Formulate, Solve, and Conclude process first introduced in Chapter 2. After we have mastered the extra background needed for statistical inference, this process will also guide practical work on inference later in the book.

STATISTICS IN SUMMARY: Analyzing Data for Two Variables

[Figure: Plot your data (scatterplot) → Interpret what you see (direction, form, strength; linear?) → Numerical summary? (x̄, ȳ, s_x, s_y, and r?) → Regression line?]


PART I SUMMARY

Here are the most important skills you should have acquired from reading Chapters 1 to 6.

A. DATA
1. Identify the individuals and variables in a set of data.
2. Identify each variable as categorical or quantitative. Identify the units in which each quantitative variable is measured.
3. Identify the explanatory and response variables in situations where one variable explains or influences another.

B. DISPLAYING DISTRIBUTIONS
1. Recognize when a pie chart can and cannot be used.
2. Make a bar graph of the distribution of a categorical variable, or in general to compare related quantities.
3. Interpret pie charts and bar graphs.
4. Make a time plot of a quantitative variable over time. Recognize patterns such as trends and cycles in time plots.
5. Make a histogram of the distribution of a quantitative variable.
6. Make a stemplot of the distribution of a small set of observations. Round leaves or split stems as needed to make an effective stemplot.

C. DESCRIBING DISTRIBUTIONS (QUANTITATIVE VARIABLE)
1. Look for the overall pattern and for major deviations from the pattern.
2. Assess from a histogram or stemplot whether the shape of a distribution is roughly symmetric, distinctly skewed, or neither. Assess whether the distribution has one or more major peaks.
3. Describe the overall pattern by giving numerical measures of center and spread in addition to a verbal description of shape.
4. Decide which measures of center and spread are more appropriate: the mean and standard deviation (especially for symmetric distributions) or the five-number summary (especially for skewed distributions).
5. Recognize outliers and give plausible explanations for them.

D. NUMERICAL SUMMARIES OF DISTRIBUTIONS
1. Find the median M and the quartiles Q1 and Q3 for a set of observations.
2. Find the five-number summary and draw a boxplot; assess center, spread, symmetry, and skewness from a boxplot.
3. Find the mean x̄ and the standard deviation s for a set of observations.
4. Understand that the median is more resistant than the mean. Recognize that skewness in a distribution moves the mean away from the median toward the long tail.


5. Know the basic properties of the standard deviation: s ≥ 0 always; s = 0 only when all observations are identical, and s increases as the spread increases; s has the same units as the original measurements; s is pulled strongly up by outliers or skewness.
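The skills in sections C and D can also be checked by computer. The Python sketch below is one possible approach, not the book's procedure: it assumes quartiles are found as the medians of the observations below and above the median position (the middle observation belongs to neither half when n is odd), which can differ slightly from the default rule in some software. It uses `statistics.stdev`, which divides by n − 1 just as s does. The helper names `five_number_summary` and `describe` are illustrative.

```python
from statistics import mean, median, stdev

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max).

    Q1 and Q3 are the medians of the observations below and above the
    median position; for odd n the middle observation is in neither half.
    Needs at least 2 observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # observations before the median position
    upper = xs[half + n % 2:]      # skip the middle observation when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def describe(data):
    """Mean x-bar, standard deviation s (divisor n - 1), and five-number summary."""
    return {
        "mean": mean(data),
        "sd": stdev(data),
        "five_number": five_number_summary(data),
    }

toy = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # illustrative values, not from the text
print(describe(toy))
print(describe(toy[:-1] + [90]))        # one large outlier replaces the 9
```

Replacing 9 by 90 in the toy list moves the mean from 5 to 14 but leaves the median at 5, a small illustration of item D.4. Also, a list of identical values such as [3, 3, 3] gives s = 0, as item D.5 says.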

E. DENSITY CURVES AND NORMAL DISTRIBUTIONS
1. Know that areas under a density curve represent proportions of all observations and that the total area under a density curve is 1.
2. Approximately locate the median (equal-areas point) and the mean (balance point) on a density curve.
3. Know that the mean and median both lie at the center of a symmetric density curve and that the mean moves farther toward the long tail of a skewed curve.
4. Recognize the shape of Normal curves and estimate by eye both the mean and standard deviation from such a curve.
5. Use the 68–95–99.7 rule and symmetry to state what percent of the observations from a Normal distribution fall between two points when both points lie at the mean or one, two, or three standard deviations on either side of the mean.
6. Find the standardized value (z-score) of an observation. Interpret z-scores and understand that any Normal distribution becomes standard Normal N(0, 1) when standardized.
7. Given that a variable has a Normal distribution with a stated mean μ and standard deviation σ, calculate the proportion of values above a stated number, below a stated number, or between two stated numbers.
8. Given that a variable has a Normal distribution with a stated mean μ and standard deviation σ, calculate the point having a stated proportion of all values above it or below it.
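Items 5 to 8 in section E all come down to areas under a Normal curve. Python's standard library class `statistics.NormalDist` (Python 3.8 and later) provides the cumulative proportion `cdf` and its inverse `inv_cdf`, so a sketch might look like the following. The helper names are illustrative, and software gives slightly more precise answers than a table of standard Normal probabilities.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardized value: how many standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

def proportion_between(lo, hi, mu, sigma):
    """Proportion of an N(mu, sigma) distribution between lo and hi (area under the curve)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)

def point_with_proportion_below(p, mu, sigma):
    """Value that has proportion p of an N(mu, sigma) distribution below it."""
    return NormalDist(mu, sigma).inv_cdf(p)

# 68-95-99.7 rule, checked on the standard Normal N(0, 1)
for k in (1, 2, 3):
    print(k, round(proportion_between(-k, k, 0, 1), 4))
```

The loop should print proportions close to 0.68, 0.95, and 0.997. Because any Normal distribution becomes N(0, 1) when standardized, `proportion_between(8, 12, 10, 2)` gives the same area as `proportion_between(-1, 1, 0, 1)`.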

F. SCATTERPLOTS AND CORRELATION
1. Make a scatterplot to display the relationship between two quantitative variables measured on the same subjects. Place the explanatory variable (if any) on the horizontal scale of the plot.
2. Add a categorical variable to a scatterplot by using a different plotting symbol or color.
3. Describe the direction, form, and strength of the overall pattern of a scatterplot. In particular, recognize positive or negative association and linear (straight-line) patterns. Recognize outliers in a scatterplot.
4. Judge whether it is appropriate to use correlation to describe the relationship between two quantitative variables. Find the correlation r.
5. Know the basic properties of correlation: r measures the direction and strength of only straight-line relationships; r is always a number between −1 and 1; r = ±1 only for perfect straight-line relationships; r moves away from 0 toward ±1 as the straight-line relationship gets stronger.
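For item F.4, the correlation can be computed as the sum of the products of the standardized values of x and y, divided by n − 1. The sketch below is a plain-Python version of that formula (the function name is illustrative). The second example shows that a perfect but curved relationship can still have r = 0, which is why item F.5 says r measures only straight-line relationships.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # r is undefined if either standard deviation is 0
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# r measures only straight-line association:
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))          # exact line: r is 1 (up to rounding)
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # perfect curve: r is 0 (up to rounding)
```

Changing the units of x (for example, multiplying every x by 2.54) leaves r unchanged, because standardizing removes the units.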


G. REGRESSION LINES
1. Understand that regression requires an explanatory variable and a response variable. Use a calculator or software to find the least-squares regression line of a response variable y on an explanatory variable x from data.
2. Explain what the slope b and the intercept a mean in the equation ŷ = a + bx of a regression line.
3. Draw a graph of a regression line when you are given its equation.
4. Use a regression line to predict y for a given x. Recognize extrapolation and be aware of its dangers.
5. Find the slope and intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation.
6. Use r², the square of the correlation, to describe how much of the variation in one variable can be accounted for by a straight-line relationship with another variable.
7. Recognize outliers and potentially influential observations from a scatterplot with the regression line drawn on it.
8. Calculate the residuals and plot them against the explanatory variable x. Recognize that a residual plot magnifies the pattern of the scatterplot of y versus x.
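Item G.5 says the least-squares line can be found from x̄, ȳ, s_x, s_y, and r alone, using b = r·s_y/s_x and a = ȳ − b·x̄. Here is a short Python sketch of that route, with illustrative function names and a toy data set that is not from the text.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x.

    The slope and intercept come from the summary statistics:
    b = r * s_y / s_x and a = y-bar - b * x-bar.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = my - b * mx
    return a, b, r

def residuals(xs, ys, a, b):
    """Residual = observed y minus predicted y-hat; least-squares residuals sum to 0."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # toy data, not from the text
a, b, r = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x, r^2 = {r * r:.2f}")
print("residuals:", [round(e, 2) for e in residuals(xs, ys, a, b)])
```

Plotting the residuals against x, as item G.8 asks, is a natural next step once this list is in hand.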

Driving in Canada

Canada is a civilized and restrained nation, at least in the eyes of Americans. A survey sponsored by the Canada Safety Council suggests that driving in Canada may be more adventurous than expected. Of the Canadian drivers surveyed, 88% admitted to aggressive driving in the past year, and 76% said that sleep-deprived drivers were common on Canadian roads. What really alarms us is the name of the survey: the Nerves of Steel Aggressive Driving Study.

H. CAUTIONS ABOUT CORRELATION AND REGRESSION
1. Understand that both r and the least-squares regression line can be strongly influenced by a few extreme observations.
2. Recognize possible lurking variables that may explain the observed association between two variables x and y.
3. Understand that even a strong correlation does not mean that there is a cause-and-effect relationship between x and y.
4. Give plausible explanations for an observed association between two variables: direct cause and effect, the influence of lurking variables, or both.

I. CATEGORICAL DATA (Optional)
1. From a two-way table of counts, find the marginal distributions of both variables by obtaining the row sums and column sums.
2. Express any distribution in percents by dividing the category counts by their total.
3. Describe the relationship between two categorical variables by computing and comparing percents. Often this involves comparing the conditional distributions of one variable for the different categories of the other variable.
4. Recognize Simpson’s paradox and be able to explain it.


REVIEW EXERCISES

Review exercises help you solidify the basic ideas and skills in Chapters 1 to 6.

7.1 Describing colleges. Popular magazines rank colleges and universities on their “academic quality” in serving undergraduate students. Give one categorical variable and two quantitative variables that you would like to see measured for each college if you were choosing where to study.

7.2 Affording college. From time to time, the Department of Education estimates the “average unmet need” for undergraduate students: the cost of school minus estimated family contributions and financial aid. Here are the averages for full-time students at four types of institution in the most recent study, for the 1999–2000 academic year:¹

Type of institution         Average unmet need
Public 2-year               $2747
Public 4-year               $2369
Private nonprofit 4-year    $4931
Private for-profit          $6548

Make a bar graph of these data. Write a one-sentence conclusion about the unmet needs of students. Explain clearly why it is incorrect to make a pie chart.

7.3 Changes in how we watch. Movies earn income from many sources other than theater showings. Here are data on the income of movie studios from two sources over time, in billions of dollars (the amounts have been adjusted to the same buying power that a dollar had in 2004):²

Year                1948   1980   1985   1990   1995   2000   2004
Theater showings    7.8    4.5    3.04   5.28   5.72   6.02   7.40
Video/DVD sales     0      0.2    2.40   6.02   10.90  11.97  20.90

Make two time plots on the same scales to compare the two sources of income. (Use one dashed and one solid line to keep them separate.) What pattern does your plot show?

7.4 What we watch now. The previous exercise looked at movie studio income from theaters and video/DVD sales over time. Here are data on studio income in 2004, in billions of dollars:

Source       Income
Theaters     7.4
Video/DVD    20.9
Pay TV       4.0
Free TV      12.6

Make a graph that compares these amounts. What percent of studio income comes from theater showings of movies?

7.5 Growing icicles. Table 4.2 (page 98) gives data on the growth of icicles over time. Let’s look again at Run 8903, for which a slower flow of water produces faster growth.

(a) How can you tell from a calculation, without drawing a scatterplot, that the pattern of growth is very close to a straight line?


(b) What is the equation of the least-squares regression line for predicting an icicle’s length from time in minutes under these conditions?

(c) Predict the length of an icicle after one full day. This prediction can’t be trusted. Why not?

7.6 Weights aren’t Normal. The heights of people of the same sex and similar ages follow a Normal distribution reasonably closely. Weights, on the other hand, are not Normally distributed. The weights of women aged 20 to 29 have mean 141.7 pounds and median 133.2 pounds. The first and third quartiles are 118.3 pounds and 157.3 pounds. What can you say about the shape of the weight distribution? Why?

7.7 Returns on stocks aren’t Normal. The 99.7 part of the 68–95–99.7 rule says that in practice Normal distributions are about 6 standard deviations wide. Exercise 2.39 (page 62) gives the real returns for the S&P 500 stock index over a 33-year period. The shape of the distribution is not close to Normal. Find the mean and standard deviation of the real returns. What are the values 3 standard deviations above and below the mean, which would span the distribution if it were Normal? How do these values compare with the actual lowest and highest returns? Remember that the 68–95–99.7 rule applies only to Normal distributions.

7.8 Remember what you ate. How well do people remember their past diet? Data are available for 91 people who were asked about their diet when they were 18 years old. Researchers asked them at about age 55 to describe their eating habits at age 18. For each subject, the researchers calculated the correlation between actual intakes of many foods at age 18 and the intakes the subjects now remember. The median of the 91 correlations was r = 0.217. The authors say, “We conclude that memory of food intake in the distant past is fair to poor.”³

Explain why r = 0.217 points to this conclusion.

[Photo: Alastair Shay; Papilio/CORBIS]

7.9 Cicadas as fertilizer? [4-STEP] Every 17 years, swarms of cicadas emerge from the ground in the eastern United States, live for about six weeks, then die. (There are several “broods,” so we experience cicada eruptions more often than every 17 years.) There are so many cicadas that their dead bodies can serve as fertilizer and increase plant growth. In an experiment, a researcher added 10 cicadas under some plants in a natural plot of American bellflowers in a forest, leaving other plants undisturbed. One of the response variables was the size of seeds produced by the plants. Here are data (seed mass in milligrams) for 39 cicada plants and 33 undisturbed (control) plants:⁴

Cicada plants                  Control plants
0.237 0.277 0.241 0.142        0.212 0.188 0.263 0.253
0.109 0.209 0.238 0.277        0.261 0.265 0.135 0.170
0.261 0.227 0.171 0.235        0.203 0.241 0.257 0.155
0.276 0.234 0.255 0.296        0.215 0.285 0.198 0.266
0.239 0.266 0.296 0.217        0.178 0.244 0.190 0.212
0.238 0.210 0.295 0.193        0.290 0.253 0.249 0.253
0.218 0.263 0.305 0.257        0.268 0.190 0.196 0.220
0.351 0.245 0.226 0.276        0.246 0.145 0.247 0.140
0.317 0.310 0.223 0.229        0.241
0.192 0.201 0.211


Do the data support the idea that dead cicadas can serve as fertilizer? Follow the four-step process (page 53) in your work.

7.10 Hot mutual funds? Investment advertisements always warn that “past performance does not guarantee future results.” Here is an example that shows why you should pay attention to this warning. The table below gives the percent returns from 23 Fidelity Investments “sector funds” in 2002 (a down year for stocks) and 2003 (an up year). Sector funds invest in narrow segments of the stock market. They often rise and fall faster than the market as a whole.

2002 return  2003 return    2002 return  2003 return    2002 return  2003 return
−17.1        23.9           −0.7         36.9           −37.8        59.4
−6.7         14.1           −5.6         27.5           −11.5        22.9
−21.1        41.8           −26.9        26.1           −0.7         36.9
−12.8        43.9           −42.0        62.7           64.3         32.1
−18.9        31.1           −47.8        68.1           −9.6         28.7
−7.7         32.3           −50.5        71.9           −11.7        29.5
−17.2        36.5           −49.5        57.0           −2.3         19.1
−11.4        30.6           −23.4        35.0

(a) Make a scatterplot of 2003 return (response) against 2002 return (explanatory). The funds with the best performance in 2002 tend to have the worst performance in 2003. Fidelity Gold Fund, the only fund with a positive return in both years, is an extreme outlier.

(b) To demonstrate that correlation is not resistant, find r for all 23 funds and then find r for the 22 funds other than Gold. Explain from Gold’s position in your plot why omitting this point makes r more negative.

7.11 More about cicadas. Let’s examine the distribution of seed mass for plants in the cicada group of Exercise 7.9 in more detail.

(a) Make a stemplot. Is the overall shape roughly symmetric or clearly skewed? There are both low and high observations that we might call outliers.

(b) Find the mean and standard deviation of the seed masses. Then remove both the smallest and largest masses and find the mean and standard deviation of the remaining 37 seeds. Why does removing these two observations reduce s? Why does it have little effect on x̄?

7.12 More on hot funds. Continue your study of the returns for Fidelity sector funds from Exercise 7.10. The least-squares line, like the correlation, is not resistant.

(a) Find the equations of two least-squares lines for predicting 2003 return from 2002 return, one for all 23 funds and one omitting Fidelity Gold Fund. Make a scatterplot with both lines drawn on it. The two lines are very different.

(b) Starting with the least-squares idea, explain why adding Fidelity Gold Fund to the other 22 funds moves the line in the direction that your graph shows.
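Exercises 7.10 and 7.12 can be checked with a few lines of Python. This sketch types in the 23 pairs from the table in Exercise 7.10 (the transcription is ours, so compare it against the printed table) and fits the least-squares line with and without Fidelity Gold Fund. `fit` is an illustrative helper, not a library function.

```python
from statistics import mean

# (2002 return, 2003 return) in percent, transcribed from the table in Exercise 7.10
funds = [
    (-17.1, 23.9), (-6.7, 14.1), (-21.1, 41.8), (-12.8, 43.9),
    (-18.9, 31.1), (-7.7, 32.3), (-17.2, 36.5), (-11.4, 30.6),
    (-0.7, 36.9), (-5.6, 27.5), (-26.9, 26.1), (-42.0, 62.7),
    (-47.8, 68.1), (-50.5, 71.9), (-49.5, 57.0), (-23.4, 35.0),
    (-37.8, 59.4), (-11.5, 22.9), (-0.7, 36.9), (64.3, 32.1),
    (-9.6, 28.7), (-11.7, 29.5), (-2.3, 19.1),
]
GOLD = (64.3, 32.1)   # Fidelity Gold Fund, positive in both years

def fit(pairs):
    """Return (r, a, b): correlation and least-squares line y-hat = a + b*x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx                      # least-squares slope
    return sxy / (sxx * syy) ** 0.5, my - b * mx, b

r_all, a_all, b_all = fit(funds)
r_22, a_22, b_22 = fit([p for p in funds if p != GOLD])
print(f"all 23 funds:  r = {r_all:.3f}, y-hat = {a_all:.2f} + ({b_all:.3f})x")
print(f"without Gold:  r = {r_22:.3f}, y-hat = {a_22:.2f} + ({b_22:.3f})x")
```

With this transcription, dropping Gold makes r more negative and the slope steeper, in line with the text; your scatterplot shows why a single point far to the right has so much pull.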

7.13 Outliers? In Exercise 7.11, you noticed that the smallest and largest observations might be called outliers. Is either of these observations a suspected outlier by the 1.5 × IQR rule (page 47)?
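For Exercises 7.11(b) and 7.13, the sketch below applies the 1.5 × IQR rule to the cicada seed masses as transcribed from Exercise 7.9. It assumes `statistics.quantiles` with its default "exclusive" method. For these 39 observations that method gives the same quartiles as taking the medians of the lower and upper halves, but for other sample sizes the two rules can disagree slightly.

```python
from statistics import mean, quantiles, stdev

# Seed mass (mg) for the 39 cicada plants, transcribed from Exercise 7.9
cicada = [
    0.237, 0.277, 0.241, 0.142, 0.109, 0.209, 0.238, 0.277, 0.261, 0.227,
    0.171, 0.235, 0.276, 0.234, 0.255, 0.296, 0.239, 0.266, 0.296, 0.217,
    0.238, 0.210, 0.295, 0.193, 0.218, 0.263, 0.305, 0.257, 0.351, 0.245,
    0.226, 0.276, 0.317, 0.310, 0.223, 0.229, 0.192, 0.201, 0.211,
]

# Exercise 7.13: flag observations more than 1.5 x IQR beyond the quartiles
q1, m, q3 = quantiles(cicada, n=4)      # default method="exclusive"
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
suspects = [x for x in cicada if x < low_fence or x > high_fence]

# Exercise 7.11(b): drop the smallest and largest masses and compare
trimmed = sorted(cicada)[1:-1]
print(f"all 39:  mean {mean(cicada):.4f}, s {stdev(cicada):.4f}")
print(f"37 kept: mean {mean(trimmed):.4f}, s {stdev(trimmed):.4f}")
print(f"Q1 {q1}, M {m}, Q3 {q3}; fences {low_fence:.4f}, {high_fence:.4f}")
print("suspected outliers:", suspects)
```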


7.14 Where does the water go? Here are data on the amounts of water withdrawn from natural sources, including rivers, lakes, and wells, in 2000. The units are millions of gallons per day.⁵

Use                          Water withdrawn
Public water supplies        43,300
Domestic water supplies      3,590
Irrigation                   137,000
Industry                     19,780
Power plant cooling          195,500
Fish farming                 3,700

Make a bar graph to present these data. For clarity, order the bars by amount of water used. The total water withdrawn is about 408,000 million gallons per day. About how much is withdrawn for uses not mentioned above?

[Photo: AP Photo/Mark Lennihan]

7.15 Best-selling soft drinks. Here are data on the market share of the best-selling brands of carbonated soft drinks in 2003:⁶

Brand            Market share
Coke Classic     18.6%
Pepsi-Cola       11.9%
Diet Coke        9.4%
Mountain Dew     6.3%
Sprite           5.9%
Diet Pepsi       5.8%
Dr. Pepper       5.7%

Display these data in a graph. What percent of the soft drink market is held by other brands?

7.16 Presidential elections. Here are the percents of the popular vote won by thesuccessful candidate in each of the presidential elections from 1948 to 2004.

Year      1948  1952  1956  1960  1964  1968  1972  1976
Percent   49.6  55.1  57.4  49.7  61.1  43.4  60.7  50.1

Year      1980  1984  1988  1992  1996  2000  2004
Percent   50.7  58.8  53.9  43.2  49.2  47.9  51.2

(a) Make a stemplot of the winners’ percents.

(b) What is the median percent of the vote won by the successful candidate in presidential elections?

(c) Call an election a landslide if the winner’s percent falls at or above the third quartile. Find the third quartile. Which elections were landslides?
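A quick check of Exercise 7.16(b) and (c) in Python, again assuming the standard-library `statistics.quantiles` with its default "exclusive" method; with 15 observations that method agrees with the rule of taking medians of the lower and upper halves. Part (a), the stemplot, is left to the reader.

```python
from statistics import quantiles

# Winner's percent of the popular vote, transcribed from Exercise 7.16
percent = {
    1948: 49.6, 1952: 55.1, 1956: 57.4, 1960: 49.7, 1964: 61.1,
    1968: 43.4, 1972: 60.7, 1976: 50.1, 1980: 50.7, 1984: 58.8,
    1988: 53.9, 1992: 43.2, 1996: 49.2, 2000: 47.9, 2004: 51.2,
}

q1, median_pct, q3 = quantiles(list(percent.values()), n=4)
# A landslide is an election whose winner's percent is at or above Q3
landslides = sorted(year for year, p in percent.items() if p >= q3)
print(f"median {median_pct}, Q3 {q3}, landslides {landslides}")
```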


7.17 The Mississippi River. Table 7.1 gives the volume of water discharged by the Mississippi River into the Gulf of Mexico for each year from 1954 to 2001.⁷ The units are cubic kilometers of water; the Mississippi is a big river.

(a) Make a graph of the distribution of water volume. Describe the overall shape of the distribution and any outliers.

(b) Based on the shape of the distribution, do you expect the mean to be close to the median, clearly less than the median, or clearly greater than the median? Why? Find the mean and the median to check your answer.

(c) Based on the shape of the distribution, does it seem reasonable to use x̄ and s to describe the center and spread of this distribution? Why? Find x̄ and s if you think they are a good choice. Otherwise, find the five-number summary.

TABLE 7.1 Yearly discharge (cubic kilometers of water) of the Mississippi River

Year  Discharge   Year  Discharge   Year  Discharge   Year  Discharge
1954  290         1966  410         1978  560         1990  680
1955  420         1967  460         1979  800         1991  700
1956  390         1968  510         1980  500         1992  510
1957  610         1969  560         1981  420         1993  900
1958  550         1970  540         1982  640         1994  640
1959  440         1971  480         1983  770         1995  590
1960  470         1972  600         1984  710         1996  670
1961  600         1973  880         1985  680         1997  680
1962  550         1974  710         1986  600         1998  690
1963  360         1975  670         1987  450         1999  580
1964  390         1976  420         1988  420         2000  390
1965  500         1977  430         1989  630         2001  580

7.18 More on the Mississippi River. The data in Table 7.1 are a time series. Make a time plot that shows how the volume of water in the Mississippi changed between 1954 and 2001. What does the time plot reveal that the histogram from the previous exercise does not? It is a good idea to always make a time plot of time series data because a histogram cannot show changes over time.

7.19 A big toe problem. Hallux abducto valgus (call it HAV) is a deformation of the big toe that is not common in youth and often requires surgery. Doctors used X-rays to measure the angle (in degrees) of deformity in 38 consecutive patients under the age of 21 who came to a medical center for surgery to correct HAV.⁸ The angle is a measure of the seriousness of the deformity. The data appear in Table 7.2 as “HAV angle.” Make a graph and give a numerical description of this distribution. Are there any outliers? Write a brief discussion of the shape, center, and spread of the angle of deformity among young patients needing surgery for this condition.

7.20 More on a big toe problem. The HAV angle data in the previous exercise contain one high outlier. Calculate the median, the mean, and the standard deviation for the full data set and also for the 37 observations remaining when you remove the outlier. How strongly does the outlier affect each of these measures?


TABLE 7.2 Angle of deformity (degrees) for two types of foot deformity

HAV angle  MA angle    HAV angle  MA angle    HAV angle  MA angle
28         18          21         15          16         10
32         16          17         16          30         12
25         22          16         10          30         10
34         17          21         7           20         10
38         33          23         11          50         12
26         10          14         15          25         25
25         18          32         12          26         30
18         13          25         16          28         22
30         19          21         16          31         24
26         10          22         18          38         20
28         17          20         10          32         37
13         14          18         15          21         23
20         20          26         16

7.21 Predicting foot problems. Metatarsus adductus (call it MA) is a turning in of the front part of the foot that is common in adolescents and usually corrects itself. Table 7.2 gives the severity of MA (“MA angle”) as well. Doctors speculate that the severity of MA can help predict the severity of HAV.

(a) Make a scatterplot of the data. (Which is the explanatory variable?)

(b) Describe the form, direction, and strength of the relationship between MA angle and HAV angle. Are there any clear outliers in your graph?

(c) Do you think the data confirm the doctors’ speculation? Why or why not?

7.22 Predicting foot problems, continued.

(a) Find the equation of the least-squares regression line for predicting HAV angle from MA angle. Add this line to the scatterplot you made in the previous exercise.

(b) A new patient has MA angle 25 degrees. What do you predict this patient’s HAV angle to be?

(c) Does knowing MA angle allow doctors to predict HAV angle accurately? Explain your answer from the scatterplot, then calculate a numerical measure to support your finding.

7.23 Data on mice. For a biology project, you measure the tail length (centimeters) and weight (grams) of 12 mice of the same variety. What units of measurement does each of the following have?

(a) The mean length of the tails.

(b) The first quartile of the tail lengths.

(c) The standard deviation of the tail lengths.

(d) The correlation between tail length and weight.

Beer in South Dakota

Take a break from doing exercises to apply your math to beer cans in South Dakota. A newspaper there reported that every year an average of 650 beer cans per mile are tossed onto the state’s highways. South Dakota has about 83,000 miles of roads. How many beer cans is that in all? The Census Bureau says that there are about 770,000 people in South Dakota. How many beer cans does each man, woman, and child in the state toss on the road each year? That’s pretty impressive. Maybe the paper got its numbers wrong.
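The sidebar’s arithmetic takes two lines of Python. The figures are the newspaper’s and the Census Bureau’s as quoted above; the variable names are ours.

```python
cans_per_mile = 650        # reported yearly average, cans per mile
road_miles = 83_000        # approximate miles of road in South Dakota
population = 770_000       # Census Bureau figure quoted in the sidebar

total_cans = cans_per_mile * road_miles
cans_per_person = total_cans / population
print(f"{total_cans:,} cans per year, about {cans_per_person:.0f} per person")
```

That works out to roughly 54 million cans, or about 70 per resident each year, which is why the sidebar doubts the newspaper’s numbers.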

7.24 Catalog shopping (optional). What is the most important reason that students buy from catalogs? The answer may differ for different groups of students. Here are results for samples of American and East Asian students at a large midwestern university:⁹

Reason                  American   Asian
Save time               29         10
Easy                    28         11
Low price               17         34
Live far from stores    11         4
No pressure to buy      10         3
Other reason            20         7
Total                   115        69

(a) Give the marginal distribution of reasons for all students, in percents.

(b) Give the two conditional distributions of reasons, for American and for East Asian students. What are the most important differences between the two groups of students?
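Exercise 7.24 asks for marginal and conditional distributions, which (as section I of the Part I Summary describes) come from dividing counts by the appropriate totals. Here is a minimal Python sketch using the counts from the table; the list and function names are illustrative.

```python
reasons = ["Save time", "Easy", "Low price", "Live far from stores",
           "No pressure to buy", "Other reason"]
american = [29, 28, 17, 11, 10, 20]   # column total 115
asian = [10, 11, 34, 4, 3, 7]         # column total 69

def percents(counts):
    """Express a distribution of counts as percents of its total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

marginal = percents([a + b for a, b in zip(american, asian)])  # all 184 students
cond_american = percents(american)    # conditional distribution for American students
cond_asian = percents(asian)          # conditional distribution for East Asian students

print(f"{'Reason':22s} {'All':>6s} {'Amer.':>6s} {'Asian':>6s}")
for name, m, am, asn in zip(reasons, marginal, cond_american, cond_asian):
    print(f"{name:22s} {m:5.1f}% {am:5.1f}% {asn:5.1f}%")
```

Comparing the last two columns row by row is the comparison part (b) asks for.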

7.25 How are schools doing? (optional) [4-STEP] The nonprofit group Public Agenda conducted telephone interviews with parents of high school children. Interviewers chose equal numbers of black, white, and Hispanic parents at random. One question asked was “Are the high schools in your state doing an excellent, good, fair or poor job, or don’t you know enough to say?” Here are the survey results:¹⁰

              Black     Hispanic   White
              parents   parents    parents
Excellent     12        34         22
Good          69        55         81
Fair          75        61         60
Poor          24        24         24
Don’t know    22        28         14
Total         202       202        201

Write a brief analysis of these results that focuses on the relationship between parent group and opinions about schools.

7.26 Weighing bean seeds. Biological measurements on the same species often follow a Normal distribution quite closely. The weights of seeds of a variety of winged bean are approximately Normal with mean 525 milligrams (mg) and standard deviation 110 mg.

(a) What percent of seeds weigh more than 500 mg?

(b) If we discard the lightest 10% of these seeds, what is the smallest weight among the remaining seeds?
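Software can find Normal areas and percentiles directly, without standardizing by hand and looking up Table A. The sketch below uses Python's standard-library `statistics.NormalDist`, which is available from Python 3.8 on. Its answers can differ in the last decimal place from a table lookup that uses a rounded z-score:

```python
from statistics import NormalDist

# Seed weights: approximately Normal with mean 525 mg, standard deviation 110 mg
seeds = NormalDist(mu=525, sigma=110)

# (a) Proportion heavier than 500 mg = area to the right of 500
p_heavier = 1 - seeds.cdf(500)

# (b) Discarding the lightest 10% keeps seeds at or above the 10th percentile
cutoff = seeds.inv_cdf(0.10)

print(f"Percent heavier than 500 mg: {100 * p_heavier:.1f}%")
print(f"Smallest remaining weight: {cutoff:.1f} mg")
```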

7.27 Breaking bolts. Mechanical measurements on supposedly identical objects usually vary. The variation often follows a Normal distribution. The stress required to break a type of bolt varies Normally with mean 75 kilopounds per square inch (ksi) and standard deviation 8.3 ksi.

(a) What percent of these bolts will withstand a stress of 90 ksi without breaking?

(b) What range covers the middle 50% of breaking strengths for these bolts?
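The same `statistics.NormalDist` approach applies here. This sketch reads "withstand 90 ksi" as "breaking stress above 90 ksi". It takes the middle 50% to run from the first quartile to the third quartile:

```python
from statistics import NormalDist

# Breaking stress: Normal with mean 75 ksi, standard deviation 8.3 ksi
stress = NormalDist(mu=75, sigma=8.3)

# (a) A bolt withstands 90 ksi if its breaking stress is above 90
p_withstand = 1 - stress.cdf(90)

# (b) The middle 50% lies between the first and third quartiles
q1, q3 = stress.inv_cdf(0.25), stress.inv_cdf(0.75)

print(f"Percent withstanding 90 ksi: {100 * p_withstand:.2f}%")
print(f"Middle 50%: {q1:.2f} to {q3:.2f} ksi")
```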

Soap in the shower. From Rex Boggs in Australia comes an unusual data set: before showering in the morning, he weighed the bar of soap in his shower stall. The weight goes down as the soap is used. The data appear in Table 7.3 (weights in grams). Notice that Mr. Boggs forgot to weigh the soap on some days. Exercises 7.28 to 7.30 are based on the soap data set.

TABLE 7.3 Weight (grams) of a bar of soap used to shower

Day   Weight     Day   Weight     Day   Weight
  1      124       8       84      16       27
  2      121       9       78      18       16
  5      103      10       71      19       12
  6       96      12       58      20        8
  7       90      13       50      21        6

7.28 Scatterplot. Plot the weight of the bar of soap against day. Is the overall pattern roughly linear? Based on your scatterplot, is the correlation between day and weight close to 1, positive but not close to 1, close to 0, negative but not close to −1, or close to −1? Explain your answer.

7.29 Regression. Find the equation of the least-squares regression line for predicting soap weight from day.

(a) What is the equation? Explain what it tells us about the rate at which the soap lost weight.

(b) Mr. Boggs did not measure the weight of the soap on day 4. Use the regression equation to predict that weight.

(c) Draw the regression line on your scatterplot from the previous exercise.

7.30 Prediction? Use the regression equation in the previous exercise to predict the weight of the soap after 30 days. Why is it clear that your answer makes no sense? What’s wrong with using the regression line to predict weight after 30 days?
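Here is a plain-Python sketch of the calculations for Exercises 7.28 to 7.30. It gets the slope, intercept, and correlation from sums of squares and cross-products. The helper `least_squares` is our own, not a library function. The data are keyed in from Table 7.3:

```python
from math import sqrt

# Table 7.3: days on which the soap was weighed, and its weight in grams
days = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 16, 18, 19, 20, 21]
weights = [124, 121, 103, 96, 90, 84, 78, 71, 58, 50, 27, 16, 12, 8, 6]

def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

b, a, r = least_squares(days, weights)
print(f"weight = {a:.2f} + ({b:.3f}) * day,  r = {r:.4f}")
print(f"Predicted weight on day 4:  {a + b * 4:.1f} g")
print(f"Predicted weight on day 30: {a + b * 30:.1f} g")  # extrapolation
```

The day-30 prediction comes out negative, and no bar of soap can weigh less than nothing. The data stop at day 21, so the line says nothing about what happens after the soap is nearly used up.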

7.31 Statistics for investing. Joe’s retirement plan invests in stocks through an “index fund” that follows the behavior of the stock market as a whole, as measured by the S&P 500 stock index. Joe wants to buy a mutual fund that does not track the index closely. He reads that monthly returns from Fidelity Technology Fund have correlation r = 0.77 with the S&P 500 index and that Fidelity Real Estate Fund has correlation r = 0.37 with the index.

(a) Which of these funds has the closer relationship to returns from the stock market as a whole? How do you know?

(b) Does the information given tell Joe anything about which fund has had higher returns?

7.32 Initial public offerings. The business magazine Forbes reports that 4567 companies sold their first stock to the public between 1990 and 2000. The mean change in the stock price of these companies since the first stock was issued was +111%. The median change was −31%.¹¹ Explain how this could happen. (Hint: Start with the fact that Cisco Systems stock went up 60,600%.)
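The mean and the median respond very differently to a few extreme values. The five numbers below are invented only to show that effect. They are hypothetical and are not the Forbes data:

```python
from statistics import mean, median

# Invented percent changes for five hypothetical IPOs (NOT the Forbes data):
# most lose value, while one gains enormously
changes = [-80, -50, -31, 10, 600]

print(mean(changes))    # 89.8: the single large gain drags the mean upward
print(median(changes))  # -31: the middle value ignores how large 600 is
```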

7.33 Moving in step? One reason to invest abroad is that markets in different countries don’t move in step. When American stocks go down, foreign stocks may go up. So an investor who holds both bears less risk. That’s the theory. Now we read: “The correlation between changes in American and European share prices has risen from 0.4 in the mid-1990s to 0.8 in 2000.”¹² Explain to an investor who knows no statistics why this fact reduces the protection provided by buying European stocks.

7.34 Interpreting correlation. The same article that claims that the correlation between changes in stock prices in Europe and the United States was 0.8 in 2000 goes on to say: “Crudely, that means that movements on Wall Street can explain 80% of price movements in Europe.” Is this true? What is the correct percent explained if r = 0.8?
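A straight-line fit explains a fraction of the variation equal to the square of the correlation, r², not r itself. For the correlation the article quotes:

```latex
r^2 = (0.8)^2 = 0.64
```

In the regression sense, that is about 64% of the variation, not 80%.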

7.35 Coaching for the SATs. A study finds that high school students who take the SAT, enroll in an SAT coaching course, and then take the SAT a second time raise their SAT mathematics scores from a mean of 521 to a mean of 561.¹³ What factors other than “taking the course causes higher scores” might explain this improvement?

SUPPLEMENTARY EXERCISES

Supplementary exercises apply the skills you have learned in ways that require more thought or more elaborate use of technology.

7.36 Change in the Serengeti. Long-term records from the Serengeti National Park in Tanzania show interesting ecological relationships. When wildebeest are more abundant, they graze the grass more heavily, so there are fewer fires and more trees grow. Lions feed more successfully when there are more trees, so the lion population increases. Here are data on one part of this cycle, wildebeest abundance (in thousands of animals) and the percent of the grass area that burned in the same year:¹⁴

Wildebeest  Percent    Wildebeest  Percent    Wildebeest  Percent
(1000s)     burned     (1000s)     burned     (1000s)     burned
   396         56         360         88         1147        32
   476         50         444         88         1173        31
   698         25         524         75         1178        24
  1049         16         622         60         1253        24
  1178          7         600         56         1249        53
  1200          5         902         45
  1302          7        1440         21

To what extent do these data support the claim that more wildebeest reduce the percent of grasslands that burn? How rapidly does burned area decrease as the number of wildebeest increases? Include a graph and suitable calculations. Follow the four-step process (page 53) in your answer.
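This Python sketch covers the "suitable calculations" for Exercise 7.36. It computes the least-squares line and the correlation from the usual sums of squares, using pairs keyed in from the table above. The last lines find the herd size at which the fitted line reaches 0% burned, which bears on Exercise 7.38:

```python
from math import sqrt

# Exercise 7.36 data as (wildebeest in thousands, percent of grass area burned)
pairs = [(396, 56), (476, 50), (698, 25), (1049, 16), (1178, 7), (1200, 5),
         (1302, 7), (360, 88), (444, 88), (524, 75), (622, 60), (600, 56),
         (902, 45), (1440, 21), (1147, 32), (1173, 31), (1178, 24),
         (1253, 24), (1249, 53)]
x = [w for w, _ in pairs]
y = [p for _, p in pairs]

n = len(pairs)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b = sxy / sxx                  # change in percent burned per 1000 wildebeest
a = my - b * mx                # intercept of the least-squares line
r = sxy / sqrt(sxx * syy)      # correlation

print(f"percent burned = {a:.2f} + ({b:.5f}) * wildebeest (1000s)")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")

# Where the fitted line reaches 0% burned; beyond this herd size it
# predicts impossible negative percents
zero_at = -a / b
print(f"Line crosses 0% at about {zero_at:.0f} thousand wildebeest")
```

The observed herds range from about 360 to 1440 thousand animals, and the zero crossing lies beyond the largest of them. Any prediction at or past that point is therefore an extrapolation.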

7.37 Prey attract predators. Here is one way in which nature regulates the size of animal populations: high population density attracts predators, who remove a higher proportion of the population than when the density of the prey is low. One study looked at kelp perch and their common predator, the kelp bass. The researcher set up four large circular pens on sandy ocean bottom in southern California. He chose young perch at random from a large group and placed 10, 20, 40, and 60 perch in the four pens. Then he dropped the nets protecting the pens, allowing bass to swarm in, and counted the perch left after 2 hours. Here are data on the proportions of perch eaten in four repetitions of this setup:¹⁵

Perch   Proportion killed
 10     0.0     0.1    0.3    0.3
 20     0.2     0.3    0.3    0.6
 40     0.075   0.3    0.6    0.725
 60     0.517   0.55   0.7    0.817

Do the data support the principle that “more prey attract more predators, who drive down the number of prey”? Follow the four-step process (page 53) in your answer.
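One simple numerical summary is the mean proportion killed at each prey density. Here is a short Python sketch of that summary, with the data keyed in from the table above:

```python
from statistics import mean

# Proportion of perch killed in each of four repetitions, by perch per pen
killed = {
    10: [0.0, 0.1, 0.3, 0.3],
    20: [0.2, 0.3, 0.3, 0.6],
    40: [0.075, 0.3, 0.6, 0.725],
    60: [0.517, 0.55, 0.7, 0.817],
}

# Mean proportion killed at each prey density
means = {count: mean(props) for count, props in killed.items()}
for count, m in means.items():
    print(f"{count:>2} perch per pen: mean proportion killed {m:.3f}")
```

The means summarize the trend only. A scatterplot of all 16 points, with the means marked, also shows how much the four repetitions vary at each density.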

7.38 Extrapolation. Your work in Exercise 7.36 no doubt included a regression line. Use the equation of this line to illustrate the danger of extrapolation, taking advantage of the fact that the percent of grasslands burned cannot be less than zero.

Falling through the ice. The Nenana Ice Classic is an annual contest to guess the exact time in the spring thaw when a tripod erected on the frozen Tanana River near Nenana, Alaska, will fall through the ice. The 2005 jackpot prize was $285,000. The contest has been run since 1917. Table 7.4 gives simplified data that record only the date on which the tripod fell each year. The earliest date so far is April 20. To make the data easier to use, the table gives the date each year in days starting with April 20. That is, April 20 is 1, April 21 is 2, and so on. You will need software or a graphing calculator to analyze these data in Exercises 7.39 to 7.41.¹⁶

7.39 When does the ice break up? We have 89 years of data on the date of ice breakup on the Tanana River. Describe the distribution of the breakup date with both a graph or graphs and appropriate numerical summaries. What is the median date (month and day) for ice breakup?

7.40 Global warming? Because of the high stakes, the falling of the tripod has been carefully observed for many years. If the date the tripod falls has been getting earlier, that may be evidence for the effects of global warming.

(a) Make a time plot of the date the tripod falls against year.

TABLE 7.4 Days from April 20 for the Tanana River tripod to fall

Year  Day   Year  Day   Year  Day   Year  Day   Year  Day   Year  Day
1917   11   1932   12   1947   14   1962   23   1977   17   1992   25
1918   22   1933   19   1948   24   1963   16   1978   11   1993    4
1919   14   1934   11   1949   25   1964   31   1979   11   1994   10
1920   22   1935   26   1950   17   1965   18   1980   10   1995    7
1921   22   1936   11   1951   11   1966   19   1981   11   1996   16
1922   23   1937   23   1952   23   1967   15   1982   21   1997   11
1923   20   1938   17   1953   10   1968   19   1983   10   1998    1
1924   22   1939   10   1954   17   1969    9   1984   20   1999   10
1925   16   1940    1   1955   20   1970   15   1985   23   2000   12
1926    7   1941   14   1956   12   1971   19   1986   19   2001   19
1927   23   1942   11   1957   16   1972   21   1987   16   2002   18
1928   17   1943    9   1958   10   1973   15   1988    8   2003   10
1929   16   1944   15   1959   19   1974   17   1989   12   2004    5
1930   19   1945   27   1960   13   1975   21   1990    5   2005    9
1931   21   1946   16   1961   16   1976   13   1991   12

(b) There is a great deal of year-to-year variation. Fitting a regression line to the data may help us see the trend. Fit the least-squares line and add it to your time plot. What do you conclude?

(c) There is much variation about the line. Give a numerical description of how much of the year-to-year variation in ice breakup time is accounted for by the time trend represented by the regression line.

7.41 More on global warming. Side-by-side boxplots offer a different look at the data. Group the data into periods of roughly equal length: 1917 to 1939, 1940 to 1959, 1960 to 1979, and 1980 to 2005. Make boxplots to compare ice breakup dates in these four time periods. Write a brief description of what the plots show.
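Exercises 7.39 to 7.41 call for software. The sketch below is one possible approach in Python, using only the standard library. It keys in Table 7.4 in year order and finds the median breakup day and its calendar date. It then fits the least-squares line of day on year, reports r², and gives the median for each period in Exercise 7.41. It does not draw the graphs, which would need a plotting library:

```python
from datetime import date, timedelta
from statistics import median

# Table 7.4: breakup day (April 20 = day 1) for each year 1917-2005, in order
days = [11, 22, 14, 22, 22, 23, 20, 22, 16, 7, 23, 17, 16, 19, 21,   # 1917-31
        12, 19, 11, 26, 11, 23, 17, 10, 1, 14, 11, 9, 15, 27, 16,    # 1932-46
        14, 24, 25, 17, 11, 23, 10, 17, 20, 12, 16, 10, 19, 13, 16,  # 1947-61
        23, 16, 31, 18, 19, 15, 19, 9, 15, 19, 21, 15, 17, 21, 13,   # 1962-76
        17, 11, 11, 10, 11, 21, 10, 20, 23, 19, 16, 8, 12, 5, 12,    # 1977-91
        25, 4, 10, 7, 16, 11, 1, 10, 12, 19, 18, 10, 5, 9]           # 1992-2005
years = list(range(1917, 2006))
assert len(days) == len(years) == 89

# 7.39: median breakup day, turned back into a calendar date
# (the year 2005 is arbitrary; only the month and day are used)
med = median(days)
med_date = date(2005, 4, 20) + timedelta(days=med - 1)
print(f"Median day {med} = {med_date:%B} {med_date.day}")

# 7.40: least-squares line of breakup day on year, and r^2
n = len(years)
mx, my = sum(years) / n, sum(days) / n
sxx = sum((x - mx) ** 2 for x in years)
syy = sum((y - my) ** 2 for y in days)
sxy = sum((x - mx) * (y - my) for x, y in zip(years, days))
slope = sxy / sxx
intercept = my - slope * mx
r_squared = sxy ** 2 / (sxx * syy)
print(f"day = {intercept:.2f} + ({slope:.4f}) * year, r^2 = {r_squared:.3f}")

# 7.41: median breakup day within each period (boxplots would show more)
for lo, hi in [(1917, 1939), (1940, 1959), (1960, 1979), (1980, 2005)]:
    period = [d for yr, d in zip(years, days) if lo <= yr <= hi]
    print(f"{lo}-{hi}: n = {len(period)}, median day = {median(period)}")
```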

7.42 Save the eagles. The pesticide DDT was especially threatening to bald eagles. Here are data on the productivity of the eagle population in northwestern Ontario, Canada.¹⁷ The eagles nest in an area free of DDT but migrate south and eat prey contaminated with the pesticide. DDT was banned at the end of 1972. The researcher observed every nesting area he could reach every year between 1966 and 1981. He measured productivity by the count of young eagles per nesting area.

Year  Count    Year  Count    Year  Count    Year  Count
1966  1.26     1970  0.54     1974  0.46     1978  0.82
1967  0.73     1971  0.60     1975  0.77     1979  0.98
1968  0.89     1972  0.54     1976  0.86     1980  0.93
1969  0.84     1973  0.78     1977  0.96     1981  1.12

(a) Make a time plot of the data. Does the plot support the claim that banning DDT helped save the eagles?

(b) It appears that the overall pattern might be described by two straight lines. Find the least-squares line for 1966 to 1972 (pre-ban) and also the least-squares line for 1975 to 1981 (allowing a few years for DDT to leave the environment after the ban). Draw these lines on your plot. Would you use the second line to predict young per nesting area in the several years after 1981?
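Part (b) fits a separate least-squares line to each period. Here is a Python sketch of that step. The helpers `least_squares` and `fit_years` are our own names:

```python
# Productivity (young per nesting area) by year, from the table above
counts = {1966: 1.26, 1967: 0.73, 1968: 0.89, 1969: 0.84, 1970: 0.54,
          1971: 0.60, 1972: 0.54, 1973: 0.78, 1974: 0.46, 1975: 0.77,
          1976: 0.86, 1977: 0.96, 1978: 0.82, 1979: 0.98, 1980: 0.93,
          1981: 1.12}

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_years(first, last):
    """Fit a line to the years first..last inclusive."""
    years = [yr for yr in counts if first <= yr <= last]
    return least_squares(years, [counts[yr] for yr in years])

pre_slope, pre_intercept = fit_years(1966, 1972)    # before the ban
post_slope, post_intercept = fit_years(1975, 1981)  # after DDT cleared
print(f"1966-1972 slope: {pre_slope:+.4f} young per area per year")
print(f"1975-1981 slope: {post_slope:+.4f} young per area per year")
```

Because x is the calendar year, each intercept is a value at year 0 and means nothing by itself. Only the fitted values inside each period are meaningful.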

7.43 Thin monkeys, fat monkeys. Animals and people that take in more energy than they expend will get fatter. Here are data on 12 rhesus monkeys: 6 lean monkeys (4% to 9% body fat) and 6 obese monkeys (13% to 44% body fat). The data report the energy expended in 24 hours (kilojoules per minute) and the lean body mass (kilograms, leaving out fat) for each monkey.¹⁸

      Lean               Obese
Mass    Energy      Mass    Energy
 6.6     1.17        7.9     0.93
 7.8     1.02        9.4     1.39
 8.9     1.46       10.7     1.19
 9.8     1.68       12.2     1.49
 9.7     1.06       12.1     1.29
 9.3     1.16       10.8     1.31

(a) What is the mean lean body mass of the lean monkeys? Of the obese monkeys? Because animals with higher lean mass usually expend more energy, we can’t directly compare energy expended.

(b) Instead, look at how energy expended is related to body mass. Make a scatterplot of energy versus mass, using different plot symbols for lean and obese monkeys. Then add to the plot two regression lines, one for lean monkeys and one for obese monkeys. What do these lines suggest about the monkeys?
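Here is a Python sketch of the calculations for Exercise 7.43. One way to read the two fitted lines is to compare their predictions at the same lean mass. The comparison point of 9 kg is our own choice, picked because it lies inside both groups' observed masses:

```python
from statistics import mean

# Lean body mass (kg) and energy expended (kJ/min), from the table above
lean_mass = [6.6, 7.8, 8.9, 9.8, 9.7, 9.3]
lean_energy = [1.17, 1.02, 1.46, 1.68, 1.06, 1.16]
obese_mass = [7.9, 9.4, 10.7, 12.2, 12.1, 10.8]
obese_energy = [0.93, 1.39, 1.19, 1.49, 1.29, 1.31]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def predict(line, mass):
    """Predicted energy from a (slope, intercept) line at a given mass."""
    slope, intercept = line
    return intercept + slope * mass

# (a) Mean lean body mass in each group
print(f"Mean lean mass: lean {mean(lean_mass):.2f} kg, "
      f"obese {mean(obese_mass):.2f} kg")

# (b) One line per group, compared at a mass inside both groups' ranges
lean_line = least_squares(lean_mass, lean_energy)
obese_line = least_squares(obese_mass, obese_energy)
for label, line in [("lean", lean_line), ("obese", obese_line)]:
    print(f"{label}: energy = {line[1]:.3f} + {line[0]:.4f} * mass; "
          f"at 9 kg predicts {predict(line, 9):.3f}")
```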

7.44 Casting aluminum. In casting metal parts, molten metal flows through a “gate” into a die that shapes the part. The gate velocity (the speed at which metal is forced through the gate) plays a critical role in die casting. A firm that casts cylindrical aluminum pistons examined 12 types formed from the same alloy. How does the cylinder wall thickness (inches) influence the gate velocity (feet per second) chosen by the skilled workers who do the casting? If there is a clear pattern, it can be used to direct new workers or to automate the process. Analyze these data and report your findings, following the four-step process.¹⁹

Thickness  Velocity    Thickness  Velocity    Thickness  Velocity
0.248      123.8       0.524      228.6       0.697      145.2
0.359      223.9       0.552      223.8       0.752      263.1
0.366      180.9       0.628      326.2       0.806      302.4
0.400      104.8       0.697      302.4       0.821      302.4

7.45 Weeds among the corn. Lamb’s-quarter is a common weed that interferes with the growth of corn. An agriculture researcher planted corn at the same rate in 16 small plots of ground, then weeded the plots by hand to allow a fixed number of lamb’s-quarter plants to grow in each meter of corn row. No other weeds were allowed to grow. Following are the yields of corn (bushels per acre) in each of the plots:²⁰

Weeds      Corn     Weeds      Corn     Weeds      Corn     Weeds      Corn
per meter  yield    per meter  yield    per meter  yield    per meter  yield
0          166.7    1          166.2    3          158.6    9          162.8
0          172.2    1          157.3    3          176.4    9          142.4
0          165.0    1          166.7    3          153.1    9          162.8
0          176.9    1          161.1    3          156.0    9          162.4

(a) What are the explanatory and response variables in this experiment?

(b) Make side-by-side stemplots of the yields, after rounding to the nearest bushel. Give the median yield for each group (using the unrounded data). What do you conclude about the effect of this weed on corn yield?

7.46 Weeds among the corn, continued. We can also use regression to analyze the data on weeds and corn yield. The advantage of regression over the side-by-side comparison in the previous exercise is that we can use the fitted line to draw conclusions for counts of weeds other than the ones the researcher actually used.

(a) Make a scatterplot of corn yield against weeds per meter. Find the least-squares regression line and add it to your plot. What does the slope of the fitted line tell us about the effect of lamb’s-quarter on corn yield?

(b) Predict the yield for corn grown under these conditions with 6 lamb’s-quarter plants per meter of row.
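This Python sketch covers the group medians in Exercise 7.45(b) and the regression in Exercise 7.46. The data are keyed in from the table in Exercise 7.45:

```python
from statistics import median

# Corn yield (bushels per acre) for each number of lamb's-quarter per meter
yields = {
    0: [166.7, 172.2, 165.0, 176.9],
    1: [166.2, 157.3, 166.7, 161.1],
    3: [158.6, 176.4, 153.1, 156.0],
    9: [162.8, 142.4, 162.8, 162.4],
}

# 7.45(b): median yield in each group, from the unrounded data
medians = {weeds: median(ys) for weeds, ys in yields.items()}
for weeds, m in medians.items():
    print(f"{weeds} weeds/meter: median yield {m:.2f}")

# 7.46: least-squares line using all 16 plots (x = weeds, y = yield)
x = [weeds for weeds, ys in yields.items() for _ in ys]
y = [v for ys in yields.values() for v in ys]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b = sxy / sxx            # change in yield per additional weed per meter
a = my - b * mx          # predicted yield with no weeds
predicted_6 = a + b * 6  # 6 weeds per meter lies inside the 0-9 range used
print(f"yield = {a:.3f} + ({b:.4f}) * weeds; at 6 weeds: {predicted_6:.2f}")
```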

EESEE CASE STUDIES

The Electronic Encyclopedia of Statistical Examples and Exercises (EESEE) is available on the text CD and Web site. These more elaborate stories, with data, provide settings for longer case studies. Here are some suggestions for EESEE stories that apply the ideas you have learned in Chapters 1 to 6.

7.47 Is Old Faithful Faithful? Write a response to Questions 1 and 3 for this case study. (Describing a distribution, scatterplots, and regression.)

7.48 Checkmating and Reading Skills. Write a report based on Question 1 in this case study. (Describing a distribution.)

7.49 Counting Calories. Respond to Questions 1, 4, and 6 for this case study. (Describing and comparing distributions.)

7.50 Mercury in Florida’s Bass. Respond to Question 5. (Scatterplots, form of relationships. By the way, “homoscedastic” means that the scatter of points about
the overall pattern is roughly the same from one side of the scatterplot to the other.)

7.51 Brain Size and Intelligence. Write a response to Question 3. (Scatterplots, correlation, and lurking variables.)

7.52 Acorn Size and Oak Tree Range. Write a report based on Questions 1 and 2. (Scatterplots, correlation, and regression.)

7.53 Surviving the Titanic. Answer Questions 1, 2, and 3. (Two-way tables.)

