215 Chapter 2

Date post: 07-Oct-2014
Upload: yumikoyan

Chapter Outline

1.1 Populations and Samples

1.2 Sampling a Population of Existing Units

1.3 Sampling a Process

1.4 Ratio, Interval, Ordinal, and Nominative Scales of Measurement (Optional)

1.5 An Introduction to Survey Sampling (Optional)

CHAPTER 1: An Introduction to Business Statistics

bow77477_ch01.qxd 07/21/2005 11:06 PM Page 2


The subject of statistics involves the study of how to collect, summarize, and interpret data. Data are numerical facts and figures from which conclusions can be drawn. Such conclusions are important to the decision-making processes of many professions and organizations. For example, government officials use conclusions drawn from the latest data on unemployment and inflation to make policy decisions. Financial planners use recent trends in stock market prices to make investment decisions. Businesses decide which products to develop and market by using data that reveal consumer preferences. Production supervisors use manufacturing data to evaluate, control, and improve product quality. Politicians rely on data from public opinion polls to formulate legislation and to devise campaign strategies. Physicians and hospitals use data on the effectiveness of drugs and surgical procedures to provide patients with the best possible treatment.

In this chapter we begin to see how we collect and analyze data. As we proceed through the chapter, we introduce several case studies. These case studies (and others to be introduced later) are revisited throughout later chapters as we learn the statistical methods needed to analyze the cases. Briefly, we will begin to study four cases:

The Cell Phone Case. A bank estimates its cellular phone costs and decides whether to outsource management of its wireless resources by studying the calling patterns of its employees.

The Marketing Research Case. A bottling company investigates consumer reaction to a new bottle design for one of its popular soft drinks.

The Coffee Temperature Case. A fast-food restaurant studies and monitors the temperature of the coffee it serves.

The Car Mileage Case. To determine if it qualifies for a federal tax credit based on fuel economy, an automaker studies the gas mileage of its new midsize model.

1.1 Populations and Samples

Statistical methods are very useful for learning about populations, which can be defined in various ways. We begin with the following definition:

A population is a set of existing units (usually people, objects, or events).

Examples of populations include (1) all of last year’s graduates of Dartmouth College’s Master of Business Administration program, (2) all consumers who bought a cellular phone last year, (3) all accounts receivable invoices accumulated last year by The Procter & Gamble Company, (4) all Lincoln Town Cars that were produced last year, and (5) all fires reported last month to the Tulsa, Oklahoma, fire department.

We usually focus on studying one or more characteristics of the population units.

Any characteristic of a population unit is called a variable.

For instance, if we study the starting salaries of last year’s graduates of the Dartmouth College MBA program, the variable of interest is starting salary. If we study the gasoline mileages obtained in city driving by last year’s Lincoln Town Car, the variable of interest is gasoline mileage in city driving.

We carry out a measurement to assign a value of a variable to each population unit. For example, we might measure the starting salary of an MBA graduate to the nearest dollar. Or we might measure the gasoline mileage obtained by a car in city driving to the nearest one-tenth of a mile per gallon by conducting a mileage test on a driving course prescribed by the Environmental Protection Agency (EPA). If the possible measurements are numbers that represent quantities (that is, “how much” or “how many”), then the variable is said to be quantitative. For example, starting salary and gasoline mileage are both quantitative. However, if we simply record into which of several categories a population unit falls, then the variable is said to be qualitative or categorical. Examples of categorical variables include (1) a person’s gender, (2) the make of an automobile, and (3) whether a person who purchases a product is satisfied with the product. [1]

[1] Optional Section 1.4 discusses two types of quantitative variables (ratio and interval) and two types of qualitative variables (ordinal and nominative).


If we measure each and every population unit, we have a population of measurements (sometimes called observations). If the population is small, it is reasonable to do this. For instance, if 150 students graduated last year from the Dartmouth College MBA program, it might be feasible to survey the graduates and to record all of their starting salaries. In general:

If we examine all of the population measurements, we say that we are conducting a census of the population.

Often the population that we wish to study is very large, and it is too time-consuming or costly to conduct a census. In such a situation, we select and analyze a subset (or portion) of the population units.

A sample is a subset of the units in a population.

For example, suppose that 8,742 students graduated last year from a large state university. It would probably be too time-consuming to take a census of the population of all of their starting salaries. Therefore, we would select a sample of graduates, and we would obtain and record their starting salaries. When we measure the units in a sample, we say that we have a sample of measurements.

We often wish to describe a population or sample.

Descriptive statistics is the science of describing the important aspects of a set of measurements.

As an example, if we are studying a set of starting salaries, we might wish to describe (1) how large or small they tend to be, (2) what a typical salary might be, and (3) how much the salaries differ from each other.

When the population of interest is small and we can conduct a census of the population, we will be able to directly describe the important aspects of the population measurements. However, if the population is large and we need to select a sample from it, then we use what we call statistical inference.

Statistical inference is the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements.

For instance, we might use a sample of starting salaries to estimate the important aspects of a population of starting salaries. In the next section, we begin to look at how statistical inference is carried out.

1.2 Sampling a Population of Existing Units

Random samples

If the information contained in a sample is to accurately reflect the population under study, the sample should be randomly selected from the population. To intuitively illustrate random sampling, suppose that a small company employs 15 people and wishes to randomly select two of them to attend a convention. To make the random selections, we number the employees from 1 to 15, and we place in a hat 15 identical slips of paper numbered from 1 to 15. We thoroughly mix the slips of paper in the hat and, blindfolded, choose one. The number on the chosen slip of paper identifies the first randomly selected employee. Then, still blindfolded, we choose another slip of paper from the hat. The number on the second slip identifies the second randomly selected employee.
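The slips-of-paper procedure can be mimicked in a few lines of code. What follows is a minimal Python sketch and is not part of the original text; the function name `draw_from_hat` and the seed value are assumptions made purely for illustration.

```python
import random

def draw_from_hat(num_employees, num_to_select, seed=None):
    """Mimic drawing numbered slips from a hat without putting any back."""
    rng = random.Random(seed)                   # a seed only makes the run repeatable
    slips = list(range(1, num_employees + 1))   # slips numbered 1..num_employees
    rng.shuffle(slips)                          # thoroughly mix the slips
    # Draw one slip at a time; pop() removes the slip from the hat.
    return [slips.pop() for _ in range(num_to_select)]

chosen = draw_from_hat(num_employees=15, num_to_select=2, seed=1)
print(chosen)  # two different employee numbers between 1 and 15
```

Because each drawn slip is removed from the list, the two selected employees are always different people.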

Of course, it is impractical to carry out such a procedure when the population is very large. It is easier to use a random number table. To show how to use such a table, we must more formally define a random sample. [2]

A random sample is selected so that, on each selection from the population, every unit remaining in the population on that selection has the same chance of being chosen.

[2] Actually, there are several different kinds of random samples. The type we will define is sometimes called a simple random sample. For brevity’s sake, however, we will use the term random sample.

To understand this definition, first note that we can randomly select a sample with or without replacement. If we sample with replacement, we place the unit chosen on any particular selection back into the population. Thus we give this unit a chance to be chosen on any succeeding selection. In such a case, all of the units in the population remain as candidates to be chosen for each and every selection. Randomly choosing two employees with replacement to attend a convention would make no sense because we wish to send two different employees to the convention. If we sample without replacement, we do not place the unit chosen on a particular selection back into the population. Thus we do not give this unit a chance to be selected on any succeeding selection. In this case, the units remaining as candidates for a particular selection are all of the units in the population except for those that have previously been selected. It is best to sample without replacement. Intuitively, because we will use the sample to learn about the population, sampling without replacement will give us the fullest possible look at the population. This is true because choosing the sample without replacement guarantees that all of the units in the sample will be different (and that we are looking at as many different units from the population as possible).
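The difference between the two schemes can be seen by simulation. The sketch below is an illustration added here, not the book's method; it uses Python's standard `random` module, and the helper name `count_repeats`, the seed, and the number of trials are all assumptions.

```python
import random

def count_repeats(draw, trials=10_000):
    """Count how many two-person draws pick the same employee twice."""
    return sum(1 for _ in range(trials) if len(set(draw())) < 2)

rng = random.Random(0)           # arbitrary seed for repeatability
employees = list(range(1, 16))   # employees numbered 1..15

# With replacement: the chosen unit goes back, so it can be chosen again.
repeats_with = count_repeats(lambda: rng.choices(employees, k=2))
# Without replacement: a chosen unit is no longer a candidate.
repeats_without = count_repeats(lambda: rng.sample(employees, k=2))

print(repeats_with, repeats_without)
```

Sampling with replacement occasionally selects the same employee twice (on average about one draw in 15 here), while sampling without replacement never does.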

In the following example, we illustrate how to use a random number table, or computer-generated random numbers, to select a random sample.


Example 1.1 The Cell Phone Case: Estimating Cell Phone Costs [3]

Businesses and college students have at least two things in common: both find cellular phones to be nearly indispensable because of their convenience and mobility, and both often rack up unpleasantly high cell phone bills. Students’ high bills are usually the result of overage, in which a student uses more minutes than his or her plan allows. Businesses also lose money due to overage, and, in addition, lose money due to underage when some employees do not use all of the (already paid-for) minutes allowed by their plans. Because cellular carriers offer more than 10,000 rate plans, it is nearly impossible for a business to intelligently choose calling plans that will meet its needs at a reasonable cost.

Rising cell phone costs have forced companies having large numbers of cellular users to hire services to manage their cellular and other wireless resources. These cellular management services use sophisticated software and mathematical models to choose cost-efficient cell phone plans for their clients. One such firm, MobileSense Inc. of Westlake Village, California, specializes in automated wireless cost management. According to Doug L. Stevens, Vice President of Sales and Marketing at MobileSense, cell phone carriers count on overage and underage to deliver almost half of their revenues. As a result, a company’s typical cost of cell phone use can easily exceed 25 cents per minute. However, Mr. Stevens explains that by using MobileSense automated cost management to select calling plans, this cost can be reduced to 12 cents per minute or less.

In this case we will demonstrate how a bank can use a random sample of cell phone users to study its cellular phone costs. Based on this cost information, the bank will decide whether to hire a cellular management service to choose calling plans for the bank’s employees. While the bank has over 10,000 employees on a variety of calling plans, the cellular management service suggests that by studying the calling patterns of cellular users on 500-minute plans, the bank can accurately assess whether its cell phone costs can be substantially reduced.

The bank has 2,136 employees on a 500-minute-per-month plan with a monthly cost of $50. The overage charge is 40 cents per minute, and there are additional charges for long distance and roaming. The bank will estimate its cellular cost per minute for this plan by examining the number of minutes used last month by each of 100 randomly selected employees on this 500-minute plan. According to the cellular management service, if the cellular cost per minute for the random sample of 100 employees is over 18 cents per minute, the bank should benefit from automated cellular management of its calling plans.

[3] The authors would like to thank Mr. Doug L. Stevens, Vice President of Sales and Marketing at MobileSense Inc., Westlake Village, California, for his help in developing this case.

In order to randomly select the sample of 100 cell phone users, the bank will make a numbered list of the 2,136 users on the 500-minute plan. This list is called a frame. The bank can then use a random number table, such as Table 1.1(a), to select the needed sample. To see how this is done, notice that any single-digit number in the table is assumed to have been randomly selected from the digits 0 to 9. Any two-digit number in the table is assumed to have been randomly selected from the numbers 00 to 99. Any three-digit number is assumed to have been randomly selected from the numbers 000 to 999, and so forth. Note that the table entries are segmented into groups of five to make the table easier to read. Because the total number of cell phone users on the 500-minute plan (2,136) is a four-digit number, we arbitrarily select any set of four digits in the table (here, the last four digits of the entry 90511 in the second column of Table 1.1(a); these digits are circled in the printed table). This number, which is 0511, identifies the first randomly selected user. Then, moving in any direction from the 0511 (up, down, right, or left; it does not matter which), we select additional sets of four digits. These succeeding sets of digits identify additional randomly selected users. Here we arbitrarily move down from 0511 in the table. The first seven sets of four digits we obtain are

0511 7156 0285 4461 3990 4919 1915

(See Table 1.1(a); in the printed table these numbers are enclosed in a rectangle.) Since there are no users numbered 7156, 4461, 3990, or 4919 (remember only 2,136 users are on the 500-minute plan), we ignore these numbers. This implies that the first three randomly selected users are those numbered 0511, 0285, and 1915. Continuing this procedure, we can obtain the entire random sample of 100 users. Notice that, because we are sampling without replacement, we should ignore any set of four digits previously selected from the random number table.
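The table-reading rule just described (skip numbers that do not correspond to a user, and skip repeats) can be written out as a short routine. This is a minimal Python sketch under stated assumptions: the function name `select_units` and its parameters are hypothetical, and only the seven four-digit sets quoted above are used as input.

```python
def select_units(table_entries, frame_size, sample_size):
    """Keep table entries that number a unit in the frame, skipping repeats."""
    selected = []
    for entry in table_entries:
        number = int(entry)                     # "0511" -> 511
        if number < 1 or number > frame_size:   # no unit has this number
            continue
        if number in selected:                  # already chosen (no replacement)
            continue
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# The first seven four-digit sets read down the table in Example 1.1.
entries = ["0511", "7156", "0285", "4461", "3990", "4919", "1915"]
first_three = select_units(entries, frame_size=2136, sample_size=3)
print(first_three)  # [511, 285, 1915]
```

The result matches the users numbered 0511, 0285, and 1915 identified in the text.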

While using a random number table is one way to select a random sample, this approach has a disadvantage that is illustrated by the current situation. Specifically, since most four-digit random numbers are not between 0001 and 2136, obtaining 100 different, four-digit random numbers between 0001 and 2136 will require ignoring a large number of random numbers in the random number table, and we will in fact need to use a random number table that is larger than Table 1.1(a). Although larger random number tables are readily available in books of mathematical and statistical tables, a good alternative is to use a computer software package, which can generate random numbers that are between whatever values we specify. For example, Table 1.1(b) gives the MINITAB output of 100 different, four-digit random numbers that are between 0001 and 2136 (note that the “leading 0’s” are not included in these four-digit numbers). If used, the random numbers in Table 1.1(b) identify the 100 employees that should form the random sample.
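As a rough illustration of the software route, the sketch below uses Python's standard `random` module in place of MINITAB (an assumption; the text itself uses MINITAB). The seed and the constant names are chosen here only for illustration, and the numbers produced will not match Table 1.1(b).

```python
import random

FRAME_SIZE = 2136    # users on the 500-minute plan
SAMPLE_SIZE = 100

# Share of the four-digit numbers 0000-9999 that number a user (0001-2136).
usable_share = FRAME_SIZE / 10_000
print(f"{usable_share:.2%} of four-digit table entries are usable")

rng = random.Random(2005)   # arbitrary seed so the run can be repeated
# sample() draws without replacement, so all 100 user numbers are different.
sample_ids = rng.sample(range(1, FRAME_SIZE + 1), k=SAMPLE_SIZE)
print(sorted(sample_ids)[:5])
```

The first printed figure shows why the table approach is wasteful here: only about one four-digit entry in five can be used.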

After the random sample of 100 employees is selected, the number of cellular minutes used by each employee during the month (the employee’s cellular usage) is found and recorded. The 100 cellular-usage figures are given in Table 1.2. Looking at this table, we can see that there is substantial overage and underage: many employees used far more than 500 minutes, while many others failed to use all of the 500 minutes allowed by their plan. In Chapter 2 we will use these 100 usage figures to estimate the cellular cost per minute for the 500-minute plan.


TABLE 1.1 Random Numbers

(a) A portion of a random number table

33276 85590 79936 56865 05859 90106 78188
03427 90511 69445 18663 72695 52180 90322
92737 27156 33488 36320 17617 30015 74952
85689 20285 52267 67689 93394 01511 89868
08178 74461 13916 47564 81056 97735 90707

51259 63990 16308 60756 92144 49442 40719
60268 44919 19885 55322 44819 01188 55157
94904 01915 04146 18594 29852 71585 64951
58586 17752 14513 83149 98736 23495 35749
09998 19509 06691 76988 13602 51851 58104

14346 61666 30168 90229 04734 59193 32812
74103 15227 25306 76468 26384 58151 44592
24200 64161 38005 94342 28728 35806 22851
87308 07684 00256 45834 15398 46557 18510
07351 86679 92420 60952 61280 50001 94953

(b) MINITAB output of 100 different, four-digit random numbers between 1 and 2136

 705 1131  169 1703 1709  609
1990  766 1286 1977  222   43
1007 1902 1209 2091 1742 1152
 111   69 2049 1448  659  338
1732 1650    7  388  613 1477
 838  272 1227  154   18  320
1053 1466 2087  265 2107 1992
 582 1787 2098 1581  397 1099
 757 1699  567 1255 1959  407
 354 1567 1533 1097 1299  277
 663   40  585 1486 1021  532
1629  182  372 1144 1569 1981
1332 1500  743 1262 1759  955
1832  378  728 1102  667 1885
 514 1128 1046  116 1160 1333
 831 2036  918 1535  660
 928 1257 1468  503  468


Approximately random samples

In general, to take a random sample we must have a list, or frame, of all the population units. This is needed because we must be able to number the population units in order to make random selections from them (by, for example, using a random number table). In Example 1.1, where we wished to study a population of 2,136 cell phone users who were on the bank’s 500-minute cellular plan, we were able to produce a frame (list) of the population units. Therefore, we were able to select a random sample. Sometimes, however, it is not possible to list and thus number all the units in a population. In such a situation we often select a systematic sample, which approximates a random sample.


TABLE 1.2 A Sample of Cellular Usages (in minutes) for 100 Randomly Selected Employees (data set: Cell Use)

 75 485  37 547 753  93 897 694 797 477
654 578 504 670 490 225 509 247 597 173
496 553   0 198 507 157 672 296 774 479
  0 822 705 814  20 513 546 801 721 273
879 433 420 521 648  41 528 359 367 948
511 704 535 585 341 530 216 512 491   0
542 562  49 505 461 496 241 624 885 259
571 338 503 529 737 444 372 555 290 830
719 120 468 730 853  18 479 144  24 513
482 683 212 418 399 376 323 173 669 611
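To make the remark about overage and underage in Example 1.1 concrete, the usages in Table 1.2 can be tallied against the 500-minute allowance. The following Python sketch is an added illustration only; the variable names are assumptions, and the cost-per-minute estimate itself is left to Chapter 2 as the text states.

```python
PLAN_MINUTES = 500   # monthly allowance on the plan studied in Example 1.1

# The 100 cellular usages from Table 1.2, row by row.
usages = [
    75, 485, 37, 547, 753, 93, 897, 694, 797, 477,
    654, 578, 504, 670, 490, 225, 509, 247, 597, 173,
    496, 553, 0, 198, 507, 157, 672, 296, 774, 479,
    0, 822, 705, 814, 20, 513, 546, 801, 721, 273,
    879, 433, 420, 521, 648, 41, 528, 359, 367, 948,
    511, 704, 535, 585, 341, 530, 216, 512, 491, 0,
    542, 562, 49, 505, 461, 496, 241, 624, 885, 259,
    571, 338, 503, 529, 737, 444, 372, 555, 290, 830,
    719, 120, 468, 730, 853, 18, 479, 144, 24, 513,
    482, 683, 212, 418, 399, 376, 323, 173, 669, 611,
]

over = sum(1 for minutes in usages if minutes > PLAN_MINUTES)    # overage
under = sum(1 for minutes in usages if minutes < PLAN_MINUTES)   # underage
exact = sum(1 for minutes in usages if minutes == PLAN_MINUTES)
print(len(usages), over, under, exact)
```

The printed counts show how the sampled employees split between those who exceeded the allowance and those who left paid-for minutes unused.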

Example 1.2 The Marketing Research Case: Rating a New Bottle Design [4]

The design of a package or bottle can have an important effect on a company’s bottom line. For example, an article in the September 16, 2004, issue of USA Today reported that the introduction of a contoured 1.5-liter bottle for Coke drinks (including the reduced-calorie soft drink Coke C2) played a major role in Coca-Cola’s failure to meet third-quarter earnings forecasts in 2004. According to the article, Coke’s biggest bottler, Coca-Cola Enterprises, “said it would miss expectations because of the 1.5-liter bottle and the absence of common 2-liter and 12-pack sizes for C2 in supermarkets.” [5]

In this case a brand group is studying whether changes should be made in the bottle design for a popular soft drink. To research consumer reaction to a new design, the brand group will use the “mall intercept method” [6] in which shoppers at a large metropolitan shopping mall are intercepted and asked to participate in a consumer survey. Each shopper will be exposed to the new bottle design and asked to rate the bottle image. Bottle image will be measured by combining consumers’ responses to five items, with each response measured using a 7-point “Likert scale.” The five items and the scale of possible responses are shown in Figure 1.1. Here, since we describe the least favorable response and the most favorable response (and we do not describe the responses between them), we say that the scale is “anchored” at its ends. Responses to the five items will be summed to obtain a composite score for each respondent. It follows that the minimum composite score possible is 5 and the maximum composite score possible is 35. Furthermore, experience has shown that the smallest acceptable composite score for a successful bottle design is 25.
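The composite-score arithmetic can be checked with a small sketch. The Python below is illustrative only: the function name `composite_score`, the constant names, and the example respondent are assumptions, while the scale (1 to 7), the five items, and the threshold of 25 come from the text.

```python
SCALE = range(1, 8)       # 7-point Likert scale: 1 (strongly disagree) to 7 (strongly agree)
NUM_ITEMS = 5             # five survey items per respondent
ACCEPTABLE_SCORE = 25     # smallest acceptable composite score, per the text

def composite_score(responses):
    """Sum one respondent's five item responses after validating them."""
    if len(responses) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} responses, got {len(responses)}")
    if any(r not in SCALE for r in responses):
        raise ValueError("each response must be an integer from 1 to 7")
    return sum(responses)

lowest = composite_score([1, 1, 1, 1, 1])    # 5
highest = composite_score([7, 7, 7, 7, 7])   # 35
example = composite_score([6, 5, 7, 4, 6])   # hypothetical respondent: 28
print(lowest, highest, example, example >= ACCEPTABLE_SCORE)
```

The extreme cases reproduce the minimum of 5 and maximum of 35 stated above.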

[4] This case was motivated by an example in the book Essentials of Marketing Research by W. R. Dillon, T. J. Madden, and N. H. Firtle (Burr Ridge, IL: Richard D. Irwin, 1993). The authors also wish to thank Professor L. Unger of the Department of Marketing at Miami University for helpful discussions concerning how this type of marketing study would be carried out.

[5] Source: “Coke says earnings will come up short,” by Theresa Howard, USA Today, September 16, 2004, p. 801.

[6] This is a commonly used research design. For example, see the Burke Marketing Research website at http://burke.com/about/inc_background.htm, Burke Marketing Research, March 26, 2005.

In this situation, it is not possible to list and number each and every shopper at the mall while the study is being conducted. Consequently, we cannot use random numbers (as we did in the cell phone case) to obtain a random sample of shoppers. Instead, we can select a systematic sample. To do this, every 100th shopper passing a specified location in the mall will be invited to participate in the survey. Here, selecting every 100th shopper is arbitrary; we could select every 200th, every 300th, and so forth. By selecting every 100th shopper, it is probably reasonable to believe that the responses of the survey participants are not related. Therefore, it is reasonable to assume that the sampled shoppers obtained by the systematic sampling process make up an approximate random sample.
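Systematic selection of every 100th shopper needs no frame, only a running count. The Python sketch below illustrates the idea; the function name `systematic_sample` and the figure of 6,000 passing shoppers are hypothetical and are not given in the text.

```python
def systematic_sample(units, interval):
    """Return every interval-th unit in arrival order (the 100th, 200th, ...)."""
    return [unit for position, unit in enumerate(units, start=1)
            if position % interval == 0]

# Hypothetical: label shoppers 1, 2, 3, ... in the order they pass the location.
passing_shoppers = range(1, 6001)
invited = systematic_sample(passing_shoppers, interval=100)
print(len(invited), invited[:3])  # 60 [100, 200, 300]
```

Changing `interval` to 200 or 300 corresponds to the alternative choices mentioned above.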

During a Tuesday afternoon and evening, a sample of 60 shoppers is selected by using the systematic sampling process. Each shopper is asked to rate the bottle design by responding to the five items in Figure 1.1, and a composite score is calculated for each shopper. The 60 composite scores obtained are given in Table 1.3. Since these scores range from 20 to 35, we might infer that most of the shoppers at the mall on the Tuesday afternoon and evening of the study would rate the new bottle design between 20 and 35. Furthermore, since 57 of the 60 composite scores are at least 25, we might estimate that the proportion of all shoppers at the mall on the Tuesday afternoon and evening who would give the bottle design a composite score of at least 25 is 57/60 = .95. That is, we estimate that 95 percent of the shoppers would give the bottle design a composite score of at least 25.
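The range and the sample proportion quoted above can be reproduced from the 60 composite scores in Table 1.3. This is a minimal Python sketch added for illustration; the variable names are assumptions.

```python
ACCEPTABLE_SCORE = 25   # smallest acceptable composite score, per the text

# The 60 composite scores from Table 1.3, row by row.
scores = [
    34, 33, 33, 29, 26, 33, 28, 25, 32, 33,
    32, 25, 27, 33, 22, 27, 32, 33, 32, 29,
    24, 30, 20, 34, 31, 32, 30, 35, 33, 31,
    32, 28, 30, 31, 31, 33, 29, 27, 34, 31,
    31, 28, 33, 31, 32, 28, 26, 29, 32, 34,
    32, 30, 34, 32, 30, 30, 32, 31, 29, 33,
]

acceptable = sum(1 for score in scores if score >= ACCEPTABLE_SCORE)
proportion = acceptable / len(scores)   # sample proportion of acceptable ratings

print(min(scores), max(scores))                      # 20 35
print(acceptable, len(scores), f"{proportion:.2f}")  # 57 60 0.95
```

Only the scores 20, 22, and 24 fall below 25, which gives the 57 of 60 used in the estimate.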

In Chapter 2 we will see how to estimate a typical composite score and we will further analyze the composite scores in Table 1.3.

In some situations, we need to decide whether a sample taken from one population can be employed to make statistical inferences about another, related population. Often logical reasoning is used to do this. For instance, we might reason that the bottle design ratings given by shoppers at the mall on the Tuesday afternoon and evening of the research study would be representative of the ratings given by (1) shoppers at the same mall at other times, (2) shoppers at other malls, and (3) consumers in general. However, if we have no data or other information to back up this reasoning, making such generalizations is dangerous. In practice, marketing research firms choose locations and sampling times that data and experience indicate will produce a representative cross-section of consumers. To simplify our presentation, we will assume that this has been done in the bottle design case. Therefore, we will suppose that it is reasonable to use the 60 bottle design ratings in Table 1.3 to make statistical inferences about all consumers.

Please circle the response that most accurately describes whether you agree or disagree with each statement about the bottle you have examined.

Statement                                                (1 = Strongly Disagree, 7 = Strongly Agree)
The size of this bottle is convenient.                   1 2 3 4 5 6 7
The contoured shape of this bottle is easy to handle.    1 2 3 4 5 6 7
The label on this bottle is easy to read.                1 2 3 4 5 6 7
This bottle is easy to open.                             1 2 3 4 5 6 7
Based on its overall appeal, I like this bottle design.  1 2 3 4 5 6 7

FIGURE 1.1 The Bottle Design Survey Instrument

TABLE 1.3 A Sample of Bottle Design Ratings (Composite Scores for a Systematic Sample of 60 Shoppers) (data set: Design)

34 33 33 29 26 33 28 25 32 33
32 25 27 33 22 27 32 33 32 29
24 30 20 34 31 32 30 35 33 31
32 28 30 31 31 33 29 27 34 31
31 28 33 31 32 28 26 29 32 34
32 30 34 32 30 30 32 31 29 33

To conclude this section, we emphasize the importance of taking a random (or approximately random) sample. Statistical theory tells us that, when we select a random (or approximately random) sample, we can use the sample to make valid statistical inferences about the sampled population. However, if the sample is not random, we cannot do this. A classic example occurred prior to the presidential election of 1936, when the Literary Digest predicted that Alf Landon would defeat Franklin D. Roosevelt by a margin of 57 percent to 43 percent. Instead, Roosevelt won the election in a landslide. The Literary Digest’s error was to sample names from telephone books and club membership rosters. In 1936 the country had not yet recovered from the Great Depression, and many unemployed and low-income people did not have phones or belong to clubs. The Literary Digest’s sampling procedure excluded these people, who overwhelmingly voted for Roosevelt. At this time, George Gallup, founder of the Gallup Poll, was beginning to establish his survey business. He used an approximately random sample to correctly predict Roosevelt’s victory.

As another example, today’s television and radio stations, as well as newspaper columnists, use voluntary response samples. In such samples, participants self-select; that is, whoever wishes to participate does so (usually expressing some opinion). These samples overrepresent people with strong (usually negative) opinions. For example, the advice columnist Ann Landers once asked her readers, “If you had it to do over again, would you have children?” Of the nearly 10,000 parents who voluntarily responded, 70 percent said that they would not. An approximately random sample taken a few months later found that 91 percent of parents would have children again. We further discuss random sampling in optional Section 1.5.


Exercises for Sections 1.1 and 1.2

CONCEPTS

1.1 Define a population. Give an example of a population that you might study when you start your career after graduating from college.

1.2 Define what we mean by a variable, and explain the difference between a quantitative variable and a qualitative (categorical) variable.

1.3 Below we list several variables. Which of these variables are quantitative and which are qualitative? Explain.
a The dollar amount on an accounts receivable invoice.
b The net profit for a company in 2005.
c The stock exchange on which a company’s stock is traded.
d The national debt of the United States in 2005.
e The advertising medium (radio, television, or print) used to promote a product.

1.4 Explain the difference between a census and a sample.

1.5 Explain each of the following terms:
a Descriptive statistics.
b Statistical inference.
c Random sample.
d Systematic sample.

1.6 Explain why sampling without replacement is preferred to sampling with replacement.

METHODS AND APPLICATIONS

1.7 The Forbes 2000 is a ranking of the world’s biggest companies (measured on a composite of sales, profits, assets and market values) by the editors of Forbes magazine. Below we give the best performing U.S. companies in the food, drink and tobacco industry from the Forbes 2000 as listed on the Forbes magazine website on February 2, 2005. (data set: BestPerf)

Company                       Sales (Billions $)   Profits (Billions $)
Altria Group (1)              60.7                  9.20
Coca-Cola (2)                 21.0                  4.35
PepsiCo (3)                   27.0                  3.49
Anheuser-Busch (4)            14.1                  2.08
Sara Lee (5)                  18.7                  1.11
Coca-Cola Enterprises (6)     17.3                  0.68
General Mills (7)             10.8                  1.00
ConAgra Foods (8)             19.8                  0.84
Archer Daniels (9)            31.7                  0.49
Kellogg (10)                   8.8                  0.79
HJ Heinz (11)                  8.3                  0.66
Pepsi Bottling Group (12)     10.3                  0.42
Tyson Foods (13)              25.2                  0.36
Campbell Soup (14)             6.9                  0.65
Dean Foods (15)                9.2                  0.36
Hershey Foods (16)             4.2                  0.46
Wm Wrigley Jr (17)             3.1                  0.45
Constellation Brands (18)      3.3                  0.21
RJ Reynolds Tobacco (19)       5.3                 -3.57
UST (20)                       1.7                  0.49
Hormel Foods (21)              4.2                  0.19
Brown-Forman (22)              2.5                  0.25

Source: http://www.forbes.com/lists, February 2, 2005.

Consider the random numbers given in Table 1.1. Starting in the upper left corner of Table 1.1 and moving down the two leftmost columns, we see that the first three two-digit numbers obtained are

33 03 92

Starting with these three random numbers, and moving down the two leftmost columns of Table 1.1 to find more two-digit random numbers, use Table 1.1 to randomly select five of these companies to be interviewed in detail about their business strategies. Hint: Note that we have numbered the companies in the Forbes list from 1 to 22.

1.8 Table 1.4 gives the most admired company in each of 30 industries as shown in the 2005 list of Global Most Admired Companies on the Fortune magazine website on March 14, 2005. Starting in the upper right corner of the random number table of Table 1.1 and moving down the two rightmost columns, we see that the first three two-digit numbers obtained are

88 22 52

Starting with these three random numbers, and moving down the two rightmost columns of Table 1.1 to find more two-digit random numbers, use Table 1.1 to randomly select four of these industries for further study. (data set: MostProf)

1.9 THE VIDEO GAME SATISFACTION RATING CASE (data set: VideoGame)

A company that produces and markets video game systems wishes to assess its customers’ level of satisfaction with a relatively new model, the XYZ-Box. In the six months since the introduction of the model, the company has received 73,219 warranty registrations from purchasers. The company will randomly select 65 of these registrations and will conduct telephone interviews with the purchasers. Specifically, each purchaser will be asked to state his or her level of agreement with each of the seven statements listed on the survey instrument given in Figure 1.2. Here, the level of agreement for each statement is measured on a 7-point Likert scale. Purchaser satisfaction will be measured by adding the purchaser’s responses to the seven statements. It follows that for each consumer the minimum composite score possible is 7 and the maximum is 49. Furthermore, experience has shown that a purchaser of a video game system is “very satisfied” if his or her composite score is at least 42.


TABLE 1.4 The Most Admired Company in Each of 30 Industries as Listed on the Fortune magazine website on March 14, 2005

      Company                   Industry
  1   Alcoa                     Metals
  2   Anheuser-Busch            Beverages
  3   BASF                      Chemicals
  4   Berkshire Hathaway        Insurance: P & C
  5   BHP Billiton              Mining, Crude-Oil Production
  6   BP                        Petroleum Refining
  7   Cardinal Health           Wholesalers: Health Care
  8   Cisco Systems             Network Communications
  9   Citigroup                 Megabanks
 10   Continental Airlines      Airlines
 11   Walt Disney               Entertainment
 12   FedEx                     Delivery
 13   General Electric          Electronics
 14   Home Depot                Specialty Retailers
 15   Illinois Tool Works       Industrial & Farm Equipment
 16   Intl. Business Machines   Computers
 17   International Paper       Forest & Paper Products
 18   Johnson & Johnson         Pharmaceuticals
 19   Lear                      Motor Vehicle Parts
 20   Northwestern Mutual       Insurance: Life, Health
 21   PepsiCo                   Consumer Food Products
 22   Procter & Gamble          Household and Personal Products
 23   RWE                       Energy
 24   Texas Instruments         Semiconductors
 25   Toyota Motor              Motor Vehicles
 26   United Technologies       Aerospace and Defense
 27   Verizon Communications    Telecommunications
 28   Vinci                     Engineering, Construction
 29   Wal-Mart Stores           General Merchandisers
 30   Walgreen                  Food & Drug Stores

Source: http://www.fortune.com/fortune/globaladmired/subs/2005/number1/0,23180,,00.html, March 14, 2005.

Statement                                                      Strongly Disagree ... Strongly Agree

The game console of the XYZ-Box is well designed.              1  2  3  4  5  6  7
The game controller of the XYZ-Box is easy to handle.          1  2  3  4  5  6  7
The XYZ-Box has high quality graphics capabilities.            1  2  3  4  5  6  7
The XYZ-Box has high quality audio capabilities.               1  2  3  4  5  6  7
The XYZ-Box serves as a complete entertainment center.         1  2  3  4  5  6  7
There is a large selection of XYZ-Box games to choose from.    1  2  3  4  5  6  7
I am totally satisfied with my XYZ-Box game system.            1  2  3  4  5  6  7

FIGURE 1.2 The Video Game Satisfaction Survey Instrument


a Assume that the warranty registrations are numbered from 1 to 73,219 in a computer. Starting in the upper left corner of Table 1.1 and moving down the five leftmost columns, we see that the first three five-digit numbers obtained are

33276 03427 92737

Starting with these three random numbers and moving down the five leftmost columns of Table 1.1 to find more five-digit random numbers, use Table 1.1 to randomly select the numbers of the first 10 warranty registrations to be included in the sample of 65 registrations.

b Suppose that when the 65 customers are interviewed, their composite scores are obtained and are as given in Table 1.5. Using the data, estimate limits between which most of the 73,219 composite scores would fall. Also, estimate the proportion of the 73,219 composite scores that would be at least 42.

1.10 THE BANK CUSTOMER WAITING TIME CASE WaitTime

A bank manager has developed a new system to reduce the time customers spend waiting to be served by tellers during peak business hours. Typical waiting times during peak business hours under the current system are roughly 9 to 10 minutes. The bank manager hopes that the new system will lower typical waiting times to less than six minutes.

A 30-day trial of the new system is conducted. During the trial run, every 150th customer who arrives during peak business hours is selected until a systematic sample of 100 customers is obtained. Each of the sampled customers is observed, and the time spent waiting for teller service is recorded. The 100 waiting times obtained are given in Table 1.6. Moreover, the bank manager feels that this systematic sample is as representative as a random sample of waiting times would be. Using the data, estimate limits between which the waiting times of most of the customers arriving during peak business hours would be. Also, estimate the proportion of waiting times of customers arriving during peak business hours that are less than six minutes.
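The systematic sampling scheme in this case can be sketched as follows. The function name is hypothetical, and the sketch assumes that counting starts with the first peak-hour arrival, so that the 150th arrival is the first customer sampled; the case does not say where the count begins.

```python
def systematic_sample_positions(interval, sample_size):
    """Arrival positions selected when every `interval`-th customer is taken.

    Assumption: the count starts at the first arrival, so the first
    sampled customer is arrival number `interval`.
    """
    return [interval * k for k in range(1, sample_size + 1)]


positions = systematic_sample_positions(interval=150, sample_size=100)
print(positions[:3], positions[-1], len(positions))
```

Under that assumption the 100th sampled customer is the 15,000th peak-hour arrival, which indicates how many arrivals the 30-day trial must cover.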

1.11 In an article titled “Turned Off” in the June 2–4, 1995, issue of USA Weekend, Dan Olmsted and Gigi Anders report on the results of a survey conducted by the magazine. Readers were invited to write in and answer several questions about sex and vulgarity on television. Olmsted and Anders summarized the survey results as follows:

Nearly all of the 65,000 readers responding to our write-in survey say TV is too vulgar, too violent, and too racy. TV execs call it reality.

Some of the key survey results were as follows:


TABLE 1.5 Composite Scores for the Video Game Satisfaction Rating Case VideoGame

39  44  46  44  44
45  42  45  44  42
38  46  45  45  47
42  40  46  44  43
42  47  43  46  45
41  44  47  48
38  43  43  44
42  45  41  41
46  45  40  45
44  40  43  44
40  46  44  44
39  41  41  44
40  43  38  46
42  39  43  39
45  43  36  41

TABLE 1.6 Waiting Times (in Minutes) for the Bank Customer Waiting Time Case WaitTime

1.6   6.2   3.2   5.6   7.9   6.1   7.2
6.6   5.4   6.5   4.4   1.1   3.8   7.3
5.6   4.9   2.3   4.5   7.2  10.7   4.1
5.1   5.4   8.7   6.7   2.9   7.5   6.7
3.9    .8   4.7   8.1   9.1   7.0   3.5
4.6   2.5   3.6   4.3   7.7   5.3   6.3
6.5   8.3   2.7   2.2   4.0   4.5   4.3
6.4   6.1   3.7   5.8   1.4   4.5   3.8
8.6   6.3    .4   8.6   7.8   1.8   5.1
4.2   6.8  10.2   2.0   5.2   3.7   5.5
5.8   9.8   2.8   8.0   8.4   4.0
3.4   2.9  11.6   9.5   6.3   5.7
9.3  10.9   4.3   1.3   4.4   2.4
7.4   4.7   3.1   4.8   5.2   9.2
1.8   3.9   5.8   9.9   7.4   5.0

Survey Results

• 96% are very or somewhat concerned about SEX on TV.

• 97% are very or somewhat concerned about VULGAR LANGUAGE on TV.

• 97% are very or somewhat concerned about VIOLENCE on TV.

Note: Because participants were not chosen at random, the results of the write-in survey may not be scientific.


a Note the disclaimer at the bottom of the survey results. In a write-in survey, anyone who wishes to participate may respond to the survey questions. Therefore, the sample is not random and we say that the survey is “not scientific.” What kind of people would be most likely to respond to a survey about TV sex and violence? Do the survey results agree with your answer?

b If a random sample of the general population were taken, do you think that its results would be the same? Why or why not? Similarly, for instance, do you think that 97 percent of the general population is “very or somewhat concerned about violence on TV”?

c Another result obtained in the write-in survey is as follows:

• Should “V-chips” be installed on TV sets so parents could easily block violent programming?

YES 90% NO 10%

If you planned to start a business manufacturing and marketing such V-chips (at a reasonable price), would you expect 90 percent of the general population to desire a V-chip? Why or why not?


1.3 Sampling a Process

A population is not always defined to be a set of existing units. Often we are interested in studying the population of all of the units that will be or could potentially be produced by a process.

A process is a sequence of operations that takes inputs (labor, materials, methods, machines, and so on) and turns them into outputs (products, services, and the like).

Processes produce output over time. For example, this year’s Lincoln Town Car manufacturing process produces Lincoln Town Cars over time. Early in the model year, Ford Motor Company might wish to study the population of the city driving mileages of all Lincoln Town Cars that will be produced during the model year. Or, even more hypothetically, Ford might wish to study the population of the city driving mileages of all Lincoln Town Cars that could potentially be produced by this model year’s manufacturing process. The first population is called a finite population because only a finite number of cars will be produced during the year. Any population of existing units is also finite. The second population is called an infinite population because the manufacturing process that produces this year’s model could in theory always be used to build “one more car.” That is, theoretically there is no limit to the number of cars that could be produced by this year’s process. There are a multitude of other examples of finite or infinite hypothetical populations. For instance, we might study the population of all waiting times that will or could potentially be experienced by patients of a hospital emergency room. Or we might study the population of all the amounts of grape jelly that will be or could potentially be dispensed into 16-ounce jars by an automated filling machine. To study a population of potential process observations, we sample the process over time, usually at equally spaced time points. This is illustrated in the following case.

Example 1.3 The Coffee Temperature Case: Monitoring Coffee Temperatures

According to the website of the Association of Trial Lawyers of America,7 Stella Liebeck of Albuquerque, New Mexico, was severely burned by McDonald’s coffee in February 1992. Liebeck, who received third-degree burns over 6 percent of her body, was awarded $160,000 in compensatory damages and $480,000 in punitive damages. A postverdict investigation revealed that the coffee temperature at the local Albuquerque McDonald’s had dropped from about 185°F before the trial to about 158°F after the trial.

This case concerns coffee temperatures at a fast-food restaurant. Because of the possibility of future litigation and to possibly improve the coffee’s taste, the restaurant wishes to study and monitor the temperature of the coffee it serves. To do this, the restaurant personnel measure the temperature of the coffee being dispensed (in degrees Fahrenheit) at half-hour intervals from 10 A.M. to 9:30 P.M. on a given day. Table 1.7 gives the 24 temperature measurements obtained in the time order that they were observed. Here, time equals 1 at 10 A.M. and 24 at 9:30 P.M.

7 http://www.atla.org/consumermediaresources/tier3/press_room/facts/products/mcdonald’s%20coffee.aspx, Association of Trial Lawyers of America, January 25, 2005.


Examining Table 1.7, we see that the coffee temperatures range from 152° to 170°. Based on this, is it reasonable to conclude that the temperature of most of the coffee that will or could potentially be served by the restaurant will be between 152° and 170°? The answer is yes if the restaurant’s coffee-making process operates consistently over time. That is, this process must be in a state of statistical control.

A process is in statistical control if it does not exhibit any unusual process variations. Often, this means that the process displays a constant amount of variation around a constant, or horizontal, level.

To assess whether a process is in statistical control, we sample the process often enough to detect unusual variations or instabilities. The fast-food restaurant has sampled the coffee-making process every half hour. In other situations, we sample processes with other frequencies (for example, every minute, every hour, or every day). In Chapter 14, where we discuss a systematic method for studying processes called statistical process control (SPC), we consider how to determine the sampling frequency for a process. Using the observed process measurements, we can then construct a runs plot (sometimes called a time series plot).

A runs plot is a graph of individual process measurements versus time.

Figure 1.3 shows the MINITAB and Excel outputs of a runs plot of the temperature data. (Some people call such a plot a line chart when the plot points are connected by line segments as in the Excel output.) Here we plot each coffee temperature on the vertical scale versus its corresponding time index on the horizontal scale. For instance, the first temperature (163°) is plotted versus time equals 1, the second temperature (169°) is plotted versus time equals 2, and so forth. The runs plot suggests that the temperatures exhibit a relatively constant amount of variation around a relatively constant level. That is, the center of the temperatures can be pretty much represented by a horizontal line (the constant level, shown by the line drawn through the plotted points), and the spread of the points around the line is staying about the same (constant variation). Note that the plot points tend to form a horizontal band. Therefore, the temperatures are in statistical control.
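As a rough stand-in for the MINITAB and Excel outputs, the sketch below pairs each temperature in Table 1.7 with its time index and prints a sideways text version of the runs plot. It uses only the Python standard library. The layout, and the use of the sample mean as the "constant level", are illustrative assumptions rather than the method the text uses, since the text draws the level by eye.

```python
from statistics import mean

# Coffee temperatures from Table 1.7, in time order (time 1 = 10:00 A.M.).
temps = [163, 169, 156, 152, 165, 158, 157, 162,
         159, 154, 167, 161, 152, 165, 161, 154,
         158, 170, 155, 162, 156, 167, 155, 164]

# A runs plot graphs each measurement against its time index.
points = list(enumerate(temps, start=1))   # (time index, temperature)

# One simple choice of center line: the mean of the 24 temperatures.
center = mean(temps)

# Sideways text plot: one row per time point, '*' further right = hotter.
for t, temp in points:
    print(f"{t:2d} {temp:3d} " + " " * (temp - 150) + "*")
print(f"center line near {center:.1f} degrees F")
```

Reading down the rows, the asterisks stay within the same left-to-right band from start to finish, which is the text-mode analogue of the horizontal band described above.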

In general, assume that we have sampled a process at different (usually equally spaced) time points and made a runs plot of the resulting sample measurements. If the plot indicates that the process is in statistical control, and if it is reasonable to believe that the process will remain in control, then it is probably reasonable to regard the sample measurements as an approximately random sample from the population of all possible process measurements. Furthermore, since the process is remaining in statistical control, the process performance is predictable. This allows us to make statistical inferences about the population of all possible process measurements that will or potentially could result from using the process. For example, assuming that the coffee-making process will remain in statistical control, it is reasonable to conclude that the temperature of most of the coffee that will be or could potentially be served will be between 152° and 170°.

To emphasize the importance of statistical control, suppose that another fast-food restaurant observes the 24 coffee temperatures that are plotted versus time in Figure 1.4. These temperatures also range between 152° and 170°. However, we cannot infer from this that the temperature of most of the coffee that will be or could potentially be served by this other restaurant will be

TABLE 1.7 24 Coffee Temperatures Observed in Time Order (°F) Coffee

                      Coffee                             Coffee                             Coffee
Time                  Temperature   Time                 Temperature   Time                 Temperature
(10:00 A.M.)    1     163°F         (2:00 P.M.)    9     159°F         (6:00 P.M.)   17     158°F
                2     169                         10     154                         18     170
                3     156                         11     167                         19     155
                4     152                         12     161                         20     162
(12:00 noon)    5     165           (4:00 P.M.)   13     152           (8:00 P.M.)   21     156
                6     158                         14     165                         22     167
                7     157                         15     161                         23     155
                8     162                         16     154                         24     164


between 152° and 170°. This is because the downward trend in the runs plot of Figure 1.4 indicates that the coffee-making process is out of control and will soon produce temperatures below 152°. Another example of an out-of-control process is illustrated in Figure 1.5. Here, the coffee temperatures seem to fluctuate around a constant level but with increasing variation (notice that the plotted temperatures fan out as time advances). In general, the specific pattern of out-of-control behavior can suggest the reason for this behavior. For example, the downward trend in the runs plot of Figure 1.4 might suggest that the restaurant’s coffeemaker has a defective heating element.
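The two out-of-control patterns just described, a drifting level and a changing spread, can be screened for numerically. The sketch below is a crude illustration and not the method of this text (which relies on visual inspection here and on control charts in Chapter 14): it computes a least-squares slope against the time index and compares the standard deviation of the second half of the series with that of the first half. The data behind Figures 1.4 and 1.5 are not given, so the `drifting` series is hypothetical.

```python
from statistics import mean, pstdev


def ls_slope(values):
    """Least-squares slope of the values against time index 1..n."""
    times = list(range(1, len(values) + 1))
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def spread_ratio(values):
    """Std. dev. of the second half divided by that of the first half."""
    half = len(values) // 2
    return pstdev(values[half:]) / pstdev(values[:half])


# Table 1.7 temperatures (the in-control process), in time order.
in_control = [163, 169, 156, 152, 165, 158, 157, 162,
              159, 154, 167, 161, 152, 165, 161, 154,
              158, 170, 155, 162, 156, 167, 155, 164]

# Hypothetical series that loses one degree per time period.
drifting = [170 - t for t in range(24)]

print("in control:", ls_slope(in_control), spread_ratio(in_control))
print("drifting:  ", ls_slope(drifting))
```

For the Table 1.7 data the slope is close to zero and the spread ratio is close to 1, consistent with a constant level and constant variation. For the hypothetical drifting series the slope is -1 degree per period. What counts as "close" is a judgment call that this sketch does not settle.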


FIGURE 1.3 MINITAB and Excel Runs Plots of Coffee Temperatures: The Process Is in Statistical Control. Panels: (a) The MINITAB output; (b) The Excel output. Temperature is plotted against time (1 to 24); annotations mark the constant level and note that constant variation forms a horizontal band.

FIGURE 1.4 A Runs Plot of Coffee Temperatures: The Process Level Is Decreasing (TEMP plotted against TIME)

FIGURE 1.5 A Runs Plot of Coffee Temperatures: The Process Variation Is Increasing (TEMP plotted against TIME)


Visually inspecting a runs plot to check for statistical control can be tricky. One reason is that the scale of measurements on the vertical axis can influence whether the data appear to form a horizontal band. We will study better methods for detecting out-of-control behavior in Chapter 14. For now, we will simply emphasize that a process must be in statistical control in order to make valid statistical inferences about the population of all possible process observations. Also, note that being in statistical control does not necessarily imply that a process is capable of producing output that meets our requirements. For example, suppose that marketing research suggests that the fast-food restaurant’s customers feel that coffee tastes best if its temperature is between 153° and 167°. Since Table 1.7 indicates that the temperature of some of the coffee it serves is not in this range (note that two of the temperatures are 152°, one is 169°, and another is 170°), the restaurant might take action to reduce the variation of the coffee temperatures.
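The capability check in this paragraph is a simple comparison against the customers' preferred range. The short sketch below lists the Table 1.7 observations that fall outside it, assuming that the endpoints 153° and 167° themselves count as acceptable (the text does not say whether the range is inclusive).

```python
# Table 1.7 temperatures in time order (time 1 = 10:00 A.M.).
temps = [163, 169, 156, 152, 165, 158, 157, 162,
         159, 154, 167, 161, 152, 165, 161, 154,
         158, 170, 155, 162, 156, 167, 155, 164]

LOW, HIGH = 153, 167   # preferred range; endpoints assumed acceptable

# Keep (time index, temperature) for every reading outside the range.
outside = [(t, temp) for t, temp in enumerate(temps, start=1)
           if not LOW <= temp <= HIGH]

print(outside)   # [(2, 169), (4, 152), (13, 152), (18, 170)]
print(f"{len(outside)} of {len(temps)} readings are outside the range")
```

The four readings found agree with the text: two temperatures of 152°, one of 169°, and one of 170°.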

The marketing research and coffee temperature cases are both examples of using the statistical process to make a statistical inference. In the next case, we formally describe and illustrate this process.


Example 1.4 The Car Mileage Case: Estimating Mileage

In 2005 the U.S. Department of Energy (DOE) and the Environmental Protection Agency (EPA) emphasized the importance of auto fuel economy. The Fuel Economy Guide, available at the DOE website, discusses the effects of gasoline consumption on U.S. energy security and the economy as follows.8

Buying a more fuel efficient vehicle can help strengthen our national energy security by reducing our dependence on foreign oil. Half of the oil used to produce the gasoline you put in your tank is imported. The United States uses about 20 million barrels of oil per day, two thirds of which is used for transportation. Petroleum imports cost us about $2 billion a week; that’s money that could be used to fuel our own economy.

The Guide also discusses the effects of gasoline consumption on global warming:9

Burning fossil fuels such as gasoline or diesel adds greenhouse gasses, including carbon dioxide, to the earth’s atmosphere. Greenhouse gasses trap heat and thus warm the earth because they prevent a significant proportion of infrared radiation from escaping into space. Vehicles with lower fuel economy burn more fuel, creating more carbon dioxide. Every gallon of gasoline your vehicle burns puts 20 pounds of carbon dioxide into the atmosphere. You can reduce your contribution to global warming by choosing a vehicle with higher fuel economy.

By choosing a vehicle that achieves 25 miles per gallon rather than 20 miles per gallon, you can prevent the release of about 15 tons of greenhouse gas pollution over the lifetime of your vehicle.
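The arithmetic behind the 15-ton figure can be checked with a short calculation. The quoted passage gives 20 pounds of carbon dioxide per gallon but does not state the lifetime mileage it assumes. The sketch below therefore assumes a lifetime of 150,000 miles and short tons of 2,000 pounds, which are assumptions that happen to reproduce the quoted figure rather than values taken from the Guide, and it treats the quoted "greenhouse gas pollution" as carbon dioxide alone.

```python
POUNDS_CO2_PER_GALLON = 20   # from the quoted passage
POUNDS_PER_TON = 2000        # assumption: short tons


def gallons_used(miles, mpg):
    """Gallons of gasoline burned over the given distance."""
    return miles / mpg


def tons_co2_saved(lifetime_miles, low_mpg, high_mpg):
    """Tons of CO2 avoided by driving the higher-mpg vehicle."""
    saved_gallons = (gallons_used(lifetime_miles, low_mpg)
                     - gallons_used(lifetime_miles, high_mpg))
    return saved_gallons * POUNDS_CO2_PER_GALLON / POUNDS_PER_TON


# Assumed lifetime mileage; not stated in the quoted passage.
print(tons_co2_saved(150_000, low_mpg=20, high_mpg=25))   # 15.0
```

Under these assumptions the 25 mpg vehicle burns 6,000 gallons instead of 7,500, and the 1,500 gallons saved correspond to 30,000 pounds, or 15 tons, of carbon dioxide.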

In this case study we consider a tax credit offered by the federal government to automakers for improving the fuel economy of midsize cars. According to The Fuel Economy Guide—2005 Model Year, virtually every midsize car equipped with an automatic transmission has an EPA combined city and highway mileage estimate of 26 miles per gallon (mpg) or less. Furthermore, the EPA has concluded that a 5 mpg increase in fuel economy is significant and feasible.10 Therefore, the government has decided to offer the tax credit to any automaker selling a midsize model with an automatic transmission that achieves an EPA combined city and highway mileage estimate of at least 31 mpg. To find the combined city and highway mileage estimate for a particular car model, the EPA tests a sample of cars. The steps used to obtain this estimate are those used in the statistical process for making a statistical inference:

1 Describe the practical problem of interest and the associated population or process to be studied. Consider an automaker that has recently introduced a midsize model with an automatic transmission and wishes to demonstrate that this new model qualifies for the tax credit. The automaker will study the population of all cars of this type that will be or could potentially be produced.

8 World Wide Web, http://www.fueleconomy.gov/feg/FEG2005intro.pdf, U.S. Department of Energy, January 25, 2005.
9 World Wide Web, http://www.fueleconomy.gov/feg/FEG2005intro.pdf, U.S. Department of Energy, January 25, 2005.
10 The authors wish to thank Jeff Alson of the EPA for this information.


2 Describe the variable of interest and how it will be measured. The variable of interest is the EPA combined city and highway mileage of a car. This mileage is obtained by testing the car on a device similar to a giant treadmill. The device is used to simulate a 7.5-mile city driving trip and a 10-mile highway driving trip, and the resulting mileages are used to calculate the EPA combined mileage for the car.11

3 Describe the sampling procedure. The automaker selects a sample of 49 of the new midsize cars by randomly selecting one car from those produced during each of 49 consecutive production shifts. Here the sample size (49) is determined by statistical considerations to be discussed in Chapter 7. Each sampled car is subjected to the EPA test. The resulting sample of 49 combined city and highway mileages is given in Table 1.8 (in time order).

4 Describe the statistical inference of interest. The sample of 49 mileages will be used to estimate the “typical” EPA combined mileage for the population of all possible new midsize cars. The estimate obtained is the EPA combined city and highway mileage estimate for the new midsize model.

5 Describe how the statistical inference will be made and evaluate the reliability of the inference. Figure 1.6 gives the MegaStat output of a runs plot of the 49 mileages. The runs plot indicates that the mileages are in statistical control. If it is reasonable to believe that car mileages for this model will remain in control, we can make statistical inferences. For instance, because the mileages in Table 1.8 range from 29.8 to 33.3 mpg, we might infer that most of the new midsize cars will get combined city and highway mileages between 29.8 and 33.3 mpg. To estimate the “typical” EPA combined mileage for the population of all possible cars, we might visually draw a horizontal line through the “middle” of the plot points in Figure 1.6. When we do this, the horizontal line intersects the vertical axis at about 31.5 mpg. Therefore, we might conclude that the EPA combined city and highway mileage estimate for the new midsize model should be 31.5 mpg. Since this estimate exceeds the EPA standard of 31 mpg, we might also conclude that the automaker qualifies for the tax credit. However, the estimate is intuitive, so we do not have any information about its reliability. In Chapter 2 we will study more precise ways to both define and estimate a “typical” population value. Then in Chapters 3 through 7 we will study tools for assessing the reliability of estimation procedures and for estimating “with confidence.”
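The informal inference in step 5 can be mirrored numerically. The sketch below takes the 49 mileages from Table 1.8, reports their range, and uses the sample mean as one possible stand-in for the horizontal line drawn by eye. Using the mean here is an assumption made for illustration, since the text defers the formal definition of a "typical" value to Chapter 2.

```python
from statistics import mean

# The 49 mileages of Table 1.8, listed row by row as printed
# (the table's time order runs down the columns, which does not
# matter for the range or the mean).
mileages = [
    30.8, 30.9, 32.0, 32.3, 32.6,
    31.7, 30.4, 31.4, 32.7, 31.4,
    30.1, 32.5, 30.8, 31.2, 31.8,
    31.6, 30.3, 32.8, 30.6, 31.9,
    32.1, 31.3, 32.0, 31.7, 32.8,
    33.3, 32.1, 31.5, 31.4, 31.5,
    31.3, 32.5, 32.4, 32.2, 31.6,
    31.0, 31.8, 31.0, 31.5, 30.6,
    32.0, 30.4, 29.8, 31.7, 32.2,
    32.4, 30.5, 31.1, 30.6,
]

TAX_CREDIT_STANDARD = 31.0   # mpg, from the case

typical = mean(mileages)
print(len(mileages), min(mileages), max(mileages))
print(f"sample mean = {typical:.2f} mpg")
print("exceeds standard:", typical > TAX_CREDIT_STANDARD)
```

The mean of roughly 31.55 mpg agrees with the visual estimate of about 31.5 mpg, and like that estimate it comes with no statement about its reliability.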


11 Since the EPA estimates that 55 percent of all driving is city driving, it calculates combined mileage by adding 55 percent of the city mileage test result to 45 percent of the highway mileage test result.
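The weighting described in footnote 11 amounts to a one-line formula. In the sketch below the function name and the two test results are hypothetical; only the 55/45 split comes from the footnote.

```python
def combined_mileage(city_mpg, highway_mpg, city_share=0.55):
    """Weighted combination of city and highway test results.

    The 55 percent city share comes from footnote 11; the remaining
    45 percent is applied to the highway result.
    """
    return city_share * city_mpg + (1 - city_share) * highway_mpg


# Hypothetical test results for one car (not from the case data).
print(round(combined_mileage(city_mpg=28.0, highway_mpg=36.0), 1))
```

With these made-up inputs the combined figure lies between the two test results and closer to the city value, because city driving carries the larger weight.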

TABLE 1.8 A Sample of 49 Mileages (Time Order Is Given by Reading Down the Columns from Left to Right) GasMiles

30.8  30.9  32.0  32.3  32.6
31.7  30.4  31.4  32.7  31.4
30.1  32.5  30.8  31.2  31.8
31.6  30.3  32.8  30.6  31.9
32.1  31.3  32.0  31.7  32.8
33.3  32.1  31.5  31.4  31.5
31.3  32.5  32.4  32.2  31.6
31.0  31.8  31.0  31.5  30.6
32.0  30.4  29.8  31.7  32.2
32.4  30.5  31.1  30.6

FIGURE 1.6 MegaStat Output of a Runs Plot of the 49 Mileages (Runs Plot of Mpg: Mpg, on a scale from 29 to 34, plotted against Time, 1 to 49)


Exercises for Section 1.3

CONCEPTS

1.12 Define a process. Then give an example of a process you might study when you start your career after graduating from college.

1.13 Explain what it means to say that a process is in statistical control.

1.14 What is a runs plot? What does a runs plot look like when we sample and plot a process that is in statistical control?

METHODS AND APPLICATIONS

1.15 The data below give 18 measurements of a critical dimension for an automobile part (measurements in inches). Here one part has been randomly selected each hour from the previous hour’s production, and the measurements are given in time order. AutoPart1

Hour   Measurement     Hour   Measurement
  1    3.005            10    3.005
  2    3.020            11    3.015
  3    2.980            12    2.995
  4    3.015            13    3.020
  5    2.995            14    3.000
  6    3.010            15    2.990
  7    3.000            16    2.985
  8    2.985            17    3.020
  9    3.025            18    2.985

Construct a runs plot and determine if the process appears to be in statistical control.

1.16 Table 1.9 presents the time (in days) needed to settle the 67 homeowners’ insurance claims handled by an Indiana insurance agent over a year. The claims are given in time order by loss date. ClaimSet

a Figure 1.7 shows a MINITAB runs plot of the claims data in Table 1.9. Does the claims-handling process seem to be in statistical control? Why or why not?

TABLE 1.9 Number of Days Required to Settle Homeowners’ Insurance Claims (Claims Made from July 2, 2004 to June 25, 2005) ClaimSet

                   Days to                         Days to                         Days to
Claim  Loss Date   Settle       Claim  Loss Date   Settle       Claim  Loss Date   Settle
  1    7-2-04      111           24    11-5-04     34            47    3-5-05      70
  2    7-6-04      35            25    11-13-04    25            48    3-5-05      67
  3    7-11-04     23            26    11-21-04    22            49    3-6-05      81
  4    7-12-04     42            27    11-23-04    14            50    3-6-05      92
  5    7-16-04     54            28    11-25-04    20            51    3-6-05      96
  6    7-27-04     50            29    12-1-04     32            52    3-6-05      85
  7    8-1-04      41            30    12-8-04     27            53    3-7-05      83
  8    8-13-04     12            31    12-10-04    23            54    3-7-05      102
  9    8-20-04     8             32    12-20-04    35            55    3-19-05     23
 10    8-20-04     11            33    12-23-04    29            56    3-27-05     11
 11    8-28-04     11            34    12-31-04    25            57    4-1-05      8
 12    9-3-04      31            35    12-31-04    18            58    4-11-05     11
 13    9-10-04     35            36    12-31-04    16            59    4-15-05     35
 14    9-17-04     14            37    1-5-05      23            60    4-19-05     29
 15    9-18-04     14            38    1-8-05      26            61    5-2-05      80
 16    9-29-04     27            39    1-16-05     30            62    5-15-05     18
 17    10-4-04     14            40    1-18-05     36            63    5-25-05     58
 18    10-6-04     23            41    1-22-05     42            64    6-6-05      4
 19    10-15-04    47            42    1-25-05     45            65    6-12-05     5
 20    10-23-04    17            43    1-27-05     43            66    6-24-05     15
 21    10-25-04    21            44    2-5-05      39            67    6-25-05     19
 22    10-30-04    18            45    2-9-05      53
 23    11-2-04     31            46    2-23-05     64


b In March of 2005, Indiana was hit by a widespread ice storm that caused heavy damage to homes in the area. Did this ice storm have a significant impact on the time needed to settle homeowners’ claims? Should the agent consider improving procedures for handling claims in emergency situations? Why or why not?

1.17 In the article “Accelerating Improvement” published in Quality Progress (October 1991), Gaudard, Coates, and Freeman describe a restaurant that caters to business travelers and has a self-service breakfast buffet. Interested in customer satisfaction, the manager conducts a survey over a three-week period and finds that the main customer complaint is having to wait too long to be seated. On each day from September 11, 1989, to October 1, 1989, a problem-solving team records the percentage of patrons who must wait more than one minute to be seated. A runs plot of the daily percentages is shown in Figure 1.8.12 What does the runs plot suggest?

1.18 THE TRASH BAG CASE13 TrashBag

A company that produces and markets trash bags has developed an improved 30-gallon bag. The new bag is produced using a specially formulated plastic that is both stronger and more biodegradable than previously used plastics, and the company wishes to evaluate the strength of this bag. The breaking strength of a trash bag is considered to be the amount (in pounds) of a representative trash mix that when loaded into a bag suspended in the air will cause the bag to sustain significant damage (such as ripping or tearing). The company has decided to carry out a 40-hour pilot production run of the new bags. Each hour, at a randomly selected time during the hour, a bag is taken off the production line. The bag is then subjected to a breaking strength test. The 40 breaking strengths obtained during the pilot production run are given in Table 1.10, and an Excel runs plot of these breaking strengths is given in Figure 1.9.

a Do the 40 breaking strengths appear to be in statistical control? Explain.

b Estimate limits between which most of the breaking strengths of all trash bags would fall.

1.19 THE BANK CUSTOMER WAITING TIME CASE WaitTime

Recall that every 150th customer arriving during peak business hours was sampled until a systematic sample of 100 customers was obtained. This systematic sampling procedure is equivalent to sampling from a process. Figure 1.10 shows a MegaStat runs plot of the 100 waiting times in Table 1.6. Does the process appear to be in statistical control? Explain.


12 The source of Figure 1.8 is M. Gaudard, R. Coates, and L. Freeman, “Accelerating Improvement,” Quality Progress, October 1991, pp. 81–88. © 1991 American Society for Quality Control. Used with permission.

13 This case is based on conversations by the authors with several employees working for a leading producer of trash bags. For purposes of confidentiality, we have withheld the company’s name.

FIGURE 1.7 MINITAB Runs Plot of the Insurance Claims Data for Exercise 1.16 (Time Series Plot of Days to Settle: days to settle plotted against claim number)

FIGURE 1.8 Runs Plot of Daily Percentages of Customers Waiting More Than One Minute to Be Seated (for Exercise 1.17) (percentage who waited, on a scale from 1% to 9%, plotted by day, Sept. 11–Oct. 1, 1989)


1.4 Ratio, Interval, Ordinal, and Nominative Scales of Measurement (Optional)

In Section 1.1 we said that a variable is quantitative if its possible values are numbers that represent quantities (that is, “how much” or “how many”). In general, a quantitative variable is measured on a scale having a fixed unit of measurement between its possible values. For example, if we measure employees’ salaries to the nearest dollar, then one dollar is the fixed unit of measurement between different employees’ salaries. There are two types of quantitative variables: ratio and interval. A ratio variable is a quantitative variable measured on a scale such that ratios of its values are meaningful and there is an inherently defined zero value. Variables such as salary, height, weight, time, and distance are ratio variables. For example, a distance of zero miles is “no distance at all,” and a town that is 30 miles away is “twice as far” as a town that is 15 miles away.

An interval variable is a quantitative variable where ratios of its values are not meaningful and there is not an inherently defined zero value. Temperature (on the Fahrenheit scale) is an interval variable. For example, zero degrees Fahrenheit does not represent “no heat at all,” just that it is very cold. Thus, there is no inherently defined zero value. Furthermore, ratios of temperatures are not meaningful. For example, it makes no sense to say that 60° is twice as warm as


TABLE 1.10 Trash Bag Breaking Strengths TrashBag

48.5  52.5  50.7  49.4
52.3  47.5  48.2  51.9
53.5  50.9  51.5  52.0
50.5  49.8  49.0  48.8
50.3  50.0  51.7  46.8
49.6  50.8  53.2  51.3
51.0  53.0  51.1  49.3
48.3  50.9  52.6  54.0
50.6  49.9  51.2  49.2
50.2  50.1  49.5  51.4

FIGURE 1.9 Excel Runs Plot of Breaking Strengths for Exercise 1.18 (Runs Plot of Strength: Strength, on a scale from 46 to 54, plotted against Time, 1 to 40)

FIGURE 1.10 MegaStat Runs Plot of Waiting Times for Exercise 1.19 (Runs Plot of Waiting Times: Waiting Time, on a scale from 0 to 12, plotted against Customer, 1 to 100)


30°. In practice, there are very few interval variables other than temperature. Almost all quantitative variables are ratio variables.
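One way to see why ratios are meaningful for distance but not for Fahrenheit temperature is to change the unit of measurement and check whether the ratio survives. The sketch below is an added illustration, not part of the original text, and uses the standard conversions from Fahrenheit to Celsius and from miles to kilometers.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard conversion between the two temperature scales."""
    return (deg_f - 32) * 5 / 9


MILES_TO_KM = 1.609344   # kilometers per mile

# Interval scale: the ratio changes when the scale changes.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print(ratio_f, ratio_c)

# Ratio scale: the ratio is the same in miles and in kilometers.
ratio_miles = 30 / 15
ratio_km = (30 * MILES_TO_KM) / (15 * MILES_TO_KM)
print(ratio_miles, ratio_km)
```

The 60°-to-30° ratio is 2 in Fahrenheit but about -14 in Celsius, so "twice as warm" depends entirely on where the scale places its zero. The 30-mile-to-15-mile ratio is 2 in either unit because zero distance means the same thing on both scales.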

In Section 1.1 we also said that if we simply record into which of several categories a population (or sample) unit falls, then the variable is qualitative (or categorical). There are two types of qualitative variables: ordinal and nominative. An ordinal variable is a qualitative variable for which there is a meaningful ordering, or ranking, of the categories. The measurements of an ordinal variable may be nonnumerical or numerical. For example, a student may be asked to rate the teaching effectiveness of a college professor as excellent, good, average, poor, or unsatisfactory. Here, one category is higher than the next one; that is, “excellent” is a higher rating than “good,” “good” is a higher rating than “average,” and so on. Therefore, teaching effectiveness is an ordinal variable having nonnumerical measurements. On the other hand, if (as is often done) we substitute the numbers 4, 3, 2, 1, and 0 for the ratings excellent through unsatisfactory, then teaching effectiveness is an ordinal variable having numerical measurements.

In practice, both numbers and associated words are often presented to respondents asked to ratea person or item. When numbers are used, statisticians debate whether the ordinal variable is “some-what quantitative.” For example, statisticians who claim that teaching effectiveness rated as 4, 3, 2,1, or 0 is not somewhat quantitative argue that the difference between 4 (excellent) and 3 (good)may not be the same as the difference between 3 (good) and 2 (average). Other statisticians arguethat as soon as respondents (students) see equally spaced numbers (even though the numbers aredescribed by words), their responses are affected enough to make the variable (teaching effective-ness) somewhat quantitative. Generally speaking, the specific words associated with the numbersprobably substantially affect whether an ordinal variable may be considered somewhat quantita-tive. It is important to note, however, that in practice numerical ordinal ratings are often analyzed asthough they are quantitative. Specifically, various arithmetic operations (as discussed in Chapters 2through 14) are often performed on numerical ordinal ratings. For example, a professor’s teachingeffectiveness average and a student’s grade point average are calculated. In Chapter 15 we will learnhow to use nonparametric statistics to analyze an ordinal variable without considering the vari-able to be somewhat quantitative and performing such arithmetic operations.
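The two ways of handling numerical ordinal ratings described above can be sketched in a few lines of Python. The seven student ratings below are hypothetical and serve only as an illustration. The median is used here as one simple summary that relies only on the ordering of the categories; it is our own choice for this sketch and is not meant to represent the nonparametric methods of Chapter 15.

```python
from statistics import mean, median

# Numerical codes for the ordinal categories, as in the text
# (4 = excellent, ..., 0 = unsatisfactory).
RATING_CODES = {"excellent": 4, "good": 3, "average": 2, "poor": 1, "unsatisfactory": 0}
CODE_TO_RATING = {code: name for name, code in RATING_CODES.items()}

# Hypothetical ratings from seven students (illustrative data, not from the text).
ratings = ["excellent", "good", "good", "average", "excellent", "poor", "good"]
codes = [RATING_CODES[r] for r in ratings]

# Treating the codes as quantitative: an arithmetic average of the ratings.
average_rating = mean(codes)

# Using only the ordering of the categories: the middle rating.
# (With an odd number of ratings the median is one of the observed codes.)
median_rating = CODE_TO_RATING[median(codes)]

print(f"average rating = {average_rating:.2f}, median rating = {median_rating}")
```

The average depends on the assumption that the codes are equally spaced, whereas the median would be unchanged if, say, excellent were coded 10 instead of 4.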

To conclude this section, we consider the second type of qualitative variable. A nominative variable is a qualitative variable for which there is no meaningful ordering, or ranking, of the categories. A person’s gender, the color of a car, and an employee’s state of residence are nominative variables.

Exercises for Section 1.4

CONCEPTS

1.20 Discuss the difference between a ratio variable and an interval variable.

1.21 Discuss the difference between an ordinal variable and a nominative variable.

METHODS AND APPLICATIONS

1.22 Classify each of the following qualitative variables as ordinal or nominative. Explain your answers.

Qualitative Variable                  Categories

Statistics course letter grade        A   B   C   D   F
Door choice on Let’s Make A Deal      Door #1   Door #2
Television show classifications       TV-G   TV-PG   TV-14   TV-MA
Personal computer ownership           Yes   No
Restaurant rating                     *****   ****   ***   **   *
Income tax filing status              Married filing jointly   Married filing separately   Single   Head of household   Qualifying widow(er)

1.23 Classify each of the following qualitative variables as ordinal or nominative. Explain your answers.

Qualitative Variable                         Categories

Personal computer operating system           DOS   Windows 98   Windows 2000   Windows NT   Other
Motion picture classifications               G   PG   PG-13   R   NC-17   X
Level of education                           Elementary   Middle school   High school   College   Graduate school
Rankings of top 10 college football teams    1   2   3   4   5   6   7   8   9   10
Exchange on which a stock is traded          AMEX   NYSE   NASDAQ   Other
Zip code                                     45056   90015   etc.


1.5 An Introduction to Survey Sampling (Optional)

Random sampling is not the only kind of sampling. Methods for obtaining a sample are called sampling designs, and the sample we take is sometimes called a sample survey. In this section we explain three sampling designs that are alternatives to random sampling: stratified random sampling, cluster sampling, and systematic sampling.

One common sampling design involves separately sampling important groups within a population. Then, the samples are combined to form the entire sample. This approach is the idea behind stratified random sampling.

In order to select a stratified random sample, we divide the population into nonoverlapping groups of similar units (people, objects, etc.). These groups are called strata. Then a random sample is selected from each stratum, and these samples are combined to form the full sample.

It is wise to stratify when the population consists of two or more groups that differ with respect to the variable of interest. For instance, consumers could be divided into strata based on gender, age, ethnic group, or income.

As an example, suppose that a department store chain proposes to open a new store in a location that would serve customers who live in a geographical region that consists of (1) an industrial city, (2) a suburban community, and (3) a rural area. In order to assess the potential profitability of the proposed store, the chain wishes to study the incomes of all households in the region. In addition, the chain wishes to estimate the proportion and the total number of households whose members would be likely to shop at the store. The department store chain feels that the industrial city, the suburban community, and the rural area differ with respect to income and the store’s potential desirability. Therefore, it uses these subpopulations as strata and takes a stratified random sample.

Taking a stratified sample can be advantageous because such a sample takes advantage of the fact that units in the same stratum are similar to each other. It follows that a stratified sample can provide more accurate information than a random sample of the same size. As a simple example, if all of the units in each stratum were exactly the same, then examining only one unit in each stratum would allow us to describe the entire population. Furthermore, stratification can make a sample easier (or possible) to select. Recall that, in order to take a random sample, we must have a frame, or list, of all of the population units. Although a frame might not exist for the overall population, a frame might exist for each stratum. For example, suppose nearly all the households in the department store’s geographical region have telephones. Although there might not be a telephone directory for the overall geographical region, there might be separate telephone directories for the industrial city, the suburb, and the rural area. Although we do not discuss how to analyze data from a stratified random sample in the main body of this text, we do so in Appendix F (Part 1) on the CD-ROM that accompanies this book. For a more complete discussion of stratified random sampling, see Mendenhall, Schaeffer, and Ott (1986).
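The selection procedure for a stratified random sample can be sketched in Python as follows. This is only an illustrative sketch for the department store example: the three household frames, their sizes, and the number of households drawn from each stratum are hypothetical, and the function name stratified_sample is our own. The text does not say how the sample should be divided among the strata, so the sample sizes used here are simply assumed.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Select a random sample (without replacement) from each stratum
    and combine the selections into one sample."""
    rng = random.Random(seed)
    sample = []
    for name, frame in strata.items():
        chosen = rng.sample(frame, sizes[name])  # random sample within this stratum
        sample.extend((name, unit) for unit in chosen)
    return sample


# Hypothetical frames (lists of households), one per stratum.
strata = {
    "city": [f"city household {i}" for i in range(1, 601)],
    "suburb": [f"suburb household {i}" for i in range(1, 301)],
    "rural": [f"rural household {i}" for i in range(1, 101)],
}

# Assumed number of households to draw from each stratum.
sizes = {"city": 60, "suburb": 30, "rural": 10}

sample = stratified_sample(strata, sizes, seed=1)
print(len(sample), "households selected")
```

Note that each stratum needs only its own frame, which mirrors the point made above about separate telephone directories for the city, the suburb, and the rural area.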

Sometimes it is advantageous to select a sample in stages. This is a common practice when selecting a sample from a very large geographical region. In such a case, a frame often does not exist. For instance, there is no single list of all registered voters in the United States. There is also no single list of all households in the United States. In this kind of situation, we can use multistage cluster sampling. To illustrate this procedure, suppose we wish to take a sample of registered voters from all registered voters in the United States. We might proceed as follows:

Stage 1: Randomly select a sample of counties from all of the counties in the United States.

Stage 2: Randomly select a sample of townships from each county selected in Stage 1.

Stage 3: Randomly select a sample of voting precincts from each township selected in Stage 2.

Stage 4: Randomly select a sample of registered voters from each voting precinct selected in Stage 3.

We use the term cluster sampling to describe this type of sampling because at each stage we “cluster” the voters into subpopulations. For instance, in Stage 1 we cluster the voters into counties, and in Stage 2 we cluster the voters in each selected county into townships. Also, notice that the random sampling at each stage can be carried out because there are lists of (1) all counties in the United States, (2) all townships in each county, (3) all voting precincts in each township, and (4) all registered voters in each voting precinct.
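The four stages above can be sketched in Python on a small, made-up frame. Everything in this sketch is hypothetical: the nested structure of 6 counties, 3 townships per county, 4 precincts per township, and 20 voters per precinct, the numbers selected at each stage, and the function name multistage_sample. The sketch assumes that one selection count is supplied for every stage.

```python
import random


def multistage_sample(frame, counts, rng):
    """Sample clusters stage by stage; at the last stage sample individual units.

    frame  -- nested dicts (cluster name -> sub-frame) ending in a list of units
    counts -- how many to select at each stage, first stage first
    """
    k = counts[0]
    if isinstance(frame, list):
        # Final stage: a list of units (here, registered voters).
        return rng.sample(frame, min(k, len(frame)))
    # Earlier stage: randomly select clusters from the list of cluster names.
    selected = rng.sample(sorted(frame), min(k, len(frame)))
    sample = []
    for name in selected:
        sample.extend(multistage_sample(frame[name], counts[1:], rng))
    return sample


# Hypothetical nested frame: counties -> townships -> precincts -> voters.
frame = {
    f"county{c}": {
        f"township{c}.{t}": {
            f"precinct{c}.{t}.{p}": [f"voter{c}.{t}.{p}.{v}" for v in range(1, 21)]
            for p in range(1, 5)
        }
        for t in range(1, 4)
    }
    for c in range(1, 7)
}

rng = random.Random(7)
# Stage 1: 3 counties; Stage 2: 2 townships each; Stage 3: 2 precincts each;
# Stage 4: 5 voters from each selected precinct.
voters = multistage_sample(frame, [3, 2, 2, 5], rng)
print(len(voters), "voters selected")
```

At no point does the procedure need a list of all voters; it needs only a list of clusters at each stage and lists of voters for the precincts actually selected.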


As another example, consider sampling the households in the United States. We might use Stages 1 and 2 above to select counties and townships within the selected counties. Then, if there is a telephone directory of the households in each township, we can randomly sample households from each selected township by using its telephone directory. Because most households today have telephones, and telephone directories are readily available, most national polls are now conducted by telephone.

It is sometimes a good idea to combine stratification with multistage cluster sampling. For example, suppose a national polling organization wants to estimate the proportion of all registered voters who favor a particular presidential candidate. Because the presidential preferences of voters might tend to vary by geographical region, the polling organization might divide the United States into regions (say, Eastern, Midwestern, Southern, and Western regions). The polling organization might then use these regions as strata, and might take a multistage cluster sample from each stratum (region).

The analysis of data produced by multistage cluster sampling can be quite complicated. We explain how to analyze data produced by one- and two-stage cluster sampling in Appendix F (Part 2) on the CD-ROM that accompanies this book. This appendix also includes a discussion of an additional survey sampling technique called ratio estimation. For a more detailed discussion of cluster sampling and ratio estimation, see Mendenhall, Schaeffer, and Ott (1986).

In order to select a random sample, we must number the units in a frame of all the population units. Then we use a random number table (or a random number generator on a computer) to make the selections. However, numbering all the population units can be quite time-consuming. Moreover, random sampling is used in the various stages of many complex sampling designs (requiring the numbering of numerous populations). Therefore, it is useful to have an alternative to random sampling. One such alternative is called systematic sampling. In order to systematically select a sample of n units without replacement from a frame of N units, we divide N by n and round the result down to the nearest whole number. Calling the rounded result ℓ, we then randomly select one unit from the first ℓ units in the frame; this is the first unit in the systematic sample. The remaining units in the sample are obtained by selecting every ℓth unit following the first (randomly selected) unit. For example, suppose we wish to sample a population of N = 14,327 allergists to investigate how often they have prescribed a particular drug during the last year. A medical society has a directory listing the 14,327 allergists, and we draw a systematic sample of 500 allergists from this frame. Here we compute 14,327/500 = 28.654, which is 28 when rounded down. Therefore, we number the first 28 allergists in the directory from 1 to 28, and we use a random number table to randomly select one of the first 28 allergists. Suppose we select allergist number 19. We interview allergist 19 and every 28th allergist in the frame thereafter, so we choose allergists 19, 47, 75, and so forth until we obtain our sample of 500 allergists. In this scheme, we must number the first 28 allergists, but we do not have to number the rest because we can “count off” every 28th allergist in the directory. Alternatively, we can measure the approximate amount of space in the directory that it takes to list 28 allergists. This measurement can then be used to select every 28th allergist.
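The systematic sampling procedure just described can be sketched in Python. The function name systematic_sample and its start argument are our own choices for this illustration; passing start=19 reproduces the allergist example, while omitting it lets a random number generator pick the starting unit.

```python
import random


def systematic_sample(N, n, start=None, rng=None):
    """Return the positions (numbered from 1) of a systematic sample of n units
    from a frame of N units."""
    interval = N // n  # N divided by n, rounded down to a whole number
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)  # random unit among the first `interval` units
    # The starting unit, then every `interval`-th unit thereafter.
    return [start + i * interval for i in range(n)]


# The allergist example: N = 14,327, n = 500, randomly selected start = 19.
sample = systematic_sample(14327, 500, start=19)
print(sample[:3], "...", sample[-1])
```

Because the interval is rounded down, the last selected position never exceeds N, so the full sample of n units always fits within the frame.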

In this book we concentrate on showing how to analyze data produced by random sampling. However, if the order of the population units in a frame is random with respect to the characteristic under study, then a systematic sample should be (approximately) a random sample and we can analyze the data produced by the systematic sample by using the same methods employed to analyze random samples. For instance, it would seem reasonable to assume that the alphabetically ordered allergists in a medical directory would be random with respect to (that is, have nothing to do with) the number of times the allergists prescribed a particular drug. Similarly, the alphabetically ordered people in a telephone directory would probably be random with respect to many of the people’s characteristics that we might wish to study.

When we employ random sampling, we eliminate bias in the choice of the sample from a frame. However, a proper sampling design does not guarantee that the sample will produce accurate information. One potential problem is undercoverage.

Undercoverage occurs when some population units are excluded from the process of selecting the sample.

This problem occurs when we do not have a complete, accurate list of all the population units. For example, although telephone polls today are common, 7 to 8 percent of the people in the United States do not have telephones. In general, undercoverage usually causes low-income people to be underrepresented. If underrepresented groups differ from the rest of the population with respect to the characteristic under study, the survey results will be biased. Another potentially serious problem is nonresponse.

Nonresponse occurs when a population unit selected as part of the sample cannot be contacted or refuses to participate.

In some surveys, 35 percent or more of the selected individuals cannot be contacted, even when several callbacks are made. In such a case, other participants are often substituted for the people who cannot be contacted. If the substitute participants differ from the originally selected participants with respect to the characteristic under study, the survey will again be biased. Third, when people are asked potentially embarrassing questions, their responses might not be truthful. We then have what we call response bias. Fourth, the wording of the questions asked can influence the answers received. Slanted questions often evoke biased responses. For example, consider the following question:

Which of the following best describes your views on gun control?

1 The government should take away our guns, leaving us defenseless against heavily armed criminals.

2 We have the right to keep and bear arms.

This question is biased toward eliciting a response against gun control.
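Returning to undercoverage, the size of the bias it can introduce may be sketched with simple arithmetic. In the Python illustration below, the 92 percent coverage figure is chosen to be consistent with the 7 to 8 percent of people without telephones mentioned above, but the two mean incomes are purely hypothetical, as is the function name coverage_bias.

```python
def coverage_bias(covered_share, covered_mean, uncovered_mean):
    """Compare the mean of the covered group with the mean of the whole population.

    covered_share  -- fraction of the population that appears in the frame
    covered_mean   -- mean of the characteristic among covered units
    uncovered_mean -- mean of the characteristic among units missing from the frame
    """
    population_mean = covered_share * covered_mean + (1 - covered_share) * uncovered_mean
    bias = covered_mean - population_mean  # what even a perfect sample of the frame would miss
    return population_mean, bias


# Hypothetical values: 92% of households have telephones; mean household income is
# assumed to be $50,000 with a telephone and $20,000 without.
population_mean, bias = coverage_bias(0.92, 50_000, 20_000)
print(f"population mean = {population_mean:,.0f}, bias of telephone frame = {bias:,.0f}")
```

If the two groups had the same mean income, the bias would be zero, which is why undercoverage matters only when the excluded units differ with respect to the characteristic under study.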

Exercises for Section 1.5

CONCEPTS

1.24 When is it appropriate to use stratified random sampling? What are strata, and how should strata be selected?

1.25 When is cluster sampling used? Why do we describe this type of sampling by using the term cluster?

1.26 Explain each of the following terms:
a Undercoverage
b Nonresponse
c Response bias

1.27 Explain how to take a systematic sample of 100 companies from the 1,853 companies that are members of an industry trade association.

1.28 Explain how a stratified random sample is selected. Discuss how you might define the strata to survey student opinion on a proposal to charge all students a $100 fee for a new university-run bus system that will provide transportation between off-campus apartments and campus locations.

1.29 Marketing researchers often use city blocks as clusters in cluster sampling. Using this fact, explain how a market researcher might use multistage cluster sampling to select a sample of consumers from all cities having a population of more than 10,000 in a large state having many such cities.

Chapter Summary

This chapter has introduced the idea of using sample data to make statistical inferences, that is, drawing conclusions about populations and processes by using sample data. We began by learning that a population is a set of existing units that we wish to study. We saw that, since many populations are too large to examine in their entirety, we often study a population by selecting a sample, which is a subset of the population units. Next we learned that, if the information contained in a sample is to accurately represent the population, then the sample should be randomly selected from the population, and we saw how random numbers (obtained from a random number table) can be used to select a random sample. We also learned that selecting a random sample requires a frame (that is, a list of all of the population units) and that, since a frame does not always exist, we sometimes select a systematic sample.

We continued this chapter by studying processes. We learned that to make statistical inferences about the population of all possible values of a variable that could be observed when using a process, the process must be in statistical control. We learned that a process is in statistical control if it does not exhibit any unusual process variations, and we demonstrated how we might sample a process and how to use a runs plot to try to judge whether a process is in control.

Next, in optional Section 1.4 we studied different types of quantitative and qualitative variables. We learned that there are two types of quantitative variables: ratio variables, which are measured on a scale such that ratios of their values are meaningful and there is an inherently defined zero value, and interval variables, for which ratios are not meaningful and there is no inherently defined zero value. We also saw that there are two types of qualitative variables: ordinal variables, for which there is a meaningful ordering of the categories, and nominative variables, for which there is no meaningful ordering of the categories.

We concluded this chapter with optional Section 1.5, which discusses survey sampling. We introduced stratified random sampling, in which we divide a population into groups (strata) and then select a random sample from each group. We also introduced multistage cluster sampling, which involves selecting a sample in stages, and we explained how to select a systematic sample. Finally, we discussed some potential problems encountered when conducting a sample survey: undercoverage, nonresponse, response bias, and slanted questions.


Glossary of Terms

categorical (qualitative) variable: A variable having values that indicate into which of several categories a population unit belongs. (page 3)

census: An examination of all the units in a population. (page 4)

cluster sampling (multistage cluster sampling): A sampling design in which we sequentially cluster population units into subpopulations. (page 21)

descriptive statistics: The science of describing the important aspects of a set of measurements. (page 4)

finite population: A population that contains a finite number of units. (page 12)

frame: A list of all of the units in a population. This is needed in order to select a random sample. (page 5)

infinite population: A population that is defined so that there is no limit to the number of units that could potentially belong to the population. (page 12)

interval variable: A quantitative variable such that ratios of its values are not meaningful and for which there is not an inherently defined zero value. (page 19)

measurement: The process of assigning a value of a variable to each of the units in a population or sample. (page 3)

nominative variable: A qualitative variable for which there is no meaningful ordering, or ranking, of the categories. (page 20)

nonresponse: A situation in which population units selected to participate in a survey do not respond to the survey instrument. (page 23)

ordinal variable: A qualitative variable for which there is a meaningful ordering or ranking of the categories. (page 20)

population: A set of existing units (people, objects, events, or the like) that we wish to study. (page 3)

process: A sequence of operations that takes inputs and turns them into outputs. (page 12)

qualitative (categorical) variable: A variable having values that indicate into which of several categories a population unit belongs. (page 20)

quantitative variable: A variable having values that are numbers representing quantities. (page 19)

random number table: A table containing random digits that is often used to select a random sample. (page 4)

random sample: A sample selected so that, on each selection from the population, every unit remaining in the population on that selection has the same chance of being chosen. (page 4)

ratio variable: A quantitative variable such that ratios of its values are meaningful and for which there is an inherently defined zero value. (page 19)

response bias: A situation in which survey participants do not respond truthfully to the survey questions. (page 23)

runs plot: A graph of individual process measurements versus time. (page 13)

sample: A subset of the units in a population. (page 4)

sampling without replacement: A sampling procedure in which we do not place previously selected units back into the population and, therefore, do not give these units a chance to be chosen on succeeding selections. (page 5)

sampling with replacement: A sampling procedure in which we place any unit that has been chosen back into the population to give the unit a chance to be chosen on succeeding selections. (page 4)

statistical control: A state in which a process does not exhibit any unusual variations. Often this means that the process displays a uniform amount of variation around a constant, or horizontal, level. (page 13)

statistical inference: The science of using a sample of measurements to make generalizations about the important aspects of a population. (page 4)

statistical process control (SPC): A method for analyzing process data in which we monitor and study the process variation. The goal is to stabilize (and reduce) the amount of process variation. (page 13)

strata: The subpopulations in a stratified sampling design. (page 21)

stratified random sampling: A sampling design in which we divide a population into nonoverlapping subpopulations and then select a random sample from each subpopulation (stratum). (page 21)

systematic sample: A sample taken by moving systematically through the population. For instance, we might randomly select one of the first 200 population units and then systematically sample every 200th population unit thereafter. (page 7)

undercoverage: A situation in sampling in which some groups of population units are underrepresented. (page 22)

variable: A characteristic of a population unit. (page 3)

Supplementary Exercises

1.30 Some television stations attempt to gauge public opinion by posing a question on the air and asking viewers to call to give their opinions. Suppose that a particular television station asks viewers whether they support or oppose a proposed federal gun control law. Viewers are to call one of two 800 numbers to register support or opposition. When the results are tabulated, the station reports that 78 percent of those who called are opposed to the proposed law. What do you think of the sampling method used by the station? Do you think that the percentage of the entire population that opposes the proposed law is as high as the 78 percent of the sample that was opposed?

1.31 In early 1995, The Milwaukee Sentinel, a morning newspaper in Milwaukee, Wisconsin, and The Milwaukee Journal, an afternoon newspaper, merged to form The Milwaukee Journal Sentinel. Several weeks after the merger, a Milwaukee television station, WITI-TV, conducted a telephone call-in survey asking whether viewers liked the new Journal Sentinel. The survey was “not scientific” because any viewer wishing to call in could do so.

On April 26, 1995, Tim Cuprisin, in his “Inside TV & Radio” column in the Journal Sentinel, wrote the following comment:

WE DIDN’T CALL: WITI-TV (Channel 6) did one of those polls, which they admit are unscientific, last week and found that 388 viewers like the new Journal Sentinel and 2,629 don’t like it.

We did our own unscientific poll on whether those Channel 6 surveys accurately reflect public opinion. The results: a full 100 percent of the respondents say absolutely, positively not.

Is Cuprisin’s comment justified? Write a short paragraph explaining your answer.

1.32 Table 1.11 gives the “35 best companies to work for” as rated on the Fortune magazine website on March 14, 2005. Use random numbers to select a random sample of 10 of these companies. Justify that your sample is random by carefully explaining how you obtained it. List the random numbers you used and show how they gave your random sample.

1.33 A bank wishes to study the amount of time it takes to complete a withdrawal transaction from one of its ATMs (automated teller machines). On a particular day, 63 withdrawal transactions are observed between 10 A.M. and noon. The time required to complete each transaction is given in Table 1.12. Figure 1.11 shows an Excel runs plot of the 63 transaction times. Do the transaction times seem to be in statistical control? Why or why not? (Data file: ATMTime)

TABLE 1.12 ATM Transaction Times (in Seconds) for 63 Withdrawals (data file: ATMTime)

Transaction  Time    Transaction  Time    Transaction  Time
     1        32         22        34         43        37
     2        32         23        32         44        32
     3        41         24        34         45        33
     4        51         25        35         46        33
     5        42         26        33         47        40
     6        39         27        42         48        35
     7        33         28        46         49        33
     8        43         29        52         50        39
     9        35         30        36         51        34
    10        33         31        37         52        34
    11        33         32        32         53        33
    12        32         33        39         54        38
    13        42         34        36         55        41
    14        34         35        41         56        34
    15        37         36        32         57        35
    16        37         37        33         58        35
    17        33         38        34         59        37
    18        35         39        38         60        39
    19        40         40        32         61        44
    20        36         41        35         62        40
    21        32         42        33         63        39

FIGURE 1.11 Excel Runs Plot of ATM Transaction Times for Exercise 1.33 (plot of ATM Time versus Transaction Number)

TABLE 1.11 Fortune’s 35 Best Companies to Work for in March 2005 (for Exercise 1.32)

Rank  Company
  1   Wegmans Food Markets
  2   W.L. Gore
  3   Republic Bancorp
  4   Genentech
  5   Xilinx
  6   J.M. Smucker
  7   S.C. Johnson & Son
  8   Griffin Hospital
  9   Alston & Bird
 10   Vision Service Plan
 11   Starbucks
 12   Quicken Loans
 13   Adobe Systems
 14   CDW
 15   Container Store
 16   SAS Institute
 17   Qualcomm
 18   Robert W. Baird
 19   QuikTrip
 20   HomeBanc Mortgage
 21   David Weekley Homes
 22   TD Industries
 23   Valero Energy
 24   Network Appliance
 25   JM Family Enterprises
 26   American Century Investments
 27   Cisco Systems
 28   American Cast Iron Pipe
 29   Stew Leonard’s
 30   Whole Foods Market
 31   Baptist Health South Florida
 32   Arnold & Porter
 33   Amgen
 34   American Fidelity Assurance
 35   Goldman Sachs Group

Source: World Wide Web, http://www.fortune.com/fortune/bestcompanies/subs/2005/fulllist/0,22864,,00.html, March 14, 2005.


1.34 Figure 1.12 gives a runs plot of the Cleveland Indians’ winning percentages from 1915 (when the team was renamed as the “Indians”) to 2004. Many longtime Indians fans believe that the April 1960 trade of Rocky Colavito, a feared home-run hitter, for Detroit’s Harvey Kuenn, a good average hitter without exceptional power, sent the team into a decline that lasted more than 30 years. Does the runs plot provide any evidence to support this opinion? Why or why not?

1.35 THE TRASH BAG CASE (data file: TrashBag)

Recall that the company will carry out a 40-hour pilot production run of the new bags and will randomly select one bag each hour to be subjected to a breaking strength test.

a Explain how the company can use random numbers to randomly select the times during the 40 hours of the pilot production run at which bags will be tested. Hint: Suppose that a randomly selected time will be determined to the nearest minute.

b Use the following random numbers (obtained from Table 1.1) to select the times during the first five hours at which the first five bags to be tested will be taken from the production line: 61, 15, 64, 07, 86, 87, 57, 64, 66, 42, 59, 51.

FIGURE 1.12 Runs Plot of the Cleveland Indians’ Winning Percentages from 1915 through 2004 (for Exercise 1.34) (plot of WINPCT versus year)

MINITAB, Excel, and MegaStat for Statistics

In this book we use three types of software to carry out statistical analysis: MINITAB, Excel, and MegaStat. MINITAB is a computer package designed expressly for conducting statistical analysis. It is widely used at many colleges and universities, and in a large number of business organizations. Excel is, of course, a general purpose electronic spreadsheet program and analytical tool. The Analysis ToolPak in Excel includes many procedures for performing various kinds of basic statistical analyses. MegaStat is an add-in package that is specifically designed for performing statistical analysis in the Excel spreadsheet environment. The principal advantage of Excel is that, because of its broad acceptance among students and professionals as a multipurpose analytical tool, it is both well known and widely available. The advantage of a special-purpose statistical software package like MINITAB is that it provides a far wider range of statistical procedures and it offers the experienced analyst a range of options to better control the analysis. The advantages of MegaStat include (1) its ability to perform a number of statistical calculations that are not automatically done by the procedures in the Excel ToolPak, and (2) features that make it easier to use than Excel for a wide variety of statistical analyses. In addition, the output obtained by using MegaStat is automatically placed in a standard Excel spreadsheet and can be edited by using any of the features in Excel. MegaStat can be copied from the CD-ROM included with this book. MINITAB, Excel, and MegaStat, through built-in functions, programming languages, and macros, offer almost limitless power. Here, we will limit our attention to procedures that are easily accessible via menus without resort to any special programming or advanced features.

Commonly used features of MINITAB, Excel, and MegaStat are presented in this chapter along with an initial application: producing a time series or runs plot. You will find that the limited instructions included here, along with the built-in help features of all three software packages, will serve as a starting point from which you can discover a variety of other procedures and options. Much more detailed descriptions of MINITAB can be found in other sources, in particular in the manual Meet MINITAB: Release 14 for Windows. This manual is available in print and as a .pdf file, viewable using Adobe Acrobat Reader, on the MINITAB Inc. website (http://www.minitab.com/products/minitab/14/documentation/aspx). Similarly, there are a number of alternative reference materials for Microsoft Excel. Of course, an understanding of the related statistical concepts is essential to the effective use of any statistical software package.

The instructions in this book are based on MINITAB (Version 14) for Windows and Microsoft Excel 2003 as found in Microsoft Office 2003.

Appendix 1.1 ■ Getting Started With MINITAB

We begin with a look at some features of MINITAB that are common to most analyses. When the instructions call for a sequence of selections from a series of menus, the sequence will be presented in the following form:

Stat : Basic Statistics : Descriptive Statistics

This notation indicates that Stat is the first selection from the Minitab menu bar, next Basic Statistics is selected from the Stat pull-down menu, and finally Descriptive Statistics is selected from the Basic Statistics pull-down menu.

Starting MINITAB: Procedures for starting MINITAB may vary from one installation to the next. If you are using a public computing laboratory, you may have to consult local documentation. For typical MINITAB installations, you will generally be able to start MINITAB with a sequence of selections from the Microsoft Windows Start menu something like the following:

• Select Start : Programs : MINITAB 14 for Windows : Minitab 14

You can also start MINITAB with a previously saved MINITAB worksheet (like Coffee.mtw or one of the many other data files from the CD-ROM included with this text) from the Windows Explorer by double-clicking on the worksheet’s icon.

After you start MINITAB, the display is partitioned into two working windows. These windows serve the following functions:

• The “Session” window is the area where MINITAB commands and basic output are displayed.

• The “Data” window is a spreadsheet-style display where data can be entered and edited.

Help resources: Like most Windows programs, MINITAB includes on-line help via a Help Menu. The Help feature includes standard Contents and Search entries as well as Tutorials that introduce MINITAB concepts and walk through some typical MINITAB sessions. Also included is a StatGuide that provides guidance for interpreting statistical tables and graphs in a practical, easy-to-understand way.

Entering data (entering the coffee temperature data in Table 1.7) from the keyboard:

• In the Data window, click on the cell directly below C1 and type a name for the variable, Temp, and press the Enter key.

• Starting in line 1 under column C1, type the values for the variable (coffee temperatures from Table 1.7 on page 13) down the column, pressing the Enter key after each number is typed.

Saving data (saving the coffee temperature data):

• Select File : Save Current Worksheet As.

• In the “Save Worksheet As” dialog box, use the “Save in” pull-down menu to select the destination drive and folder. (Here we have selected the floppy drive, A:)

• Enter the desired file name in the File name box. (Here we have chosen the name Coffee. Minitab will automatically add the extension .mtw.)

• Click Save in the “Save Worksheet As” dialog box.

Retrieving a Minitab worksheet containing the coffee temperature data in Table 1.7 (data file: Coffee.mtw):

• Select File : Open Worksheet

• In the Open Worksheet dialog box, use the “Look in” pull-down menu to select the source drive and folder. (Here we have selected the floppy drive, A:)

• Select the desired file name in the File name box. (Here we have chosen the Minitab worksheet Coffee.mtw)

• Click Open in the Open Worksheet dialog box.

• Minitab may display a dialog box with the message, “A copy of the content of this file will be added to the current project.” If so, click OK.

Creating a runs (or time series) plot similar to Figure 1.3(a) on page 14 (data file: Coffee.mtw):

• Select Graph : Time Series Plot.

• In the Time Series Plots dialog box, select Simple, which produces a time series plot of data in a single column. Click OK in the Time Series Plots dialog box.

• In the “Time Series Plot—Simple” dialog box, enter the name of the variable, Temp, into the Series box. Do this either by typing its name, by clicking on its name in the variables box on the left and then clicking on the Select button, or by double-clicking on its name in the variables box on the left.

• Click OK in the “Time Series Plot—Simple” dialog box.

• The runs (or time series) plot will appear in a graphics window.

• The graph can be edited by right-clicking on the portion you wish to edit. For instance, here we have right-clicked on the data region. Selecting “Edit Data Region” from the pop up window yields a dialog box that allows you to edit this region. The x and y scales, x and y axis labels, title, plot symbols, connecting lines, data region, figure region, and so forth can all be edited by right-clicking on that particular portion of the graph.

• For instance, here we have right-clicked on the data region and then selected “Edit Data Region” from the pop up menu. The Edit Data Region dialog box allows us to edit various attributes of this region. As shown, selecting Custom and clicking on the Background Color arrow allows us to change the background color of the data region.
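For readers who want a software-independent picture of what a runs plot shows, the following is a minimal Python sketch, not a MINITAB feature. It draws each observation in time order, shifted right in proportion to its value. The function name `text_runs_plot` and the temperature readings are hypothetical placeholders; they are not the Table 1.7 data.

```python
def text_runs_plot(values, width=40):
    """Return a crude text runs plot: one line per observation, in time order."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    lines = []
    for t, v in enumerate(values, start=1):
        # Horizontal position of the marker grows with the observed value.
        offset = round((v - lo) / span * (width - 1))
        lines.append(f"{t:>3} | {' ' * offset}* {v}")
    return lines


# Placeholder readings for illustration only (NOT the Table 1.7 data).
temps = [154, 156, 158, 157, 160, 159]
for line in text_runs_plot(temps):
    print(line)
```

Reading the output from top to bottom corresponds to reading a runs plot from left to right: the time order is preserved, so drifts or shifts in level show up as the markers moving steadily to one side.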

Printing a high-resolution graph similar to Figure 1.3(a) on page 14 (data file: Coffee.mtw):

• Click in the graphics window to select it as the active window.

• Select File : Print Graph to print the graph.

• Select the appropriate printer and click OK in the Print dialog box.

• When finished with the graph window, you can close it by clicking on the × button in the upper right corner of the graphics window.

Saving the high-resolution graph:

• Select File : Save Graph As.

• In the “Save Graph As” dialog box, use the “Save in” pull-down menu to select the destination drive and folder (here we have selected the floppy drive, A:).

• Enter the desired file name in the File name box (here we have chosen the name Coffee). MINITAB will automatically add the extension .MGF.

• Click Save in the “Save Graph As” dialog box.

Printing data from the Session (shown) or Data window (data file: Coffee.mtw):

To print selected text from the Session window:

• Use the mouse to select the desired text.

• Select File : Print Session Window.

• In the Print dialog box, note that the Print range “Selection” option is selected. To print the entire Session window, select Print range “All”.

• Click OK in the Print dialog box.

To print a Minitab worksheet from the Data window:

• Click in the Data window to select it as active.

• Select File : Print Worksheet.

• Click OK in the Data Window Print Options dialog box.

• Click OK in the Print dialog box.

Including MINITAB output in reports The immediately preceding examples show how to print various types of output directly from MINITAB. Printing is a useful way to capture a quick hard-copy record of an analysis result. However, you may prefer at times to collect selected analysis results and arrange them with related narrative documentation in a report that can be saved and printed as a unit. This is easily accomplished by copying selected MINITAB results to the Windows clipboard and pasting them into your favorite word processor. Once copied to a word processor document, MINITAB results can be documented, edited, resized, and rearranged as desired into a cohesive record of your analysis. The following sequence of examples illustrates the process of collecting MINITAB output into a Microsoft Word document.

Copying session window output to a word processor document:

• Be sure to have a word processing document open to receive the results.

• Use the scroll bar on the right side of the Session window to locate the results to be copied and drag the mouse to select the desired output (selected output will be reverse-highlighted in black).

• Copy the selected output to the Windows clipboard by clicking the Copy icon on the MINITAB toolbar.

• Switch to your word processor document by clicking the button on the Windows task bar (here labeled MS Word Report.doc...).

• Click in your word processing document to position the cursor at the desired insertion point.

• Click the Paste button on the word processor power bar.

• Return to your MINITAB session by clicking the MINITAB button on the Windows task bar.

Copying high-resolution graphics output to a word processor document:

• Be sure to have a word processing document open to receive the results.

• Copy the selected contents of the high-resolution graphics window to the Windows clipboard by right-clicking in the graphics window and then clicking Copy Graph on the pop up menu.

• Switch to your word processor document by clicking the button on the Windows task bar (here labeled MS Word Report.doc...).

• Click in your word processing document to position the cursor at the desired insertion point.

• Click the Paste button on the word processor power bar.

• Return to your MINITAB session by clicking the MINITAB button on the Windows task bar.

Results Here is how the copied results might appear in Microsoft Word. These results can be edited, resized, repositioned, and combined with your own additional documentation to create a cohesive record of your analysis.

We complete this appendix by demonstrating how MINITAB can be used to generate a set of random numbers (as in Example 1.1 on page 5).

To create 100 random numbers between 1 and 2136 similar to those in Table 1.1 (page 6):

• Select Calc : Random Data : Integer

• In the Integer Distribution dialog box, enter 100 into the “Generate rows of data” window.

• Enter C1 into the “Store in column(s)” window.

• Enter 1 into the Minimum value box and enter 2136 into the Maximum value box.

• Click OK in the Integer Distribution dialog box.

The 100 random numbers will be placed in the Data Window in column C1. These numbers are generated with replacement. Repeated numbers would be skipped if the random numbers are being used to sample without replacement.
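The difference between generating with replacement and skipping repeats can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how MINITAB generates its numbers: the helper names `random_integers` and `skip_repeats` are hypothetical, and the seed is included only so that a run can be repeated.

```python
import random


def random_integers(n, low, high, seed=None):
    """Draw n integers uniformly from low..high (inclusive), WITH replacement."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]


def skip_repeats(numbers):
    """Keep the first occurrence of each number, preserving the drawn order."""
    seen = set()
    kept = []
    for x in numbers:
        if x not in seen:
            seen.add(x)
            kept.append(x)
    return kept


draws = random_integers(100, 1, 2136, seed=1)   # may contain repeated values
sample = skip_repeats(draws)                    # usable for sampling without replacement
print(len(draws), len(sample))
```

Because repeats are discarded, the list returned by `skip_repeats` can be shorter than 100. If a full sample of a given size is required, more numbers than needed would have to be generated at the start.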

Appendix 1.2 ■ Getting Started with Excel

We begin with a look at some features of Excel that are common to many analyses. When the instructions call for a sequence of selections from a series of menus, the sequence will be presented in the following form:

Tools : Data Analysis : Descriptive Statistics

This notation indicates that Tools is the first selection from the Excel menu bar, next Data Analysis is selected from the Tools pull-down menu, and finally Descriptive Statistics is selected from the Data Analysis menu window.

For many of the statistical and graphical procedures in Excel, it is necessary to provide a range of cells to specify the location of data in the spreadsheet. Generally, the range may be specified either by typing the cell locations directly into a dialog box or by dragging the selected range with the mouse. Though, for the experienced user, it is usually easier to use the mouse to select a range, the instructions that follow will, for precision and clarity, specify ranges by typing in cell locations. The selected range may include column or variable labels (labels at the tops of columns that serve to identify variables). When the selected range includes such labels, it is important to select the “Labels” check box in the analysis dialog box.

Starting Excel Procedures for starting Excel may vary from one installation to the next. If you are using a public computing laboratory, you may wish to consult local documentation. For typical Excel installations, you will generally be able to start Excel with a sequence of selections from the Microsoft Windows Start menu something like the following:

• Select Start : Microsoft Office XP : Microsoft Office Excel 2003

You can also start Excel with a previously saved Excel spreadsheet (like Coffee.xls or one of the other data files from the CD-ROM included with this text) from the Windows Explorer by double-clicking on the spreadsheet file’s icon.

After starting Excel, the display will generally show a blank Excel workbook.

Help resources Like most Windows programs, Excel includes on-line help via a Help Menu that includes an Excel help wizard as well as a Contents and Index entry.

Entering data (entering the coffee temperature data in Table 1.7) from the keyboard (data file: Coffee.xls):

• In a new Excel workbook, click on cell A1 in Sheet1 and type a label, that is, a variable name (here, Temp), for the coffee temperatures.

• Beginning in cell A2 (directly under the column label Temp) type the values for the variable (coffee temperatures from Table 1.7 on page 13) down the column, pressing the Enter key following each entry.

Saving data (saving the coffee temperature data):

• Select File : Save As.

• In the “Save As” dialog box, use the “Save in” drop-down menu to select the destination drive and folder (here we have selected the floppy drive, A:).

• Enter the desired file name in the “File name” box (here we have chosen the name Coffee). Excel will automatically add the extension .xls.

• Click Save in the “Save As” dialog box.

Retrieving an Excel spreadsheet containing the coffee temperature data in Table 1.7 on page 13 (data file: Coffee.xls):

• Select File : Open

• In the Open dialog box, use the “Look in” drop-down menu to select the source drive and folder (here we have selected the floppy drive, A:).

• Select the desired file name in the “File name” box (here we have chosen the Excel spreadsheet Coffee.xls).

• Click the Open button in the Open dialog box.

Creating a runs plot similar to Figure 1.3(b) on page 14 (data file: Coffee.xls):

• Click the Chart Wizard button on the Excel toolbar.

• In the “Chart Wizard—Step 1 of 4” dialog box, click on Line in the Chart type menu and click Next>.

• In the Step 2 dialog box, be sure that the range in the Data Range box is the correct range for the data and label. If not, enter the correct range (here A1:A25) in the Data Range box.

• Under the “Series in” option, select Columns and click Next>.

• In the Step 3 dialog box, click Next>.

• In the Step 4 dialog box, click Finish.

The graph can be repositioned on the spreadsheet by dragging it with the mouse. The graph can be resized by dragging the sizing handles around the edge of the graph. Many formatting options are available by right-clicking various graph elements.

Printing a spreadsheet with embedded graph similar to Figure 1.3(b) (data file: Coffee.xls):

• Click outside the graph for printing to include the data and the graph. (Click on the graph to select only the graph for printing.)

• Select File : Print.

• Click OK in the Print dialog box.

There are many print options available in Excel for printing (a selected range, selected sheets, or an entire workbook), making it possible to build and print fairly sophisticated reports directly from Excel.

Including Excel output in reports The preceding example showed how to print selected analysis results from Excel. Printing is a useful way to capture a quick hard-copy record of an analysis result and Excel offers a variety of options for building sophisticated reports. However, you may at times prefer to collect selected analysis results and arrange them with related narrative in a word processing document that can be saved and printed as a unit. Simply copy Excel results, selected spreadsheet ranges and graphs, to the Windows clipboard. Then paste them into an open word processing document. Once copied to a word processor document, Excel results can be documented, edited, resized, and rearranged as desired into a cohesive record of your analysis. The cut and paste process is quite similar to the Minitab examples at the end of Appendix 1.1.

Appendix 1.3 ■ Getting Started with MegaStat

MegaStat, which was developed by Professor J. B. Orris of Butler University, is an Excel add-in that performs statistical analyses within an Excel workbook. After it is installed, it appears on the Excel menu and works like any other Excel option. Instructions for installing MegaStat can be found on the CD-ROM that accompanies this text.

When you click on MegaStat on the Excel menu bar, the MegaStat menu appears as shown to the right. Most of the menu options display submenus. If a menu item is followed by an ellipsis (...), clicking it will display a dialog box for that option.

A dialog box allows you to specify the data to be used and other inputs and options. A typical dialog box is shown to the right.

After you have selected the needed data and options, click OK. The dialog box disappears, and MegaStat performs the analysis. Before we look at specific dialog boxes, we will describe some features that are common to all of the options. MegaStat use is intuitive and very much like other Excel operations; however, there are some features unique to MegaStat.

Data selection Most MegaStat dialog boxes have fields where you select input ranges that contain the data to be used. Such a field is shown in the dialog box just illustrated; it is the long horizontal window with the label “Input range” to its right. Input ranges can be selected using four methods:

1 Pointing and dragging with the mouse. Simply select the desired data by pointing to it, left-clicking on the first data item, and dragging the cursor to select the rest of the data as illustrated above.

Since the dialog box pops up on the screen, it may block some of your data. You can move a dialog box around on the screen by placing the mouse pointer over the title bar (colored area at the top) and then clicking and holding the left mouse button while dragging the dialog box to a new location. You can even drag it partially off the screen.

You will also notice that when you start selecting data by dragging the mouse pointer, the dialog box will collapse to a smaller size to help you see the underlying data. It will automatically return to full size when you release the mouse button. You can also collapse and uncollapse the dialog box manually by clicking the collapse (-) button at the right end of the field. Clicking the button again will uncollapse the dialog box. (Never use the × button to try to collapse or uncollapse a dialog box.)

2 Using MegaStat’s AutoExpand feature. Pointing and dragging to select data can be tedious if you have a lot of data. When you drag the mouse down it is easy to overshoot the selection, and then you have to drag the mouse back until you get the area correctly selected. AutoExpand allows you to select data rapidly without having to drag through the entire column of data. Here is how it works:

a Make sure the input box has the focus (that is, click in it to make the input box active). An input box has the focus when the insertion pointer is blinking in it.

b Click in one cell of the column you want. If more than one column is being selected, drag the mouse across the columns.

c Right-click over the input field or left-click the label “Input range” to the right of the input box. The data range will expand to include all of the rows in the region where you selected one row.

This procedure is illustrated on the next page. In the left screenshot, we have left-clicked on one cell in the column of data labeled WaitTime. In the right screenshot, we see the result after we right-click over the input field or left-click on the label “Input range.” Notice that the entire column of data has been selected in the right screenshot. This can be seen by examining the input field or by looking at the column of data.

With a little practice you will find that AutoExpand is a very efficient way to select data. The only time you cannot use it is when you want to use a partial column of data. You should also be aware that AutoExpand stops when it finds a blank cell; thus any summations or other calculations at the bottom of a column would be selected.

Note: When using these methods of data selection, you may select variables in an alternate sequence by holding the CTRL key while making multiple selections.

3 Typing the name of a named range. If you have previously identified a range of cells using Excel’s name box, you may use that name to specify a data range in a MegaStat dialog box. This method can be very useful if you are using the same data for several different statistical procedures.

4 Typing a range address. You may type any valid Excel range address (for example, A1:A101) into the input field. This is the most cumbersome way to specify data ranges, but it certainly works.
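The AutoExpand behavior described under method 2 can be modeled with a short Python sketch. This is only an illustration of the stated rule (expansion stops at a blank cell), not MegaStat's actual code. The column is represented as a list with `None` for a blank cell; the function name `auto_expand`, the assumption that the range grows both upward and downward from the clicked cell, and the cell values are all hypothetical.

```python
def auto_expand(column, clicked_row):
    """Return (first, last) 0-based rows of the non-blank block containing clicked_row.

    Assumption: the range grows in both directions until a blank cell
    (None) or the edge of the column is reached.
    """
    if column[clicked_row] is None:
        raise ValueError("clicked cell is blank")
    first = clicked_row
    while first > 0 and column[first - 1] is not None:
        first -= 1
    last = clicked_row
    while last < len(column) - 1 and column[last + 1] is not None:
        last += 1
    return first, last


# A total stored directly under the data is swept into the selection ...
print(auto_expand(["WaitTime", 2, 4, 6, 12], 2))        # (0, 4)
# ... while a blank cell above the total stops the expansion.
print(auto_expand(["WaitTime", 2, 4, 6, None, 12], 2))  # (0, 3)
```

The two calls show why a summation at the bottom of a column gets selected unless a blank cell separates it from the data.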

Data labels For most procedures, the first cell in each input range can be a label. If the first cell in a range is text, it is considered a label; if the first cell is a numeric value, it is considered data. If you want to use numbers as variable labels, you must enter the numbers as text by preceding them with a single quote mark (for instance, '2). Even though Excel stores times and dates as numbers, MegaStat will recognize them as labels if they are formatted as time/date values. If data labels are not part of the input range, the program automatically uses the cell immediately above the data range as a label if it contains a text value. If an option can consider the entire first row (or column) of an input range as labels, any numeric value in the row will cause the entire row to be treated as data. Finally, if the program detects sequential integers (1,2,3,...) in a location where you might want labels, it will display a warning message. Otherwise, the rule is that text cells are labels, and numeric cells are data.
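The basic rule (a text first cell is a label, a numeric first cell is data) can be sketched in Python as follows. This is a simplified model, not MegaStat's implementation: the function name `split_label` is hypothetical, and the sketch deliberately ignores the special cases mentioned above (time/date values, the cell above the range, whole-row labels, and the sequential-integer warning).

```python
def split_label(cells):
    """Split an input range into (label, data) using the text-is-label rule.

    Text in the first cell is treated as a label; a numeric first cell
    means the whole range is data and there is no label.
    """
    first = cells[0]
    if isinstance(first, str):
        return first, list(cells[1:])
    return None, list(cells)


print(split_label(["Temp", 10, 20]))  # ('Temp', [10, 20])
print(split_label([2, 10, 20]))       # (None, [2, 10, 20])
# A number typed with a leading single quote is stored as text, so it is a label.
print(split_label(["2", 10, 20]))     # ('2', [10, 20])
```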

Output When you click OK in a MegaStat dialog box, it performs some statistical analysis and needs a place to put its output. It looks for a worksheet named Output in the current Excel application. If it finds one, it goes to the end of it and appends its output; if it doesn’t find an Output worksheet, it creates one. MegaStat never changes the user’s worksheets; it only sends output to its Output sheet.

MegaStat attempts to format the output, but remember that the Output sheet is just a standard Excel worksheet and can be modified in any way. You can adjust column widths and change any formatting that you think needs improvement. You can insert, delete, and modify cells. You can copy all or part of the output to another worksheet or to another application such as a word processor.

When the program generates output, it adjusts column widths for the current output. If you have previous output from a different option already in the Output sheet, the column widths for the previous output may be altered. You can attempt to fix this by manually adjusting the column widths. Alternatively, you can make it a practice to always start a new Output sheet. The Utilities menu has options for deleting the Output sheet, for making a copy of it, and for starting a new one.

An example We now give an example of using MegaStat to carry out statistical analysis. When the instructions call for a sequence of selections from a series of menus, the sequence will be presented in the following form:

MegaStat : Probability : Counting Rules

This notation says that MegaStat is the first selection from the Excel menu bar; next Probability is selected from the MegaStat pull-down menu; and finally Counting Rules is selected from the Probability submenu.

Creating a runs plot similar to Figure 1.6 on page 16 (data file: GasMiles.xls):

• In an Excel worksheet, enter the mileage data in Table 1.8 on page 16 into column A with the label MPG in cell A1 and with the 49 mileages in cells A2 through A50.

• Select MegaStat : Descriptive Statistics.

• Enter the range A1:A50 into the Input range box. The easiest way to do this is to use the MegaStat AutoExpand feature. Simply select one cell in column A (cell A4, for instance) by clicking on the cell. Then, either right-click in the Input range box or left-click on the label “Input range” to the right of the Input range box.

• Check the Runs Plot check box.

• Click OK in the Descriptive Statistics dialog box.

MegaStat places the resulting analysis (in this case the runs plot) in an output worksheet. This is a standard Excel worksheet, which can be edited using any of the usual Excel features.

For instance, by right-clicking on various portions of the runs plot graphic, you can edit the plot in many ways. Here we have right-clicked on the chart area. By now selecting Format Chart Area, we are able to edit the graphic in a variety of ways.

In the Format Chart Area dialog box, we can add color to the runs plot, change fonts, and edit the plot in many other ways.

When we right-click on the chart area and obtain the previously shown pull-down menu, we can alternatively select Chart Options. The Chart Options dialog box allows us to edit chart titles, axes and axis labels, data labels, and many other chart features.

Our final edited runs plot as presented in the text is as follows.

We complete this appendix by demonstrating how MegaStat can be used to generate a set of random numbers (as in Example 1.1 on page 5).

To create 100 random numbers between 1 and 2136 similar to those in Table 1.1 (page 6):

• Select MegaStat : Generate Random Numbers …

• In the Random Number Generation dialog box, enter 100 into the “Number of values to be generated” window.

• Click the right arrow button to select 0 Decimal Places.

• Select the Uniform tab, and enter 1 into the Minimum box and enter 2136 into the Maximum box.

• Click OK in the Random Number Generation dialog box.

The 100 random numbers will be placed in the Output Sheet. These numbers are generated with replacement. Repeated numbers would be skipped if the random numbers are being used to sample without replacement.

Appendix 1.4 ■ Introduction to Internet Exercises

The Internet and the World Wide Web provide a rich source of data and information on a limitless variety of subjects, among them government statistics, data about firms, and data about consumers. Though you probably have experience using the Internet, in this book we will use the Internet as a source of data for statistical analysis. We do this through Internet exercises in many chapters of this book. These exercises will ask you to go to a website, find appropriate data (and perhaps copy it) and then analyze the data.

The BSC: The McGraw-Hill/Irwin Business Statistics Center

The BSC is a collection of Internet-based resources for teaching and learning about business statistics. It includes links to statistical publications, websites, software, and data sources. To go to the BSC, enter the Web address http://www.mhhe.com/business/opsci/bstat/ into your browser. There are also links to the BSC on the CD-ROM included with this text. To see a list of databases accessible through the BSC, click on the Data Bases button on the BSC home screen. Updated links to data and additional Internet exercises can also be found at this text’s website, http://www.mhhe.com/business/opsci/bowerman/.

1.30 Internet Exercise

The website maintained by the U.S. Census Bureau provides a multitude of social, economic, and government data. In particular, this website houses selected data from the most recent Statistical Abstract of the United States (http://www.census.gov/statab/www/). Among these selected features are “Frequently Requested Tables” that can be accessed simply by clicking on the label.

a Go to the U.S. Census Bureau website and open the “Frequently requested tables” from the Statistical Abstract. Find the table of “Consumer Price Indexes by Major Groups.” (Note that in Chapter 17 we explain how price indexes are constructed.) Construct runs plots of (1) the price index for all items over time (years), (2) the price index for food over time, (3) the price index for fuel oil over time, and (4) the price index for electricity over time. For each runs plot, describe apparent trends in the price index.

b By opening the “Frequently requested tables” from the Statistical Abstract, find the table of “Crimes and Crime Rates, by Type.” Repeat the analysis of part (a) for each of (1) total violent crime rate (per 100,000 population) over time, (2) murder rate (per 100,000 population) over time, and (3) robbery rate (per 100,000 population) over time.

1.31 Internet Exercise

The website maintained by Fortune magazine (http://www.fortune.com) offers a wide selection of interesting business data. Included is the famous Fortune 500 List as well as many other company lists (America’s Most Admired Companies, The Fortune Global 500, etc.).

a Go to the Fortune website and find the Fortune 500 list. Consider the first 50 companies (ranked 1 through 50 by revenue) and select a random sample of 10 of the top 50 companies, using either a random number table (Table 1.1 on page 6) or the sampling capabilities of Microsoft Excel. Justify that your sample is a random sample by explaining exactly how the sample was selected.

b Go to the Fortune website and find the Fortune Global 500 list. Select a systematic sample of 10 companies from this list. Explain exactly how the sample was selected.
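For readers working outside Excel, the two selection schemes named in Exercise 1.31 can be sketched in Python. This is one possible approach under stated assumptions, not the method the text prescribes: companies are identified only by their rank, the function names are hypothetical, and the systematic scheme assumes the common convention of a random start followed by every k-th item, with k equal to the list size divided by the sample size.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Choose n distinct ranks from 1..population_size, each subset equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), n))


def systematic_sample(population_size, n, seed=None):
    """Choose a random start in 1..k, then take every k-th rank (k = size // n)."""
    rng = random.Random(seed)
    k = population_size // n
    start = rng.randint(1, k)
    return [start + i * k for i in range(n)]


print(simple_random_sample(50, 10, seed=0))   # 10 distinct ranks from the top 50
print(systematic_sample(500, 10, seed=0))     # 10 ranks spaced 50 apart
```

Fixing the seed makes the selection repeatable, which helps when the exercise asks for an exact account of how the sample was chosen.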
