Methods for Describing Sets of Data Chapter 2
2.2 a. To find the frequency for each class, count the number of times each letter occurs. The frequencies for the three classes are:

        Class   Frequency
        X        8
        Y        9
        Z        3
        Total   20

    b. The relative frequency for each class is found by dividing the frequency by the total sample size. The relative frequency for class X is 8/20 = .40. The relative frequency for class Y is 9/20 = .45. The relative frequency for class Z is 3/20 = .15.

        Class   Frequency   Relative Frequency
        X        8          .40
        Y        9          .45
        Z        3          .15
        Total   20          1.00

    c. The frequency bar chart is: [figure not shown]

    d. The pie chart for the frequency distribution is: [figure not shown]
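The frequency/relative-frequency calculation above can be sketched in Python. The raw letter list here is reconstructed from the class counts (an assumption — the textbook's actual data order is not shown):

```python
from collections import Counter

# Hypothetical raw sample consistent with the counts above: 8 X's, 9 Y's, 3 Z's.
data = ["X"] * 8 + ["Y"] * 9 + ["Z"] * 3

freq = Counter(data)                              # class frequencies
n = sum(freq.values())                            # total sample size, 20
rel_freq = {cls: cnt / n for cls, cnt in freq.items()}

for cls in sorted(freq):
    print(cls, freq[cls], rel_freq[cls])          # X 8 0.4, Y 9 0.45, Z 3 0.15
```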
2.4 a. The variable summarized in the table is ‘Reason for requesting the installation of the passenger-side on-off switch.’ The values this variable could assume are: Infant, Child, Medical, Infant & Medical, Child & Medical, Infant & Child, and Infant & Child & Medical. Since the responses name something, the variable is qualitative.
    b. The relative frequencies are found by dividing the number of requests for each category by the total number of requests. For the category ‘Infant’, the relative frequency is 1,852/30,337 = .061. The rest of the relative frequencies are found in the table below:

        Reason                      Number of Requests   Relative Frequency
        Infant                           1,852           1,852/30,337 = .061
        Child                           17,148          17,148/30,337 = .565
        Medical                          8,377           8,377/30,337 = .276
        Infant & Medical                    44              44/30,337 = .0014
        Child & Medical                    903             903/30,337 = .030
        Infant & Child                   1,878           1,878/30,337 = .062
        Infant & Child & Medical           135             135/30,337 = .0045
        TOTAL                           30,337                           .9999
    c. Using MINITAB, a pie chart of the data is:

        [Pie Chart of Reason: Child (17,148, 56.5%); Medical (8,377, 27.6%); Infant & Child (1,878, 6.2%); Infant (1,852, 6.1%); Child & Medical (903, 3.0%); Infant & Child & Medical (135, 0.4%); Infant & Medical (44, 0.1%)]
d. There are 4 categories where Medical is mentioned as a reason: Medical, Infant & Medical, Child & Medical, and Infant & Child & Medical. The sum of the frequencies for these 4 categories is 8,377 + 44 + 903 + 135 = 9,459. The proportion listing Medical as one of the reasons is 9,459/30,337 = .312.
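The proportion in part d can be verified with a short Python sketch using the request counts from the table above:

```python
# Request counts from the table above (Exercise 2.4).
requests = {
    "Infant": 1852, "Child": 17148, "Medical": 8377,
    "Infant & Medical": 44, "Child & Medical": 903,
    "Infant & Child": 1878, "Infant & Child & Medical": 135,
}

total = sum(requests.values())                                    # 30,337
medical = sum(v for k, v in requests.items() if "Medical" in k)   # 9,459

print(round(medical / total, 3))                                  # 0.312
```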
2.6 a. To find relative frequencies, we divide the frequencies of each category by the total number of incidents. The relative frequencies of the number of incidents for each of the cause categories are:
        Management System Cause Category   Number of Incidents   Relative Frequency
        Engineering & Design                      27             27/83 = .325
        Procedures & Practices                    24             24/83 = .289
        Management & Oversight                    22             22/83 = .265
        Training & Communication                  10             10/83 = .120
        TOTAL                                     83                      1
b. The Pareto diagram is:
        [Pareto diagram: Percent of incidents by Management System Cause Category, bars in decreasing order — Eng & Des, Proc & Pract, Mgmt & Over, Trn & Comm]
c. The category with the highest relative frequency of incidents is Engineering and Design. The category with the lowest relative frequency of incidents is Training and Communication.
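Since a Pareto diagram is just a bar chart with the bars sorted from most to least frequent, the ordering in part b can be sketched in Python using the counts from part a:

```python
# Incident counts by cause category (Exercise 2.6).
causes = {
    "Engineering & Design": 27,
    "Procedures & Practices": 24,
    "Management & Oversight": 22,
    "Training & Communication": 10,
}

total = sum(causes.values())                                      # 83
# Pareto ordering: sort categories by frequency, largest first.
pareto = sorted(causes.items(), key=lambda kv: kv[1], reverse=True)

for name, count in pareto:
    print(f"{name:26s} {count:3d}  {count / total:.3f}")
```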
2.8 a. The data collection method was a survey.
b. Since the data were numbers (percentage of US labor and materials), the variable is quantitative. Once the data were collected, they were grouped into 4 categories.
c. Using MINITAB, a pie chart of the data is:
        [Pie Chart of Made in USA: 100% (64, 60.4%); 75-99% (20, 18.9%); 50-74% (18, 17.0%); <50% (4, 3.8%)]
    About 60% of those surveyed believe that “Made in USA” means 100% US labor and materials.

2.10 Using MINITAB, a bar chart of the frequency of occurrence of the industry types is:
    [Chart of INDUSTRY: frequency bar chart (counts 0–80) of industry types — Aerospace/Defense, Banking, Capital Goods, Chemicals, Conglomerates, Construction, Consumer Durables, Diversified Financials, Drugs/Biotechnology, Food Markets, Food/Drink/Tobacco, Health Care, Hotels/Restaurants/Leisure, Household/Personal Products, Insurance, Materials, Media, Oil & Gas, Retailing, Semiconductors, Services/Supplies, Software & Services, Technology Equipment, Telecommunications, Transportation, Utilities]
2.12 Using MINITAB, the side-by-side bar charts are:
    [Side-by-side bar charts of the relative frequency of unauthorized use of computer systems (Yes, No, Don't know), 1999 vs. 2006]
The relative frequency of unauthorized use of computer systems has decreased from 1999 to 2006.
2.14 a. Using MINITAB, the side-by-side graphs are:
    [Chart of Exposure, Opportunity, Content, Faculty vs Stars: side-by-side frequency histograms of the number of programs receiving 2, 3, 4, or 5 stars on each of the four criteria]
From these graphs, one can see that very few of the top 30 MBA programs got 5-stars in
any criteria. In addition, about the same number of programs got 4 stars in each of the 4 criteria. The biggest difference in ratings among the 4 criteria was in the number of programs receiving 3-stars. More programs received 3-stars in Course Content than in any of the other criteria. Consequently, fewer programs received 2-stars in Course Content than in any of the other criteria.
b. Since this chart lists the rankings of only the top 30 MBA programs in the world, it is
reasonable that none of these best programs would be rated as 1-star on any criteria.
2.16 [solution figure not shown]

2.18 a. The original data set has 1 + 3 + 5 + 7 + 4 + 3 = 23 observations.

    b. For the bottom row of the stem-and-leaf display: The stem is 0. The leaves are 0, 1, 2. The numbers in the original data set are 0, 1, and 2.

    c. The dot plot corresponding to all the data points is: [figure not shown]
2.20 a. The measurement class that contains the highest proportion of respondents is “none”.
Sixty-one percent of the respondents said that their companies did not outsource any computer security functions.
b. From the graph, 6% of the respondents indicated that they outsourced between 20% and
40% of their computer security functions.
c. The proportion of the 609 respondents who outsourced at least 40% of computer security functions is .04 + .01 + .01 = .06.
    d. The number of the 609 respondents who outsourced less than 20% of computer security functions is (.27 + .61)(609) = .88(609) ≈ 536.
2.22 a. Using MINITAB, the stem-and-leaf display of the data is:

        Stem-and-Leaf Display: SCORE
        Stem-and-leaf of SCORE  N = 169
        Leaf Unit = 1.0

            1    6  2
            1    6
            2    7  2
            3    7  8
            4    8  4
           15    8  66677888899
           56    9  00001111111222222222233333333344444444444
         (100)   9  55555555555555555555556666666666666666666777777777777777777888888+
           13   10  0000000000000
    b. From the stem-and-leaf display, we see that there are only 4 observations with sanitation scores less than the acceptable score of 86. The proportion of ships that meet the acceptable sanitation standard is (169 − 4)/169 = .976.
c. The sanitation score of 84 is in bold in the stem-and-leaf display in part a.
2.24 a. Using MINITAB, the frequency histogram is:
        [Frequency histogram of Length]
b. Using MINITAB, the frequency histogram is:
        [Frequency histogram of Weight]
c. Using MINITAB, the frequency histogram is:
        [Frequency histogram of DDT]
2.26 Using MINITAB, the two dot plots are:
    [Dotplot for Arrive-Depart]
Yes. Most of the numbers of items arriving at the work center per hour are in the 135 to 165 area. Most of the numbers of items departing the work center per hour are in the 110 to 140 area. Because the number of items arriving is larger than the number of items departing, there will probably be some sort of bottleneck.
2.28 a. Using MINITAB, the three frequency histograms are as follows (the same starting point and class interval were used for each):
        Histogram of C1  N = 25   (Tenth Performance)
        Midpoint  Count
           4.00      0
           8.00      0
          12.00      1  *
          16.00      5  *****
          20.00     10  **********
          24.00      6  ******
          28.00      0
          32.00      2  **
          36.00      0
          40.00      1  *

        Histogram of C2  N = 25   (Thirtieth Performance)
        Midpoint  Count
           4.00      1  *
           8.00      9  *********
          12.00     12  ************
          16.00      2  **
          20.00      1  *

        Histogram of C3  N = 25   (Fiftieth Performance)
        Midpoint  Count
           4.00      3  ***
           8.00     15  ***************
          12.00      4  ****
          16.00      2  **
          20.00      1  *

    b. The histogram for the tenth performance shows a much greater spread of the observations
than the other two histograms. The thirtieth performance histogram shows a shift to the left—implying shorter completion times than for the tenth performance. In addition, the fiftieth performance histogram shows an additional shift to the left compared to that for the thirtieth performance. However, the last shift is not as great as the first shift. This agrees with statements made in the problem.
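A MINITAB-style character histogram like the ones above can be sketched in Python. The data list here is illustrative, not the textbook's raw completion times:

```python
def char_histogram(data, start, width):
    """Bin data into intervals of the given width and print a MINITAB-style
    character histogram: midpoint, count, and a row of asterisks."""
    counts = {}
    for x in data:
        k = int((x - start) // width)          # bin index
        counts[k] = counts.get(k, 0) + 1
    for k in range(min(counts), max(counts) + 1):
        mid = start + (k + 0.5) * width        # midpoint of bin k
        c = counts.get(k, 0)
        print(f"{mid:8.2f} {c:4d}  {'*' * c}")
    return counts

# Hypothetical completion times (not the exercise's actual data).
counts = char_histogram([5, 7, 9, 9, 11, 14, 19], start=2, width=4)
```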
2.30 a. A stem-and-leaf display is as follows, where the stems are the units place and the leaves are the decimal places:
Stem Leaves
         1   0 0 0 0 1 1 2 2 2 2 2 3 4 4 4 4 4 4 4 5 5 5 5 6 7 9
         2   1 1 4 4 6 7 9 9
         3   0 0 2 8 9 9
         4   1 1 1 2 5
         5   2 4
         6
         7   8
         8
         9
        10   1
b. A little more than half (26/49 = .53) of all companies spent less than 2 months in bankruptcy. Only two of the 49 companies spent more than 6 months in bankruptcy. It appears then, in general, the length of time in bankruptcy for firms using "prepacks" is less than that of firms not using "prepacks."
c. A dot diagram will be used to compare the time in bankruptcy for the three types of "prepack" firms:
d. The circled times in part a correspond to companies that were reorganized through a leverage buyout. There does not appear to be any pattern to these points. They appear to be scattered about evenly throughout the distribution of all times.
2.32 Using MINITAB, the stem-and-leaf display for the data is:
Stem-and-leaf of Time N = 25 Leaf Unit = 1.0
         3    3  2 3 9
         7    4  3 4 9 9
        (7)   5  0 0 1 1 4 6 9
        11    6  3 4 4 5 8
         6    7  1 3
         4    8  2 6
         2    9  5
         1   10  2
The numbers in bold represent delivery times associated with customers who subsequently did not place additional orders with the firm. Since there were only 2 customers with delivery times of 68 days or longer that placed additional orders, I would say the maximum tolerable delivery time is about 65 to 67 days. Everyone with delivery times less than 67 days placed additional orders.
2.34 a. Σx = 3 + 8 + 4 + 5 + 3 + 4 + 6 = 33

     b. Σx² = 3² + 8² + 4² + 5² + 3² + 4² + 6² = 175

     c. Σ(x − 5)² = (3 − 5)² + (8 − 5)² + (4 − 5)² + (5 − 5)² + (3 − 5)² + (4 − 5)² + (6 − 5)² = 20

     d. Σ(x − 2)² = (3 − 2)² + (8 − 2)² + (4 − 2)² + (5 − 2)² + (3 − 2)² + (4 − 2)² + (6 − 2)² = 71

     e. (Σx)² = (3 + 8 + 4 + 5 + 3 + 4 + 6)² = 33² = 1089

2.36 a. Σx = 6 + 0 + (−2) + (−1) + 3 = 6

     b. Σx² = 6² + 0² + (−2)² + (−1)² + 3² = 50

     c. Σx² − (Σx)²/5 = 50 − 6²/5 = 50 − 7.2 = 42.8
2.38 a. x̄ = Σx/n = 85/10 = 8.5

     b. x̄ = 400/16 = 25

     c. x̄ = 35/45 = .78

     d. x̄ = 242/18 = 13.44
2.40 The median is the middle number once the data have been arranged in order. If n is even, there
is not a single middle number. Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number. The median is this middle number.
    A data set with five measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5.

    A data set with six measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is (5 + 5)/2 = 10/2 = 5.
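The odd-n and even-n median rules above can be checked with Python's standard library:

```python
import statistics

odd_data = [1, 3, 5, 6, 8]        # n odd: the single middle value
even_data = [1, 3, 5, 5, 6, 8]    # n even: average of the two middle values

print(statistics.median(odd_data))    # 5
print(statistics.median(even_data))   # 5.0, i.e. (5 + 5)/2
```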
2.42 a. x̄ = Σx/n = (7 + ⋯ + 4)/6 = 15/6 = 2.5

        Median = (3 + 3)/2 = 3 (mean of 3rd and 4th numbers, after ordering)

        Mode = 3

     b. x̄ = Σx/n = (2 + ⋯ + 4)/13 = 40/13 = 3.08

        Median = 3 (7th number, after ordering)

        Mode = 3

     c. x̄ = Σx/n = (51 + ⋯ + 37)/10 = 496/10 = 49.6

        Median = (48 + 50)/2 = 49 (mean of 5th and 6th numbers, after ordering)

        Mode = 50

2.44 a. The sample mean is:
        x̄ = Σxᵢ/n = (529 + 355 + 301 + ⋯ + 63)/26 = 3757/26 = 144.5

        The sample median is found by finding the average of the 13th and 14th observations once the data are arranged in order. The 13th and 14th observations are 100 and 105. The average of these two numbers (median) is:

        median = (100 + 105)/2 = 205/2 = 102.5
The mode is the observation appearing the most. For this data set, the mode is 70, which appears 3 times. Since the mean is larger than the median, the data are skewed to the right.
b. The sample mean is:
        x̄ = Σxᵢ/n = (11 + 9 + 6 + ⋯ + 4)/26 = 136/26 = 5.23

        The sample median is found by finding the average of the 13th and 14th observations once the data are arranged in order. The 13th and 14th observations are 5 and 5. The average of these two numbers (median) is:

        median = (5 + 5)/2 = 10/2 = 5
The mode is the observation appearing the most. For this data set, the mode is 6, which appears 6 times. Since the mean and median are about the same, the data are somewhat symmetric.
2.46 a. The sample mean is:
        x̄ = Σxᵢ/n = (1.72 + 2.50 + 2.16 + ⋯ + 1.95)/20 = 37.62/20 = 1.881
The sample average surface roughness of the 20 observations is 1.881.
b. The median is found as the average of the 10th and 11th observations, once the data have been ordered. The ordered data are:
1.06 1.09 1.19 1.26 1.27 1.40 1.51 1.72 1.95 2.03 2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.50 2.57 2.64
    The 10th and 11th observations are 2.03 and 2.05. The median is:

        median = (2.03 + 2.05)/2 = 4.08/2 = 2.04
The middle surface roughness measurement is 2.04. Half of the sample measurements
were less than 2.04 and half were greater than 2.04.
c. The data are somewhat skewed to the left. Thus, the median might be a better measure of central tendency than the mean. The few small values in the data tend to make the mean smaller than the median.
2.48 a. Using MINITAB, the stem-and-leaf display is:

        Stem-and-leaf of PAF  N = 17
        Leaf Unit = 1.0

          6   0  000009
          8   1  25
        (2)   2  45
          7   3  13
          5   4  0
          4   5
          4   6  2
          3   7  057

    b. The median is the middle number once the data are arranged in order. The data arranged in order are: 0, 0, 0, 0, 0, 9, 12, 15, 24, 25, 31, 33, 40, 62, 70, 75, 77.

        The middle number, or the median, is 24.

    c. The mean of the data is x̄ = Σx/n = (77 + 33 + 75 + ⋯ + 31)/17 = 473/17 = 27.82
    d. The number occurring most frequently is 0. The mode is 0.

    e. The mode corresponds to the smallest number, so it does not seem to locate the center of the distribution. Both the mean and the median are in the middle of the stem-and-leaf display, so it appears that both of them locate the center of the data.
2.50 a. The sample mean length is:

        x̄ = Σxᵢ/n = (42.5 + 44.0 + 41.5 + ⋯ + 36.0)/144 = 6165/144 = 42.81
The average length of the 144 fish is 42.81 cm.
The median is the average of the middle two observations once they have been ordered.
The 72nd and 73rd observations are 45 and 45. The average of these two observations is 45. Half of the fish lengths are less than 45 cm and half are longer.
The mode is 46 cm. This observation occurred 12 times.
b. The sample mean weight is:
        x̄ = Σxᵢ/n = (732 + 795 + 547 + ⋯ + 1433)/144 = 151,159/144 = 1049.72
The average weight of the 144 fish is 1049.72 grams.
The median is the average of the middle two observations once they have been ordered.
    The 72nd and 73rd observations are 989 and 1,011. The average of these two observations is:

        median = (989 + 1,011)/2 = 1000
Half of the fish weights are less than 1000 grams and half are heavier.
There are 2 modes, 886 and 1186. Each of these observations occurred 3 times.
c. The sample mean DDT level is:
        x̄ = Σxᵢ/n = (10 + 16 + 23 + ⋯ + 1.9)/144 = 3507.1/144 = 24.35
The average DDT level of the 144 fish is 24.35 parts per million.
    The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 7.1 and 7.2. The average of these two observations is:

        median = (7.1 + 7.2)/2 = 7.15
Half of the fish DDT levels are less than 7.15 parts per million and half are greater.
    The mode is 12. This observation occurred 8 times.

    d. From the graph in Exercise 2.24a, the data are skewed to the left. This corresponds to the
relationship between the mean and the median. For data skewed to the left, the mean is less than the median. For the fish lengths, the mean is 42.81 and the median is 45.
e. From the graph in Exercise 2.24b, the data are slightly skewed to the right. This
corresponds to the relationship between the mean and the median. For data skewed to the right, the mean is more than the median. For the fish weights, the mean is 1049.72 and the median is 1000.
f. From the graph in Exercise 2.24c, the data are skewed to the right. This corresponds to the
relationship between the mean and the median. For data skewed to the right, the mean is more than the median. For the fish DDT levels, the mean is 24.35 and the median is 7.15.
2.52 a. Due to the "elite" superstars, the salary distribution is skewed to the right. Since this
implies that the median is less than the mean, the players' association would want to use the median.
    b. The owners, by the logic of part a, would want to use the mean.

2.54 a. The sample mean is:

        x̄ = Σxᵢ/n = (1 + 5 + 3 + 4 + ⋯ + 3)/20 = 80/20 = 4

        The sample median is found by finding the average of the 10th and 11th observations once the data are arranged in order. The data arranged in order are:

        1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 13

        The 10th and 11th observations are 3 and 4. The average of these two numbers (median) is:

        median = (3 + 4)/2 = 7/2 = 3.5
The mode is the observation appearing the most. For this data set, the mode is 1, which
appears 5 times.
    b. Eliminating the largest number, which is 13, results in the following:

        The sample mean is:

        x̄ = Σxᵢ/n = (1 + 5 + 3 + 4 + ⋯ + 3)/19 = 67/19 = 3.53

        The sample median is found by finding the middle observation once the data are arranged in order. The data arranged in order are:

        1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9

        The 10th observation is 3. The median is 3.

        The mode is the observation appearing the most. For this data set, the mode is 1, which appears 5 times.

        By dropping the largest number, the mean is reduced from 4 to 3.53. The median is reduced from 3.5 to 3. There is no effect on the mode.
    c. The data arranged in order are:

        1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 13

        If we drop the lowest 2 and largest 2 observations, we are left with:

        1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7

        The sample 10% trimmed mean is:

        x̄ = (1 + 1 + 1 + ⋯ + 7)/16 = 56/16 = 3.5
The advantage of the trimmed mean over the regular mean is that very large and very small
numbers that could greatly affect the mean have been eliminated.
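The mean, median, mode, and trimmed mean computed above can be checked in Python with the ordered data set from part a:

```python
import statistics

# The n = 20 observations from Exercise 2.54, already ordered.
data = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9, 13]

mean = statistics.mean(data)        # 80/20 = 4
median = statistics.median(data)    # (3 + 4)/2 = 3.5
mode = statistics.mode(data)        # 1 (appears 5 times)

# 10% trimmed mean: drop the lowest 2 and highest 2 of the 20 observations.
trimmed = sorted(data)[2:-2]
trimmed_mean = sum(trimmed) / len(trimmed)    # 56/16 = 3.5
```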
2.56 a. s² = [Σx² − (Σx)²/n]/(n − 1) = [84 − 20²/10]/(10 − 1) = 4.8889

        s = √4.8889 = 2.211

     b. s² = [Σx² − (Σx)²/n]/(n − 1) = [380 − 100²/40]/(40 − 1) = 3.3333

        s = √3.3333 = 1.826

     c. s² = [Σx² − (Σx)²/n]/(n − 1) = [18 − 17²/20]/(20 − 1) = .1868

        s = √.1868 = .432
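The shortcut formula used in Exercise 2.56 can be written as a small Python function and checked against the sums given in part a:

```python
import math

def shortcut_variance(sum_x2, sum_x, n):
    """Sample variance via the shortcut formula s² = (Σx² − (Σx)²/n)/(n − 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Part a: Σx² = 84, Σx = 20, n = 10.
s2 = shortcut_variance(84, 20, 10)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 3))    # 4.8889 2.211
```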
2.58 a. Range = 42 − 37 = 5

        s² = [Σx² − (Σx)²/n]/(n − 1) = [7935 − 199²/5]/(5 − 1) = 3.7

        s = √3.7 = 1.92

     b. Range = 100 − 1 = 99

        s² = [Σx² − (Σx)²/n]/(n − 1) = [25,795 − 303²/9]/(9 − 1) = 1,949.25

        s = √1,949.25 = 44.15

     c. Range = 100 − 2 = 98

        s² = [Σx² − (Σx)²/n]/(n − 1) = [20,033 − 295²/8]/(8 − 1) = 1,307.84

        s = √1,307.84 = 36.16
2.60 This is one possibility for the two data sets.

        Data Set 1: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
        Data Set 2: 1, 1, 1, 1, 1, 5, 5, 5, 5, 5

     x̄₁ = Σx/n = (1 + 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5)/10 = 30/10 = 3

     x̄₂ = Σx/n = (1 + 1 + 1 + 1 + 1 + 5 + 5 + 5 + 5 + 5)/10 = 30/10 = 3

     Therefore, the two data sets have the same mean. The variances for the two data sets are:

     s₁² = [Σx² − (Σx)²/n]/(n − 1) = [110 − 30²/10]/9 = 20/9 = 2.2222

     s₂² = [Σx² − (Σx)²/n]/(n − 1) = [130 − 30²/10]/9 = 40/9 = 4.4444
    The dot diagrams for the two data sets are shown below: [figure not shown]
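The equal-mean, unequal-variance comparison above can be checked with Python's statistics module (which uses the n − 1 divisor for the sample variance):

```python
import statistics

data1 = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
data2 = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]

# Same mean...
m1, m2 = statistics.mean(data1), statistics.mean(data2)   # both 3
# ...but different sample variances.
v1 = statistics.variance(data1)    # 20/9 ≈ 2.2222
v2 = statistics.variance(data2)    # 40/9 ≈ 4.4444
```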
2.62 a. Range = 3 − 0 = 3

        s² = [Σx² − (Σx)²/n]/(n − 1) = [15 − 7²/5]/(5 − 1) = 1.3

        s = √1.3 = 1.1402

     b. After adding 3 to each of the data points, Range = 6 − 3 = 3

        s² = [Σx² − (Σx)²/n]/(n − 1) = [102 − 22²/5]/(5 − 1) = 1.3

        s = √1.3 = 1.1402

     c. After subtracting 4 from each of the data points, Range = −1 − (−4) = 3

        s² = [Σx² − (Σx)²/n]/(n − 1) = [39 − (−13)²/5]/(5 − 1) = 1.3

        s = √1.3 = 1.1402
    d. The range, variance, and standard deviation remain the same when any number is added to or subtracted from each measurement in the data set.

2.64 a. The maximum age is 64. The minimum age is 39. The range is 64 − 39 = 25.
    b. The variance is:

        s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = [125,764 − 2494²/50]/(50 − 1) = 27.822

    c. The standard deviation is:

        s = √s² = √27.822 = 5.275
d. Since the standard deviation of the ages of the 50 most powerful women in Europe is 10 years and is greater than that in the U.S. (5.275 years), the age data for Europe is more variable.
2.66 a. The maximum weight is 1.1 carats. The minimum weight is .18 carats. The range is 1.1 − .18 = .92 carats.
    b. The variance is:

        s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = [146.19 − 194.32²/308]/(308 − 1) = .0768 square carats

    c. The standard deviation is:

        s = √s² = √.0768 = .2772 carats
d. The standard deviation. This gives us an idea about how spread out the data are in the same units as the original data.
2.68 a. A worker's overall time to complete the operation under study is determined by adding the subtask-time averages.

        Worker A

        The average for subtask 1 is: x̄ = Σx/n = 211/7 = 30.14

        The average for subtask 2 is: x̄ = Σx/n = 21/7 = 3

        Worker A's overall time is 30.14 + 3 = 33.14.

        Worker B

        The average for subtask 1 is: x̄ = Σx/n = 213/7 = 30.43

        The average for subtask 2 is: x̄ = Σx/n = 29/7 = 4.14

        Worker B's overall time is 30.43 + 4.14 = 34.57.

    b. Worker A

        s = √{[Σx² − (Σx)²/n]/(n − 1)} = √{[6455 − 211²/7]/(7 − 1)} = √15.8095 = 3.98

        Worker B

        s = √{[Σx² − (Σx)²/n]/(n − 1)} = √{[6487 − 213²/7]/(7 − 1)} = √.9524 = .98

    c. The standard deviations represent the amount of variability in the time it takes the worker to complete subtask 1.
    d. Worker A

        s = √{[Σx² − (Σx)²/n]/(n − 1)} = √{[67 − 21²/7]/(7 − 1)} = √.6667 = .82

        Worker B

        s = √{[Σx² − (Σx)²/n]/(n − 1)} = √{[147 − 29²/7]/(7 − 1)} = √4.4762 = 2.12
e. I would choose workers similar to worker B to perform subtask 1. Worker B has a slightly
higher average time on subtask 1 (A: x = 30.14, B: x = 30.43). But, Worker B has a smaller variability in the time it takes to complete subtask 1 (part b). He or she is more consistent in the time needed to complete the task.
I would choose workers similar to Worker A to perform subtask 2. Worker A has a smaller
average time on subtask 2 (A: x = 3, B: x = 4.14). Worker A also has a smaller variability in the time needed to complete subtask 2 (part d).
2.70 Since no information is given about the data set, we can only use Chebyshev's Rule.

    a. Nothing can be said about the percentage of measurements that will fall between x̄ − s and x̄ + s.

    b. At least 3/4 or 75% of the measurements will fall between x̄ − 2s and x̄ + 2s.

    c. At least 8/9 or 89% of the measurements will fall between x̄ − 3s and x̄ + 3s.
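The Chebyshev bounds quoted above come from the rule that at least 1 − 1/k² of any data set lies within k standard deviations of the mean, which is easy to evaluate directly:

```python
def chebyshev_lower_bound(k):
    """Chebyshev's Rule: at least 1 − 1/k² of any data set lies within
    k standard deviations of the mean (for k > 1)."""
    return 1 - 1 / k ** 2

print(chebyshev_lower_bound(2))   # 0.75, i.e. at least 3/4
print(chebyshev_lower_bound(3))   # 0.888..., i.e. at least 8/9
```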
2.72 a. x̄ = Σx/n = 206/25 = 8.24

        s² = [Σx² − (Σx)²/n]/(n − 1) = [1778 − 206²/25]/(25 − 1) = 3.357

        s = √s² = 1.83

    b.
        Interval                     Number of Measurements in Interval   Percentage
        x̄ ± s,  or (6.41, 10.07)     18                                   18/25 = .72 or 72%
        x̄ ± 2s, or (4.58, 11.90)     24                                   24/25 = .96 or 96%
        x̄ ± 3s, or (2.75, 13.73)     25                                   25/25 = 1 or 100%

    c. The percentages in part b are in agreement with Chebyshev's Rule and agree fairly well
with the percentages given by the Empirical Rule.
    d. Range = 12 − 5 = 7

        s ≈ range/4 = 7/4 = 1.75

        The range approximation provides a satisfactory estimate of s = 1.83 from part a.

2.74 From Chebyshev’s Theorem, we know that at least 3/4 or 75% of all observations will fall within 2 standard deviations of the mean. From Exercise 2.47, x̄ = .631. From Exercise 2.66, s = .2772. This interval is:

        x̄ ± 2s ⇒ .631 ± 2(.2772) ⇒ .631 ± .5544 ⇒ (.0766, 1.1854)
2.76 a. From the information given, we have x = 375 and s = 25. From Chebyshev's Rule, we know that at least three-fourths of the measurements are within the interval:
x ± 2s, or (325, 425)
Thus, at most one-fourth of the measurements exceed 425. In other words, more than 425 vehicles used the intersection on at most 25% of the days.
b. According to the Empirical Rule, approximately 95% of the measurements are within the
interval:
x ± 2s, or (325, 425)
This leaves approximately 5% of the measurements to lie outside the interval. Because of the symmetry of a mound-shaped distribution, approximately 2.5% of these will lie below 325, and the remaining 2.5% will lie above 425. Thus, on approximately 2.5% of the days, more than 425 vehicles used the intersection.
2.78 a. Since the sample mean (18.2) is larger than the sample median (15), it indicates that the
distribution of years is skewed to the right. In addition, the maximum number of years is 50 and the minimum is 2. If the distribution were symmetric, the mean and median should be about halfway between these two numbers. Halfway between the maximum and minimum values is 26, which is much larger than either the mean or the median.
b. The standard deviation can be estimated by the range divided by either 4 or 6. For this
distribution, the range is: Range = Largest − smallest = 50 − 2 = 48. Dividing the range by 4, we get an estimate of the standard deviation to be 48/4 = 12. Dividing the range by 6, we get an estimate of the standard deviation to be 48/6 = 8. Thus, the standard deviation should be somewhere between 8 and 12. For this problem, the
standard deviation is s = 10.64. This value falls in the estimated range of 8 to 12.
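The range-based estimate used in part b can be sketched as a small Python helper:

```python
def estimate_std_from_range(largest, smallest):
    """Quick range-based estimates of the standard deviation:
    range/4 and range/6 bracket a plausible value for s."""
    r = largest - smallest
    return r / 4, r / 6

# Part b: largest = 50 years, smallest = 2 years.
hi, lo = estimate_std_from_range(50, 2)
print(lo, hi)    # 8.0 12.0 — the reported s = 10.64 falls in between
```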
Methods for Describing Sets of Data 25
c. First, we calculate the number of standard deviations from the mean the value of 40 years is. To do this, we first subtract the mean and then divide by the value of the standard deviation.
        Number of standard deviations = (x − x̄)/s = (40 − 18.2)/10.64 = 2.05 ≈ 2
Using Chebyshev's Rule, we know that at most 1/k2 or 1/22 = 1/4 of the data will be more
than 2 standard deviations from the mean. Thus, this would indicate that at most 25% of the Generation Xers responded with 40 years or more.
Next, we calculate the number of standard deviations from the mean the value of 8 years is.
        Number of standard deviations = (x − x̄)/s = (8 − 18.2)/10.64 = −.96 ≈ −1
Using Chebyshev's Rule, we get no information about the data within 1 standard deviation
of the mean. However, we know the median (15) is more than 8. By definition, 50% of the data are larger than the median. Thus, at least 50% of the Generation Xers responded with 8 years or more. No additional information can be obtained with the information given.
2.80 a. Using MINITAB, the frequency histogram for the time in bankruptcy is:
        [Frequency histogram of Time in Bankrupt]
The Empirical Rule is not applicable because the data are not mound shaped.
b. Using MINITAB, the descriptive measures are:
Descriptive Statistics: Time in Bankrupt
    Variable          N    Mean    Median   TrMean   StDev   SE Mean
    Time in Bankrupt  49   2.549   1.700    2.333    1.828   0.261

    Variable          Minimum   Maximum   Q1      Q3
    Time in Bankrupt  1.000     10.100    1.350   3.500
From Chebyshev’s Theorem, we know that at least 75% of the observations will fall within 2 standard deviations of the mean. This interval is:
        x̄ ± 2s ⇒ 2.549 ± 2(1.828) ⇒ 2.549 ± 3.656 ⇒ (−1.107, 6.205)
    c. There are 47 of the 49 observations within this interval. The percentage would be (47/49)·100% = 95.9%. This agrees with Chebyshev’s Theorem (at least 75%). It also agrees with the Empirical Rule (approximately 95%).
d. From the above interval we know that about 95% of all firms filing for prepackaged
bankruptcy will be in bankruptcy between 0 and 6.2 months. Thus, we would estimate that a firm considering filing for bankruptcy will be in bankruptcy up to 6.2 months.
2.82 a. Since it is given that the distribution is mound-shaped, we can use the Empirical Rule. We
know that 1.84% is 2 standard deviations below the mean. The Empirical Rule states that approximately 95% of the observations will fall within 2 standard deviations of the mean and, consequently, approximately 5% will lie outside that interval. Since a mound-shaped distribution is symmetric, then approximately 2.5% of the day's production of batches will fall below 1.84%.
b. If the data are actually mound-shaped, it would be extremely unusual (less than 2.5%) to
observe a batch with 1.80% zinc phosphide if the true mean is 2.0%. Thus, if we did observe 1.8%, we would conclude that the mean percent of zinc phosphide in today's production is probably less than 2.0%.
2.84 a. Since we do not have any idea of the shape of the distribution of SAT-Math score
changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval would be:
        x̄ ± 3s ⇒ 19 ± 3(65) ⇒ 19 ± 195 ⇒ (−176, 214)

        Thus, for a randomly selected student, we could be pretty sure that this student’s score would be anywhere from 176 points below his/her previous SAT-Math score to 214 points above it.
b. Since we do not have any idea of the shape of the distribution of SAT-Verbal score changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the
observations will fall within 3 standard deviations of the mean. This interval would be:
        x̄ ± 3s ⇒ 7 ± 3(49) ⇒ 7 ± 147 ⇒ (−140, 154)
    Thus, for a randomly selected student, we could be pretty sure that this student’s score would be anywhere from 140 points below his/her previous SAT-Verbal score to 154 points above it.
c. A change of 140 points on the SAT-Math would be a little less than 2 standard deviations
from the mean. A change of 140 points on the SAT-Verbal would be a little less than 3 standard deviations from the mean. Since the 140 point change for the SAT-Math is not as big a change as the 140 point on the SAT-Verbal, it would be most likely that the score was a SAT-Math score.
2.86 a. z = (x − x̄)/s = (40 − 30)/5 = 2 (sample)   2 standard deviations above the mean.

     b. z = (x − μ)/σ = (90 − 89)/2 = .5 (population)   .5 standard deviations above the mean.

     c. z = (x − μ)/σ = (50 − 50)/5 = 0 (population)   0 standard deviations above the mean.

     d. z = (x − x̄)/s = (20 − 30)/4 = −2.5 (sample)   2.5 standard deviations below the mean.
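The z-score computations in Exercise 2.86 reduce to one formula, sketched here in Python:

```python
def z_score(x, center, spread):
    """Number of standard deviations x lies from the mean
    (use x̄ and s for a sample, μ and σ for a population)."""
    return (x - center) / spread

print(z_score(40, 30, 5))   # 2.0
print(z_score(90, 89, 2))   # 0.5
print(z_score(50, 50, 5))   # 0.0
print(z_score(20, 30, 4))   # -2.5
```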
2.88 The 50th percentile of a data set is the observation that has half of the observations less than it.
    Another name for the 50th percentile is the median.

2.90 Since the element 40 has a z-score of −2 and 90 has a z-score of 3,

        −2 = (40 − μ)/σ   and   3 = (90 − μ)/σ

        ⇒ −2σ = 40 − μ        ⇒ 3σ = 90 − μ
        ⇒ μ − 2σ = 40         ⇒ μ + 3σ = 90
        ⇒ μ = 40 + 2σ

     By substitution, 40 + 2σ + 3σ = 90 ⇒ 5σ = 50 ⇒ σ = 10

     By substitution, μ = 40 + 2(10) = 60

     Therefore, the population mean is 60 and the standard deviation is 10.

2.92 The percentile ranking of the age of 25 years would be 100% − 73.5% = 26.5%.
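The pair of z-score equations in Exercise 2.90 can be solved in general: subtracting the equations gives σ = (x₂ − x₁)/(z₂ − z₁), and then μ = x₁ − z₁σ. A Python sketch:

```python
def mean_sd_from_two_zscores(x1, z1, x2, z2):
    """Solve z1 = (x1 − μ)/σ and z2 = (x2 − μ)/σ for (μ, σ)."""
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

mu, sigma = mean_sd_from_two_zscores(40, -2, 90, 3)
print(mu, sigma)    # 60.0 10.0
```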
2.94 a. From Exercise 2.77, x̄ = 94.91 and s = 4.83. The z-score for an observation of 78 is:

        z = (x − x̄)/s = (78 − 94.91)/4.83 = −3.50

        This z-score indicates that an observation of 78 is 3.5 standard deviations below the mean. Very few observations will be lower than this one.

     b. The z-score for an observation of 98 is:

        z = (x − x̄)/s = (98 − 94.91)/4.83 = 0.63

        This z-score indicates that an observation of 98 is .63 standard deviations above the mean. This score is not an unusual observation in the data set.

2.96 a. From the problem, μ = 2.7 and σ = .5
        z = (x − μ)/σ ⇒ zσ = x − μ ⇒ x = μ + zσ

        For z = 2.0,  x = 2.7 + 2.0(.5) = 3.7
        For z = −1.0, x = 2.7 − 1.0(.5) = 2.2
        For z = .5,   x = 2.7 + .5(.5) = 2.95
        For z = −2.5, x = 2.7 − 2.5(.5) = 1.45

     b. For z = −1.6, x = 2.7 − 1.6(.5) = 1.9

     c. If we assume the distribution of GPAs is approximately mound-shaped, we can use the Empirical Rule.
From the Empirical Rule, we know that ≈.025
or ≈2.5% of the students will have GPAs above 3.7 (with z = 2). Thus, the GPA corresponding to summa cum laude (top 2.5%) will be greater than 3.7 (z > 2).
We know that ≈.16 or 16% of the students will have GPAs above 3.2 (z = 1). Thus, the
limit on GPAs for cum laude (top 16%) will be greater than 3.2 (z > 1). We must assume the distribution is mound-shaped.
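The inversion x = μ + zσ used throughout Exercise 2.96 can be written directly, with the GPA parameters given in the problem:

```python
def value_from_z(mu, sigma, z):
    """Invert the z-score formula: x = μ + zσ."""
    return mu + z * sigma

# GPA example above: μ = 2.7, σ = .5.
print(round(value_from_z(2.7, 0.5, 2.0), 2))    # 3.7
print(round(value_from_z(2.7, 0.5, -1.6), 2))   # 1.9
```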
2.98 a. Since the data are approximately mound-shaped, we can use the Empirical Rule. On the blue exam, the mean is 53% and the standard deviation is 15%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is:
        x̄ ± s ⇒ 53 ± 15 ⇒ (38, 68)
About 95% of all students will score within 2 standard deviations of the mean. This interval is:
2 53 2(15) 53 30 (23, 83)x s± ⇒ ± ⇒ ± ⇒
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is:
3 53 3(15) 53 45 (8, 98)x s± ⇒ ± ⇒ ± ⇒
b. Since the data are approximately mound-shaped, we can use the Empirical Rule. On the red exam, the mean is 39% and the standard deviation is 12%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is:
39 (12) (27, 51)x s± ⇒ ± ⇒
About 95% of all students will score within 2 standard deviations of the mean. This interval is:
2 39 2(12) 39 24 (15, 63)x s± ⇒ ± ⇒ ± ⇒
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is:
3 39 3(12) 39 36 (3, 75)x s± ⇒ ± ⇒ ± ⇒
c. The student would have been more likely to have taken the red exam. For the blue exam, we know that approximately 95% of all scores will be from 23% to 83%. The observed 20% score does not fall in this range. For the blue exam, we know that approximately 95% of all scores will be from 15% to 63%. The observed 20% score does fall in this range. Thus, it is more likely that the student would have taken the red exam.
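The Empirical Rule intervals used above can be generated with a small Python sketch (the function name empirical_intervals is our own):

```python
def empirical_intervals(mean, sd):
    """One-, two-, and three-standard-deviation intervals around the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

blue = empirical_intervals(53, 15)  # blue exam: mean 53%, sd 15%
red = empirical_intervals(39, 12)   # red exam: mean 39%, sd 12%

# A 20% score lies outside the blue exam's 2-sd interval (23, 83)
# but inside the red exam's 2-sd interval (15, 63).
score = 20
print(blue[2][0] <= score <= blue[2][1])  # False
print(red[2][0] <= score <= red[2][1])    # True
```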
2.100 The 25th percentile, or lower quartile, is the measurement that has 25% of the measurements
below it and 75% of the measurements above it. The 50th percentile, or median, is the measurement that has 50% of the measurements below it and 50% of the measurements above it. The 75th percentile, or upper quartile, is the measurement that has 75% of the measurements below it and 25% of the measurements above it.
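These three percentiles can be computed directly with Python's standard library; the sample below is hypothetical and serves only to illustrate the definitions:

```python
from statistics import median, quantiles

data = [2, 4, 5, 7, 8, 9, 11, 12]  # hypothetical measurements
q1, q2, q3 = quantiles(data, n=4)  # lower quartile, median, upper quartile
print(q1, q2, q3)
# q2 always equals the median; q1 and q3 cut off the lowest and highest 25%
```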
2.102 a. The median is approximately 4.

b. QL (the lower quartile) is approximately 3. QU (the upper quartile) is approximately 6.

c. IQR = QU − QL ≈ 6 − 3 = 3

d. The data set is skewed to the right since the right whisker is longer than the left, there is one outlier, and there are two potential outliers.

e. 50% of the measurements are to the right of the median and 75% are to the left of the upper quartile.

f. There are two potential outliers, 12 and 13. There is one outlier, 16.

2.104 a. From the problem, x̄ = 52.33 and s = 9.22. The highest salary is 75 (thousand).
The z-score is z = (x − x̄)/s = (75 − 52.33)/9.22 = 2.46

Therefore, the highest salary is 2.46 standard deviations above the mean.

The lowest salary is 35.0 (thousand).

The z-score is z = (x − x̄)/s = (35.0 − 52.33)/9.22 = −1.88

Therefore, the lowest salary is 1.88 standard deviations below the mean.

The mean salary offer is 52.33 (thousand).

The z-score is z = (x − x̄)/s = (52.33 − 52.33)/9.22 = 0

The z-score for the mean salary offer is 0 standard deviations from the mean.

No, the highest salary offer is not unusually high. For any distribution, at least 8/9 of the salaries should have z-scores between −3 and 3. A z-score of 2.46 would not be that unusual.
b. Using MINITAB, the box plot is:
Since no salaries are outside the inner fences, none of them are potentially faulty observations.

2.106 Using MINITAB, the side-by-side box plots are:

[Side-by-side box plots of AGE (vertical axis, 40 to 65) by GROUP (1, 2, 3)]

From the boxplots, there appears to be one outlier in the third group.

2.108 a. First, we will compute the mean and standard deviation.
The sample mean is:

x̄ = Σx/n = 393/75 = 5.24

The sample variance is:

s² = [Σx² − (Σx)²/n]/(n − 1) = [5943 − (393)²/75]/(75 − 1) = 52.482
The standard deviation is: s = √s² = √52.482 = 7.244

Since this data set is highly skewed, we will use 2 standard deviations from the mean as the cutoff for outliers. Z-scores with values greater than 2 in absolute value are considered outliers. An observation with a z-score of 2 would have the value:

z = (x − x̄)/s ⇒ 2 = (x − 5.24)/7.244 ⇒ x − 5.24 = 2(7.244) = 14.488 ⇒ x = 19.728

An observation with a z-score of −2 would have the value:

z = (x − x̄)/s ⇒ −2 = (x − 5.24)/7.244 ⇒ x − 5.24 = −2(7.244) = −14.488 ⇒ x = −9.248

Thus any observation that is greater than 19.728 or less than −9.248 would be considered an outlier. In this data set there would be 4 outliers: 21, 21, 25, 48.
b. Deleting these 4 outliers, we will recalculate the mean, median, variance, and standard
deviation. The median for the original data set is the middle number once they have been arranged in order and is the 38th observation which is 3.
The new mean is:

x̄ = Σx/n = 278/71 = 3.92

The new sample variance is:

s² = [Σx² − (Σx)²/n]/(n − 1) = [2132 − (278)²/71]/(71 − 1) = 14.907

The new standard deviation is:

s = √s² = √14.907 = 3.861
The new median is the 36th observation once the data have been arranged in order and is 3. In the original data set, the mean is 5.24, the standard deviation is 7.244, and the median is 3. In the revised data set, the mean is 3.92, the standard deviation is 3.861, and the median is 3. The mean has been decreased, the standard deviation has been almost halved, but the median stays the same.
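The effect of trimming outliers can be sketched in Python; the small sample below is hypothetical, since the full 75-observation data set is not reproduced here:

```python
from statistics import mean, median, stdev

def trim_outliers(data, k=2.0):
    """Drop observations more than k sample standard deviations from the mean."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) <= k * s]

sample = [1, 2, 2, 3, 3, 3, 4, 5, 6, 21, 25, 48]  # hypothetical skewed data
trimmed = trim_outliers(sample)
# As in the exercise, the mean and standard deviation shrink after trimming,
# while the median is much more stable.
```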
2.110 For Perturbed Intrinsics, but no Perturbed Projections:

x̄ = Σx/n = (1.0 + 1.3 + 3.0 + 1.5 + 1.3)/5 = 8.1/5 = 1.62

s² = [Σx² − (Σx)²/n]/(n − 1) = [15.63 − (8.1)²/5]/(5 − 1) = 2.508/4 = .627

s = √s² = √.627 = .792

The z-score corresponding to a value of 4.5 is:

z = (x − x̄)/s = (4.5 − 1.62)/.792 = 3.63

Since this z-score is greater than 3, we would consider this an outlier for perturbed intrinsics, but no perturbed projections.

For Perturbed Projections, but no Perturbed Intrinsics:

x̄ = Σx/n = (22.9 + 21.0 + 34.4 + 29.8 + 17.7)/5 = 125.8/5 = 25.16

s² = [Σx² − (Σx)²/n]/(n − 1) = [3350.1 − (125.8)²/5]/(5 − 1) = 184.972/4 = 46.243

s = √s² = √46.243 = 6.800

The z-score corresponding to a value of 4.5 is:

z = (x − x̄)/s = (4.5 − 25.16)/6.800 = −3.038

Since this z-score is less than −3, we would consider this an outlier for perturbed projections, but no perturbed intrinsics. Since the z-score corresponding to 4.5 for perturbed projections, but no perturbed intrinsics is smaller in absolute value than that for perturbed intrinsics, but no perturbed projections, it is more likely that the type of camera perturbation is perturbed projections, but no perturbed intrinsics.
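The two z-score computations above can be reproduced with a short Python sketch (the helper name z_score is ours):

```python
from statistics import mean, stdev

def z_score(x, data):
    """z-score of x relative to a sample's mean and standard deviation."""
    return (x - mean(data)) / stdev(data)

intrinsics = [1.0, 1.3, 3.0, 1.5, 1.3]        # perturbed intrinsics only
projections = [22.9, 21.0, 34.4, 29.8, 17.7]  # perturbed projections only

z_int = z_score(4.5, intrinsics)    # roughly 3.6: an outlier for this group
z_proj = z_score(4.5, projections)  # roughly -3.0
# The smaller |z| under "perturbed projections" makes that perturbation
# type the more likely source of the 4.5 observation.
```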
2.112 Using MINITAB, a scatterplot of the data is:

[Scatterplot of Var2 (vertical axis, 0 to 15) versus Var1 (horizontal axis, −1 to 8)]
2.114 Using MINITAB, the scatterplot of the data is:

[Scatterplot of Lawyers (vertical axis, 50 to 550) versus Offices (horizontal axis, 0 to 10)]

As the number of offices increases, the number of lawyers also tends to increase.

2.116 a. Using MINITAB, the scatterplot is:
[Scatterplot of 30th-trial completion time (vertical axis, 5 to 20) versus 10th-trial completion time (horizontal axis, 10 to 40)]

It appears that as the completion time for the 10th trial increases, the completion time for the 30th trial decreases.
b. Using MINITAB, the scatterplot is:

[Scatterplot of 50th-trial completion time (vertical axis, 5 to 20) versus 10th-trial completion time (horizontal axis, 10 to 40)]

It appears that as the completion time for the 10th trial increases, the completion time for the 50th trial increases.
c. Using MINITAB, the scatterplot is:

[Scatterplot of 50th-trial completion time (vertical axis, 5 to 20) versus 30th-trial completion time (horizontal axis, 5 to 20)]

It appears that as the completion time for the 30th trial increases, the completion time for the 50th trial increases.
2.118 Using MINITAB, the scatterplot of the data is:

[Scatterplot of Mass (vertical axis, 0 to 7) versus Time (horizontal axis, 0 to 60)]

There is evidence to indicate that the mass of the spill tends to diminish as time increases. As time increases, the mass decreases.
2.120 The mean is sensitive to extreme values in a data set. Therefore, the median is preferred to the mean when a data set is skewed in one direction or the other.

2.122 a. If we assume that the data are about mound-shaped, then any observation with a z-score greater than 3 in absolute value would be considered an outlier. From Exercise 2.121, the z-score corresponding to 50 is −1, the z-score corresponding to 70 is 1, and the z-score corresponding to 80 is 2. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.

b. From Exercise 2.121, the z-score corresponding to 50 is −2, the z-score corresponding to 70 is 2, and the z-score corresponding to 80 is 4. Since the z-score corresponding to 80 is greater than 3, 80 would be considered an outlier.

c. From Exercise 2.121, the z-score corresponding to 50 is 1, the z-score corresponding to 70 is 3, and the z-score corresponding to 80 is 4. Since the z-scores corresponding to 70 and 80 are greater than or equal to 3, 70 and 80 would be considered outliers.

d. From Exercise 2.121, the z-score corresponding to 50 is .1, the z-score corresponding to 70 is .3, and the z-score corresponding to 80 is .4. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.
2.124 a. Σx = 4 + 6 + 6 + 5 + 6 + 7 = 34

Σx² = 4² + 6² + 6² + 5² + 6² + 7² = 198

x̄ = Σx/n = 34/6 = 5.67

s² = [Σx² − (Σx)²/n]/(n − 1) = [198 − (34)²/6]/(6 − 1) = 5.3333/5 = 1.0667

s = √1.0667 = 1.03

b. Σx = −1 + 4 + (−3) + 0 + (−3) + (−6) = −9

Σx² = (−1)² + 4² + (−3)² + 0² + (−3)² + (−6)² = 71

x̄ = Σx/n = −9/6 = −$1.5

s² = [Σx² − (Σx)²/n]/(n − 1) = [71 − (−9)²/6]/(6 − 1) = 57.5/5 = 11.5 dollars squared

s = √11.5 = $3.39

c. Σx = 3/5 + 4/5 + 2/5 + 1/5 + 1/16 = 2.0625

Σx² = (3/5)² + (4/5)² + (2/5)² + (1/5)² + (1/16)² = 1.2039

x̄ = Σx/n = 2.0625/5 = .4125%

s² = [Σx² − (Σx)²/n]/(n − 1) = [1.2039 − (2.0625)²/5]/(5 − 1) = .35315/4 = .0883% squared

s = √.0883 = .30%

d. (a) Range = 7 − 4 = 3

(b) Range = $4 − (−$6) = $10

(c) Range = 4/5% − 1/16% = 64/80% − 5/80% = 59/80% = .7375%
2.126 σ ≈ range/4 = 20/4 = 5
2.128 Using MINITAB, a pie chart of the data is:

[Pie chart of defect: false 90.2%, true 9.8%]

A response of ‘true’ means the software contained defective code. Thus, only 9.8% of the modules contained defective software code.

2.130 The z-score would be:
z = (x − x̄)/s = (408 − 603.7)/185.4 = −1.06

Since this value is not very big, this is not an unusual value to observe.

2.132 a. The variable of interest is opinion of book reviews. The values could be ‘would not recommend’, ‘cautious or very little recommendation’, ‘little or no preference’, ‘favorable/recommended’, and ‘outstanding/significant contribution’. Since these responses are not numerical, the variable is qualitative.
b. Most of the books (63%) received a "favorable/recommended" review. About the same
percentage of books received the following reviews: "cautious or very little recommendation" (10%), "little or no preference" (9%), and "outstanding/significant contribution" (12%). Only 5% of the books received "would not recommend" reviews.
c. If the top two categories are added together, the percent recommended is 75% (actually slightly higher than 75%). This agrees with the study.

2.134 a. To display the status, we use a pie chart. From the pie chart, we see that 58% of the Beanie babies are retired and 42% are current.
b. Using Minitab, a histogram of the values is:
Most (40 of 50) Beanie babies have values less than $100. Of the remaining 10, 5 have
values between $100 and $300, 1 has a value between $300 and $500, 1 has a value between $500 and $700, 2 have values between $700 and $900, and 1 has a value between $1900 and $2100.
c. A plot of the value versus the age of the Beanie Baby is as follows:
From the plot, it appears that as the age increases, the value tends to increase.

2.136 a. Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of C1   Leaf Unit = 0.10   N = 46

   4    0  3444
 (25)   0  5555555556666666777778888
  17    1  0000112223344
   4    1  77
   2    2  2
   1    4  7
b. The leaves that represent those brands that carry the American Dental Association seal are circled above.
c. It appears that the brands approved by the ADA tend to have lower costs. Thirteen of the twenty brands approved by the ADA, or (13/20) × 100% = 65%, are less than the median cost.
2.138 a. Using MINITAB, the summary statistics are:

Descriptive Statistics: Marketing, Engineering, Accounting, Total

Variable    N    Mean  Median  TrMean  StDev  SE Mean
Marketin   50   4.766   5.400   4.732  2.584    0.365
Engineer   50   5.044   4.500   4.798  3.835    0.542
Accounti   50   3.652   0.800   2.548  6.256    0.885
Total      50  13.462  13.750  13.043  6.820    0.965

Variable  Minimum  Maximum      Q1      Q3
Marketin    0.100   11.000   2.825   6.250
Engineer    0.400   14.400   1.775   7.225
Accounti    0.100   30.000   0.200   3.725
Total       1.800   36.200   8.075  16.600
b. The z-scores corresponding to the maximum time guidelines developed for each department and the total are as follows:

Marketing: z = (x − x̄)/s = (6.5 − 4.77)/2.58 = .67

Engineering: z = (x − x̄)/s = (7.0 − 5.04)/3.84 = .51

Accounting: z = (x − x̄)/s = (8.5 − 3.65)/6.26 = .77

Total: z = (x − x̄)/s = (17 − 13.46)/6.82 = .52
c. To find the maximum processing time corresponding to a z-score of 3, we substitute the values of z, x̄, and s into the z formula and solve for x.

z = (x − x̄)/s ⇒ x − x̄ = zs ⇒ x = x̄ + zs

Marketing: x = 4.77 + 3(2.58) = 4.77 + 7.74 = 12.51
None of the orders exceed this time.

Engineering: x = 5.04 + 3(3.84) = 5.04 + 11.52 = 16.56
None of the orders exceed this time.

These both agree with both the Empirical Rule and Chebyshev's Rule.
Accounting: x = 3.65 + 3(6.26) = 3.65 + 18.78 = 22.43
One of the orders exceeds this time, or 1/50 = .02.

Total: x = 13.46 + 3(6.82) = 13.46 + 20.46 = 33.92
One of the orders exceeds this time, or 1/50 = .02.

These both agree with Chebyshev's Rule but not the Empirical Rule. Both of these last two distributions are skewed to the right.

d. Marketing: x = 4.77 + 2(2.58) = 4.77 + 5.16 = 9.93
Two of the orders exceed this time, or 2/50 = .04.

Engineering: x = 5.04 + 2(3.84) = 5.04 + 7.68 = 12.72
Two of the orders exceed this time, or 2/50 = .04.

Accounting: x = 3.65 + 2(6.26) = 3.65 + 12.52 = 16.17
Three of the orders exceed this time, or 3/50 = .06.

Total: x = 13.46 + 2(6.82) = 13.46 + 13.64 = 27.10
Two of the orders exceed this time, or 2/50 = .04.

All of these agree with Chebyshev's Rule but not the Empirical Rule.

e. No observations exceed the guideline of 3 standard deviations for both Marketing and Engineering. One observation exceeds the guideline of 3 standard deviations for both Accounting (#23, time = 30.0 days) and Total (#23, time = 36.2 days). Therefore, only (1/10) × 100% = 10% of the "lost" quotes have times exceeding at least one of the 3 standard deviation guidelines.

Two observations exceed the guideline of 2 standard deviations for both Marketing (#31, time = 11.0 days and #48, time = 10.0 days) and Engineering (#4, time = 13.0 days and #49, time = 14.4 days). Three observations exceed the guideline of 2 standard deviations for Accounting (#20, time = 22.0 days; #23, time = 30.0 days; and #36, time = 18.2 days). Two observations exceed the guideline of 2 standard deviations for Total (#20, time = 30.2 days and #23, time = 36.2 days). Therefore, (7/10) × 100% = 70% of the "lost" quotes have times exceeding at least one of the 2 standard deviation guidelines.

We would recommend the 2 standard deviation guideline since it covers 70% of the lost quotes, while having very few other quotes exceed the guidelines.

2.140 a. First, construct a relative frequency distribution for the departments.
Class  Department      Frequency  Relative Frequency
  1    Production         13           .241
  2    Maintenance        31           .574
  3    Sales               3           .056
  4    R & D               2           .037
  5    Administration      5           .093
       TOTAL              54          1.001
The Pareto diagram is:

From the diagram, it is evident that the departments with the worst safety record are Maintenance and Production.

b. First, construct a relative frequency distribution for the type of injury in the maintenance department.

Class  Injury        Frequency  Relative Frequency
  1    Burn               6          .194
  2    Back strain        5          .161
  3    Eye damage         2          .065
  4    Cuts              10          .323
  5    Broken arm         2          .065
  6    Broken leg         1          .032
  7    Concussion         3          .097
  8    Hearing loss       2          .065
       TOTAL             31         1.002
The Pareto diagram is:

From the Pareto diagram, it is evident that cuts is the most prevalent type of injury. Burns and back strain are the next most prevalent types of injuries.
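The relative frequencies and the Pareto ordering for the maintenance department can be checked with a small Python sketch:

```python
injuries = {"Burn": 6, "Back strain": 5, "Eye damage": 2, "Cuts": 10,
            "Broken arm": 2, "Broken leg": 1, "Concussion": 3, "Hearing loss": 2}
total = sum(injuries.values())  # 31 injuries in the maintenance department

# Relative frequency = class frequency / total frequency
rel_freq = {k: round(v / total, 3) for k, v in injuries.items()}

# A Pareto diagram orders the categories from most to least frequent
pareto_order = sorted(injuries, key=injuries.get, reverse=True)
print(pareto_order[0])  # Cuts is the most prevalent injury
```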
2.142 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: MPG

Variable   N    Mean  Median  TrMean  StDev  SE Mean
MPG       36  40.056  40.000  40.063  2.177    0.363

Variable  Minimum  Maximum      Q1      Q3
MPG        35.000   45.000  39.000  41.000
The mean is 40.056 and the standard deviation is 2.177. Both of these measures are expressed in the same units as the original data, miles per gallon.
b. Since the sample mean is a good estimate of the population mean, the manufacturer should be satisfied. The sample mean is 40.056, which is greater than 40.

c. The range of the data set is 45 − 35 = 10. Using Chebyshev's Rule, the range should cover approximately 6 standard deviations. Thus, a good estimate of the standard deviation would be 10/6 = 1.67. Using the Empirical Rule, the range should cover approximately 4 standard deviations. Thus, a good estimate of the standard deviation would be 10/4 = 2.5. The given standard deviation is 2.177, which is between these two estimates. Thus, it is a reasonable value.
d. Using MINITAB, the frequency histogram is (the relative frequency histogram would have the same shape):

[Frequency histogram of MPG: horizontal axis 35 to 45 mpg, vertical axis Frequency 0 to 9]

Yes, the data appear to be mound-shaped.

e. Because the data are mound-shaped, we can use the Empirical Rule. We would expect approximately 68% of the data within the interval x̄ ± s, approximately 95% of the data within the interval x̄ ± 2s, and approximately all of the data within the interval x̄ ± 3s.
f. The interval x̄ ± s is 40.056 ± 2.177 or (37.879, 42.233). Twenty-seven of the observations fall in this interval, or 27/36 = .75 or 75%. This number is a little larger than 68%.

The interval x̄ ± 2s is 40.056 ± 2(2.177) or (35.702, 44.410). Thirty-four of the observations fall in this interval, or 34/36 = .94 or 94%. This number is very close to 95%.

The interval x̄ ± 3s is 40.056 ± 3(2.177) or (33.525, 46.587). Thirty-six of the observations fall in this interval, or 36/36 = 1.00 or 100%. This number is the same as all of the observations.
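The interval counts in part f amount to computing the fraction of observations within k standard deviations of the mean. A generic Python sketch (the demo sample below is made up, since the 36 MPG values are not reproduced here; with the real data this would give .75, .94, and 1.00 for k = 1, 2, 3):

```python
def proportion_within(data, center, sd, k):
    """Fraction of observations in the interval center ± k*sd."""
    inside = [x for x in data if center - k * sd <= x <= center + k * sd]
    return len(inside) / len(data)

demo = [38, 39, 40, 40, 41, 42, 35, 45]  # hypothetical mpg-like values
print(proportion_within(demo, 40, 2, 1))
print(proportion_within(demo, 40, 2, 3))
```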
2.144 a. Both the height and width of the bars (peanuts) change. Thus, some readers may tend to equate the area of the peanuts with the frequency for each year.
b. The frequency bar chart is:
The Kentucky Milk Case (To accompany Chapters 1–2)
There are many things that could be included in a report about the possibility of collusion. I have concentrated on the incumbency rates, bid levels and dispersion, and average winning bids. With the data available, no comparison of market share can be made since there was so much missing data. Actually, with the data available, the exact analysis cannot be made, since only the winning bid information is provided. Thus, we have no idea what the losing bids were. I will present what I think is a reasonable solution. This is by no means the only solution to the case. Many other presentations could also be used.

Incumbency Rates

The incumbency rate is the percent of the school districts that are won by the same vendor who won the previous year. A table containing the incumbency rates is included as well as a plot. Notice in the plot that the incumbency rate in the Tri-county market is higher than that in the Surrounding market. From 1985 through 1988, the incumbency rate for the Tri-county market was never lower than .923, while in the same period in the Surrounding market, the incumbency rate was never higher than .730. This implies the possibility of collusion in the Tri-county market.

              Surrounding Market                 Tri-county Market
Year   Number of   Same      Incumbency   Number of   Same      Incumbency
       Districts   Vendors   Rate         Districts   Vendors   Rate
1984      26         16        .615          10          8        .800
1985      27         19        .704          12         12       1.000
1986      32         19        .594          13         13       1.000
1987      37         27        .730          13         12        .923
1988      37         25        .676          13         13       1.000
1989      37         23        .622          13          9        .692
1990      34         24        .706          13         10        .769
1991       5          3        .600          13         11        .846
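The incumbency rates in the table follow directly from the counts; a short Python check (data keyed as year: (districts, districts retained by the previous vendor)):

```python
# year: (number of districts, districts retained by the previous year's vendor)
tri_county = {1984: (10, 8), 1985: (12, 12), 1986: (13, 13), 1987: (13, 12),
              1988: (13, 13), 1989: (13, 9), 1990: (13, 10), 1991: (13, 11)}
surrounding = {1984: (26, 16), 1985: (27, 19), 1986: (32, 19), 1987: (37, 27),
               1988: (37, 25), 1989: (37, 23), 1990: (34, 24), 1991: (5, 3)}

def incumbency_rates(market):
    """Incumbency rate = retained districts / total districts, per year."""
    return {yr: round(same / n, 3) for yr, (n, same) in market.items()}

tri_rates = incumbency_rates(tri_county)
sur_rates = incumbency_rates(surrounding)
# From 1985 through 1988 every Tri-county rate is at least .923,
# while no Surrounding rate exceeds .730.
```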
The plot of the incumbency rates is:

Bid Levels and Dispersion

Since we only have access to the winning bids in each of the school districts, we cannot make a true analysis of the bid levels and dispersions. As a compromise, I have used the winning bids of the two dairies in question, Trauth and Meyer. I have looked at only the winning bids of these two dairies in both the Tri-county market and in the Surrounding market. If there was no collusion, then the winning bids and the dispersions of the winning bids should be similar in the two markets for the two dairies. I looked at the box plots of the winning bids of the two dairies in each market for each type of milk: whole white, lowfat white, and lowfat chocolate. I have included only a few of the box plots as illustrations. Those included are for 1985 and 1986.
1985 Winning Bids:

OBS  MARKET  WINNER   WHOLE    LOWFAT   LOWFAT
                      WHITE    WHITE    CHOCOLATE
  1   SUR    MEYER    0.1280   0.1250   0.1315
  2   SUR    TRAUTH   0.1200   0.1110   0.1090
  3   SUR    TRAUTH    .       0.1079   0.1079
  4   SUR    TRAUTH    .       0.1190   0.1210
  5   SUR    MEYER    0.1225   0.1130   0.1099
  6   SUR    TRAUTH   0.1230   0.1130   0.1120
  7   SUR    MEYER    0.1250   0.1145   0.1140
  8   TRI    TRAUTH   0.1440   0.1440    .
  9   TRI    TRAUTH   0.1450   0.1350    .
 10   TRI    MEYER    0.1410   0.1410   0.1410
 11   TRI    TRAUTH   0.1393   0.1393    .
 12   TRI    MEYER    0.1340   0.1340   0.1340
 13   TRI    MEYER    0.1445   0.1345   0.1395
 14   TRI    MEYER     .       0.1345    .
 15   TRI    TRAUTH   0.1449   0.1349   0.1399
 16   TRI    TRAUTH    .       0.1299   0.1299
 17   TRI    MEYER    0.1480   0.1480   0.1480
 18   TRI    TRAUTH   0.1310   0.1290    .
 19   TRI    MEYER     .       0.1380    .
 20   TRI    TRAUTH   0.1435   0.1335    .
Box Plots for Whole White Milk—1985

[Box plots of whole white milk winning bids (WWBID, 0.120 to 0.150) by MARKET: SURROUND vs TRI-COUNTY]
Box Plots for Lowfat White Milk—1985

[Box plots of lowfat white milk winning bids (LFWBID, 0.11 to 0.15) by MARKET: SURROUND vs TRI-COUNTY]
Box Plots for Lowfat Chocolate Milk—1985

[Box plots of lowfat chocolate milk winning bids (LFCBID, 0.11 to 0.15) by MARKET: SURROUND vs TRI-COUNTY]
For each type of milk, the mean and median winning bids for the Tri-county market were higher than the corresponding winning bids in the Surrounding market. Also, the dispersion, indicated by the width of the boxes and the length of the whiskers, for the Surrounding market is larger than for the Tri-county market in most cases. This is indicative of collusion in the Tri-county market. This same pattern also existed in 1986.
1986 Winning Bids:

OBS  MARKET  WINNER   WHOLE    LOWFAT   LOWFAT
                      WHITE    WHITE    CHOCOLATE
  1   SUR    TRAUTH   0.1195   0.1100   0.1085
  2   SUR    TRAUTH   0.1330   0.1240   0.1290
  3   SUR    TRAUTH   0.1140   0.1070   0.1050
  4   SUR    MEYER    0.1350   0.1250   0.1315
  5   SUR    TRAUTH   0.1224   0.1124   0.1110
  6   SUR    TRAUTH    .       0.1110   0.1110
  7   SUR    TRAUTH    .       0.1180   0.1200
  8   SUR    TRAUTH   0.1250   0.1125   0.1115
  9   TRI    TRAUTH   0.1475   0.1475    .
 10   TRI    TRAUTH   0.1469   0.1369    .
 11   TRI    MEYER    0.1440   0.1340   0.1395
 12   TRI    TRAUTH   0.1420   0.1420    .
 13   TRI    MEYER    0.1390   0.1390   0.1390
 14   TRI    MEYER    0.1470   0.1370   0.1420
 15   TRI    MEYER     .       0.1380    .
 16   TRI    TRAUTH   0.1474   0.1374   0.1424
 17   TRI    TRAUTH    .       0.1349   0.1349
 18   TRI    MEYER    0.1505   0.1505   0.1505
 19   TRI    TRAUTH   0.1360   0.1320    .
 20   TRI    MEYER     .       0.1430    .
 21   TRI    TRAUTH   0.1460   0.1360    .

Box Plots for Whole White Milk—1986
[Box plots of whole white milk winning bids (WWBID, 0.11 to 0.15) by MARKET: SURROUND vs TRI-COUNTY]
Box Plots for Lowfat White Milk—1986

[Box plots of lowfat white milk winning bids (LFWBID, 0.11 to 0.15) by MARKET: SURROUND vs TRI-COUNTY]
Box Plots for Lowfat Chocolate Milk—1986

[Box plots of lowfat chocolate milk winning bids (LFCBID, 0.10 to 0.15) by MARKET: SURROUND vs TRI-COUNTY]
The same pattern that existed for 1985 and 1986 also existed in 1984, 1987, and 1988. From 1989 on, the pattern no longer existed. Thus, from the plots, it appears that the two dairies were working together from 1984 through 1988 in the Tri-county market.

I also plotted the mean winning bids for the two dairies in each of the two markets from 1984 through 1991 for each type of milk. In all three plots, the mean winning bid in 1983 was almost the same in the two markets. Then, in 1984, the mean winning bid in the Tri-county market was higher than in the Surrounding market for all three types of milk. This trend holds basically through 1988 (the lowfat white milk mean winning bid for the Surrounding market was greater than the mean winning bid in the Tri-county market in 1988). After 1988, the mean winning bids in the two markets are almost the same. This points to collusion in the Tri-county market from 1984 through 1988.
The dispersion, measured using the standard deviation, of the winning bids for each of the three types of milk was basically smaller in the Tri-county market than in the Surrounding market for the years 1985 through 1988. Again, after 1988 this pattern no longer existed. Again, this points to collusion between the two dairies in the Tri-county market during the years 1984 through 1988.