
Descriptive Statistics 1 3 DESCRIPTIVE STATISTICS - … (2014-12-09)

Date post: 28-May-2018
Upload: voque

Descriptive Statistics


DESCRIPTIVE STATISTICS

3.1 Measures of Central Tendency

Because the data values of most numerical variables show a tendency to group around a specific value, statisticians use a set of methods, collectively known as measures of central tendency, to help identify the properties of such variables. Three commonly used measures are the arithmetic mean, also known simply as the mean or average, the median, and the mode.

The Mean

A number equal to the sum of the data values divided by the number of data values that were summed. Examples include many common sports statistics, such as baseball batting averages and basketball points per game, as well as the mean SAT score for incoming freshmen at a college, the mean age of the workers in a company, and the mean waiting time at a bank.

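$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

For example, the means of the values 1, 2, 3, 4, 5 and of the values 1, 2, 3, 4, 10 are

$$\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3 \qquad \frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$$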

WORKED-OUT PROBLEM 1 Although many people sometimes find themselves running late as they get ready to go to work, few measure the actual time it takes to get ready in the morning. Suppose you want to determine the typical time that elapses between your alarm clock's programmed wake-up time and the time you leave your home for work. You decide to measure the actual times (in minutes) for ten consecutive working days and record the following times:

Day 1 2 3 4 5 6 7 8 9 10

Time 39 29 43 52 39 44 40 31 44 35

To compute the mean time, first compute the sum of all the data values: 39 + 29 + 43 + 52 + 39 + 44 + 40 + 31 + 44 + 35, which is 396. Then divide this sum of 396 by 10, the number of data values. The result, 39.6 minutes, is the mean time to get ready.

WORKED-OUT PROBLEM 2 Consider the same problem, but imagine that on day 4 an exceptional occurrence such as oversleeping caused you to leave your home 50 minutes later than you had recorded for that day. That would make the time for day 4 equal to 102 minutes, the sum of all times 446 minutes, and the mean (446 divided by 10) 44.6 minutes. You can see how one extreme value has dramatically changed the mean. Instead of being a number at or near the middle of the ten get-ready times, the new mean of 44.6 minutes is greater than 9 of the 10 get-ready times. In this case, the mean fails as a measure of a typical value or “central tendency.”
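As an illustrative alternative to hand calculation, the two worked-out problems above can be checked with a minimal Python sketch. It assumes Python 3 and uses only the standard `statistics` module; the variable names are hypothetical.

```python
from statistics import mean

# Get-ready times in minutes for days 1-10 (WORKED-OUT PROBLEM 1)
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

# Mean = sum of the data values / number of data values
total = sum(times)
print(total, total / len(times))          # 396 39.6

# WORKED-OUT PROBLEM 2: day 4 is 50 minutes later (52 -> 102)
late = times.copy()
late[3] += 50
late_mean = mean(late)
print(sum(late), late_mean)               # 446 44.6

# How many of the ten times lie below the new mean?
print(sum(t < late_mean for t in late))   # 9
```

Changing a single value moves the mean from 39.6 to 44.6, which is the sensitivity to extreme values described above.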


The Median

The middle value when a set of data values has been ordered from lowest to highest. When the number of data values is even, no natural middle value exists, and you perform a special calculation to determine the median.

$$\text{Median position} = \frac{n+1}{2} \text{ in the ordered data}$$

INTERPRETATION The median splits the set of ranked data values into two parts that have an equal number of values. Extreme values do not affect the median, making the median a good alternative to the mean when such values occur.

WORKED-OUT PROBLEM 3 You need to determine the median age of a group of employees whose individual ages are 47, 23, 34, 22, and 27. You calculate the median by first ranking the ages from lowest to highest: 22, 23, 27, 34, and 47. Because you have five values, the natural middle is the third ranked value, 27, making the median 27. This means that half the workers are 27 years old or younger and half the workers are 27 years old or older.

WORKED-OUT PROBLEM 4 You need to determine the median for the original set of ten get-ready times from WORKED-OUT PROBLEM 1 that was used to explain the mean. Ordering these values from lowest to highest, you have:

Time 29 31 35 39 39 40 43 44 44 52

Ordered Position 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th

Because an even number of data values exists (ten), you calculate the mean of the two values closest to the middle, that is, the fifth and sixth ranked values, 39 and 40. The mean of 39 and 40 is 39.5, making the median 39.5 minutes for the set of ten times to get ready.

The Mode

The value (or values) in a set of data values that appears most frequently.
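A small Python sketch of the (n + 1)/2 position rule for the median, checked against the standard library, follows. It assumes Python 3.8+ (for `statistics.multimode`, which returns every value tied for most frequent, matching "the value (or values)" in the mode definition); `median_by_position` is a hypothetical helper written for illustration.

```python
from statistics import median, multimode

def median_by_position(values):
    """Median via the (n + 1) / 2 ordered-position rule."""
    ordered = sorted(values)
    n = len(ordered)
    pos = (n + 1) / 2          # 1-based position in the ordered data
    lower = int(pos)           # whole-number part of the position
    if pos == lower:           # odd n: a natural middle value exists
        return ordered[lower - 1]
    # even n: average the two values closest to the middle
    return (ordered[lower - 1] + ordered[lower]) / 2

ages = [47, 23, 34, 22, 27]                       # WORKED-OUT PROBLEM 3
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]  # WORKED-OUT PROBLEM 4

print(median_by_position(ages), median(ages))     # 27 27
print(median_by_position(times), median(times))   # 39.5 39.5
print(multimode(times))                           # [39, 44]
```

The get-ready times have two modes, 39 and 44, because each appears twice.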


WORKED-OUT PROBLEM 5 For the given stem-and-leaf display, find a) the mean, b) the median, and c) the mode.

Stem | Leaf

5 4 3 2 1

1 4 5 7 0 0 0 3 6 1 1 3 4 5 9 9 2 2 3 4 8

WORKED-OUT PROBLEM 6: For the given data set, find a) mean b) median c) mode.

3.2 Measures of Position

Measures of position describe the position of a data value of a numerical variable relative to the other values of the variable. Statisticians often use measures of position to compare two sets of data values. Two commonly encountered measures of position are the quartile and the standard (Z) score.

Quartiles

The three values that split a set of ranked data values into four equal parts, or quartiles. The first quartile, Q1, is the value such that 25.0% of the ranked data values are smaller and 75.0% are larger. The second quartile, Q2, is another name for the median, which, as discussed previously, splits the ranked values into two equal parts. The third quartile, Q3, is the value such that 75.0% of the ranked values are smaller and 25.0% are larger.

WORKED-OUT PROBLEM 7 You are asked to determine the first quartile for the ranked get-ready times.

Time 29 31 35 39 39 40 43 44 44 52

Ordered Position 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th

Solution: You first add 1 to 10, the number of values, and divide by 4 to get 2.75, which places the first quartile between the second and third ranked values, 31 and 35. You multiply 35, the larger value, by the decimal fraction 0.75 to get 26.25. You multiply 31, the smaller value, by 0.25 (which is 1 - 0.75) to get 7.75, and add 26.25 and 7.75 to produce 34, the first quartile value. This indicates that 25% of the get-ready times are 34 minutes or less and that the other 75% are 34 minutes or more.
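The interpolation steps in the solution can be written as a short Python function. This is a sketch under the assumption that the (n + 1) × k / 4 position rule with linear interpolation is the method intended (it reproduces the value 34 above); `quartile` is a hypothetical helper. Python's `statistics.quantiles` with its default "exclusive" method uses the same (n + 1) rule, so it serves as a cross-check.

```python
from statistics import quantiles

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the (n + 1) * k / 4 position rule
    with linear interpolation between neighbouring ranked values."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / 4      # 1-based ranked position, e.g. 2.75
    lower = int(pos)
    frac = pos - lower         # decimal fraction, e.g. 0.75
    if lower < 1:              # position falls before the first value
        return ordered[0]
    if lower >= n:             # position falls after the last value
        return ordered[-1]
    # same as frac * larger + (1 - frac) * smaller
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
print([quartile(times, k) for k in (1, 2, 3)])   # [34.0, 39.5, 44.0]
print(quantiles(times, n=4))                     # [34.0, 39.5, 44.0]
```

Other software may use a different quartile convention and report slightly different values for the same data.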


WORKED-OUT PROBLEM 8 You conduct a study that compares the cost of a restaurant meal in a major city to the cost of a similar meal in the suburbs outside the city. You collect meal cost per person data from a sample of 50 city restaurants and 50 suburban restaurants and arrange the 100 values in two ranked sets as follows:

City Cost Data

13 21 22 22 24 25 26 26 26 26

30 32 33 34 34 35 35 35 35 36

37 37 39 39 39 40 41 41 41 42

43 44 45 46 50 50 51 51 53 53

53 55 57 61 62 62 62 66 68 75

Suburban Cost Data

21 22 25 25 26 26 27 27 28 28 28 29 31 32 32 35 35 36 37 37

37 38 38 38 39 40 40 41 41 41 42 42 43 44 47 47 47 48 50 50 50 50 50 51 52 53 58 62 65 67

a) Create a worksheet in EXCEL and enter the city cost data into column A and the suburban cost data into column B.

b) Using the EXCEL formulas for descriptive statistics, find the mean, median, mode, and quartiles.

Solution
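As a cross-check on the EXCEL worksheet, here is a minimal Python sketch (Python 3.8+, standard library only) of the same statistics. It is an assumed alternative, not the EXCEL procedure itself. Quartiles use the (n + 1) rule from WORKED-OUT PROBLEM 7, which may differ slightly from EXCEL's QUARTILE function, and `multimode` lists every tied mode where EXCEL's MODE reports a single value.

```python
from statistics import mean, median, multimode, quantiles

city = [13, 21, 22, 22, 24, 25, 26, 26, 26, 26,
        30, 32, 33, 34, 34, 35, 35, 35, 35, 36,
        37, 37, 39, 39, 39, 40, 41, 41, 41, 42,
        43, 44, 45, 46, 50, 50, 51, 51, 53, 53,
        53, 55, 57, 61, 62, 62, 62, 66, 68, 75]
suburban = [21, 22, 25, 25, 26, 26, 27, 27, 28, 28,
            28, 29, 31, 32, 32, 35, 35, 36, 37, 37,
            37, 38, 38, 38, 39, 40, 40, 41, 41, 41,
            42, 42, 43, 44, 47, 47, 47, 48, 50, 50,
            50, 50, 50, 51, 52, 53, 58, 62, 65, 67]

for name, data in (("City", city), ("Suburban", suburban)):
    q1, q2, q3 = quantiles(data, n=4)   # (n + 1) quartile rule
    print(name, "mean", mean(data), "median", median(data),
          "mode(s)", multimode(data), "Q1", q1, "Q3", q3)
```

With these data, both groups have a median of 39.5, while the city data have two modes (26 and 35) and the suburban data have one (50).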

3.3 Measures of Variation

Measures of variation show the amount of dispersion, or spread, in the data values of a numerical variable. Four frequently used measures of variation are the range, the variance, the standard deviation, and the Z score, all of which can be calculated as either sample statistics or population parameters.

The Range

The difference between the largest and smallest data values in a set of data values. Examples include the daily high and low temperatures, the stock market 52-week high and low closing prices, and the fastest and slowest times for timed sporting events.


The Variance and the Standard Deviation

Two measures that tell you how a set of data values fluctuates around the mean of the variable. The standard deviation is the positive square root of the variance.

$$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$$

where $X_i$ is each data value in the set, $\bar{X}$ is the mean of the set, and $n$ is the sample size.

To calculate the variance, you take the difference between each data value and the mean, square each difference, and then sum the squared differences. You then divide this sum of squares (SS) by 1 less than the number of data values if you have sample data, or by the number of data values if you have population data. The positive square root of the resulting (non-negative) variance is the standard deviation.

WORKED-OUT PROBLEM 9 The ages of the kids playing football are given as follows. Find the standard deviation.

10 12 14 15 17 18 18 24

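Solution: n = 8 and the mean is $\bar{X} = 16$.

$$S = \sqrt{\frac{(10-\bar{X})^2 + (12-\bar{X})^2 + (14-\bar{X})^2 + \cdots + (24-\bar{X})^2}{n-1}} = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$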
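The sample standard deviation formula translates directly into a few lines of Python. This is a sketch assuming Python 3; `sample_std` is a hypothetical helper, and `statistics.stdev` (which also divides by n - 1) is used only as a cross-check.

```python
import math
from statistics import stdev

def sample_std(values):
    """Sample standard deviation: sqrt(SS / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1))

ages = [10, 12, 14, 15, 17, 18, 18, 24]          # WORKED-OUT PROBLEM 9
x_bar = sum(ages) / len(ages)
print(x_bar)                                      # 16.0
print(round(sample_std(ages), 4))                 # 4.3095
print(round(stdev(ages), 4))                      # 4.3095
```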

WORKED-OUT PROBLEM 10 You want to calculate the variance and standard deviation for the get-ready times first presented in WORKED-OUT PROBLEM 1. As first steps, you calculate the difference between each of the 10 individual times and the mean (39.6 minutes), square those differences, and sum the squares. The table shows these first steps.

Day | Time | Difference: Time Minus Mean (39.6) | Square of Difference
1 | 39 | -0.6 | 0.36
2 | 29 | -10.6 | 112.36
3 | 43 | 3.4 | 11.56
4 | 52 | 12.4 | 153.76
5 | 39 | -0.6 | 0.36
6 | 44 | 4.4 | 19.36
7 | 40 | 0.4 | 0.16
8 | 31 | -8.6 | 73.96
9 | 44 | 4.4 | 19.36
10 | 35 | -4.6 | 21.16

Sum of Squares: 412.40

Because these data are a sample of get-ready times, the sum of squares, 412.40, is divided by one less than the number of data values, 9, to get 45.82, the sample variance. The square root of 45.82 (6.77, after rounding) is the sample standard deviation.
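The table's steps, and the sample versus population divisor described earlier, can be reproduced with this minimal Python sketch (Python 3, standard library only; variable names are illustrative).

```python
import math

times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
n = len(times)
x_bar = sum(times) / n                   # 39.6

ss = 0.0
for day, t in enumerate(times, start=1):
    diff = t - x_bar                     # time minus mean
    ss += diff ** 2                      # accumulate the sum of squares
    print(day, t, round(diff, 1), round(diff ** 2, 2))

sample_var = ss / (n - 1)                # sample data: divide by n - 1
pop_var = ss / n                         # population data: divide by n
print(round(ss, 2), round(sample_var, 2), round(math.sqrt(sample_var), 2))
# 412.4 45.82 6.77
print(round(pop_var, 2), round(math.sqrt(pop_var), 2))
# 41.24 6.42
```

Dividing the same sum of squares by n instead of n - 1 gives the population value, 6.42, which is slightly smaller than the sample standard deviation.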


WORKED-OUT PROBLEM 11 A group of friends went to a restaurant, and the bill for each one was as given below.

Friend 1 2 3 4 5

Bill 20 15 10 5 2

Calculate a) the mean and b) the standard deviation.

WORKED-OUT PROBLEM 12 The ages of the people in a big family are as follows.

Grandfather Father Mother Son Daughter

75 45 40 22 18

Calculate a) the mean and b) the standard deviation.

WORKED-OUT PROBLEM 13 Compare the distributions of the following samples.

WORKED-OUT PROBLEM 14 Using EXCEL, find the variance and standard deviation for the restaurant meal study.

Solution For city meal costs, the standard deviation is $13.89, and the majority of meals will cost between $27.57 and $55.35 (the mean $41.46 ± $13.89). For suburban meal costs, the standard deviation is $11.14, and the majority of those meals will cost between $28.82 and $51.10 (the mean $39.96 ± $11.14).

Suburban Cost Data

21 22 25 25 26 26 27 27 28 28 28 29 31 32 32 35 35 36 37 37

37 38 38 38 39 40 40 41 41 41 42 42 43 44 47 47 47 48 50 50 50 50 50 51 52 53 58 62 65 67

City Cost Data

13 21 22 22 24 25 26 26 26 26 30 32 33 34 34 35 35 35 35 36

37 37 39 39 39 40 41 41 41 42

43 44 45 46 50 50 51 51 53 53

53 55 57 61 62 62 62 66 68 75

Type the values into EXCEL. From FORMULAS, select STATISTICAL, then select VAR to calculate the sample variance. The function will ask you for the range of cells, so select the numbers whose variance you want to calculate. For the standard deviation, you can either use the STDEV function or take the square root of the variance with the SQRT function.
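For readers without EXCEL, the following is a minimal Python sketch of the same calculation, assuming Python 3 and the standard `statistics` module. Its `variance` and `stdev` functions divide by n - 1, like EXCEL's VAR and STDEV. The printed "mean ± 1 SD" interval reproduces the ranges quoted in the solution above.

```python
from statistics import mean, stdev, variance

city = [13, 21, 22, 22, 24, 25, 26, 26, 26, 26,
        30, 32, 33, 34, 34, 35, 35, 35, 35, 36,
        37, 37, 39, 39, 39, 40, 41, 41, 41, 42,
        43, 44, 45, 46, 50, 50, 51, 51, 53, 53,
        53, 55, 57, 61, 62, 62, 62, 66, 68, 75]
suburban = [21, 22, 25, 25, 26, 26, 27, 27, 28, 28,
            28, 29, 31, 32, 32, 35, 35, 36, 37, 37,
            37, 38, 38, 38, 39, 40, 40, 41, 41, 41,
            42, 42, 43, 44, 47, 47, 47, 48, 50, 50,
            50, 50, 50, 51, 52, 53, 58, 62, 65, 67]

# Sample variance and standard deviation (divisor n - 1)
for name, data in (("City", city), ("Suburban", suburban)):
    m, s = mean(data), stdev(data)
    print(f"{name}: variance {variance(data):.2f}, standard deviation {s:.2f}, "
          f"mean ± 1 SD: {m - s:.2f} to {m + s:.2f}")
# City: variance 192.91, standard deviation 13.89, mean ± 1 SD: 27.57 to 55.35
# Suburban: variance 124.00, standard deviation 11.14, mean ± 1 SD: 28.82 to 51.10
```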


Standard (Z) Score

The number that is the difference between a data value and the mean of the variable, divided by the standard deviation. Z scores help you determine whether a data value is an extreme value, or outlier, that is, far from the mean. A data value is considered an extreme outlier if its Z score is less than -3.0 or greater than +3.0.

$$Z = \frac{X - \bar{X}}{S}$$

where $X$ represents the data value, $\bar{X}$ is the sample mean, and $S$ is the sample standard deviation.

WORKED-OUT PROBLEM 15 Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the Z-score for a test score of 620. Solution:

$$Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3$$

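A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.

WORKED-OUT PROBLEM 16 For the get-ready times, the mean is 39.6 and the standard deviation is 6.4. (This value matches the population standard deviation, √(412.40/10) ≈ 6.42; the sample standard deviation found in WORKED-OUT PROBLEM 10 is 6.77.)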

a) Calculate the Z-score for each of the given times. b) From the calculated values, is there any outlier?

Day Time Time Minus Mean Z-Score

1 39

2 29

3 43

4 52

5 39

6 44

7 40

8 31

9 44

10 35
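A minimal Python sketch for filling in the table, using the mean (39.6) and standard deviation (6.4) given in the problem and the |Z| > 3.0 outlier rule from the definition above, is shown below (Python 3; names are illustrative).

```python
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
x_bar, s = 39.6, 6.4   # mean and standard deviation given in the problem

z_scores = [(x - x_bar) / s for x in times]
for day, (x, z) in enumerate(zip(times, z_scores), start=1):
    print(day, x, round(x - x_bar, 1), round(z, 2))

# Flag extreme outliers: Z below -3.0 or above +3.0
outliers = [x for x, z in zip(times, z_scores) if abs(z) > 3.0]
print("Outliers:", outliers)   # Outliers: []
```

The largest Z score, for day 4 (52 minutes), is about 1.94, well inside the ±3.0 limits.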

3.4 Shape of Distributions

Shape, a third important property of a set of numerical data, describes the pattern of the distribution of data values through the range of the data values.

The more the data are spread out, the greater the range, variance, and standard deviation.

The shape may be symmetric, left-skewed, or right-skewed.


Symmetrical Shape

A set of data values in which the mean equals the median value and each half of the curve is a mirror image of the other half of the curve.

Left-Skewed Shape

A set of data values in which the mean is less than the median value and the left tail of the distribution is longer than the right tail of the distribution. Also known as negative skew.

Right-Skewed Shape

A set of data values in which the mean is greater than the median value and the right tail of the distribution is longer than the left tail of the distribution. Also known as positive skew.
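The mean-versus-median part of these definitions can be sketched in Python as a rough rule of thumb. The sketch assumes Python 3, and `describe_shape` is a hypothetical helper. It checks only whether the mean is below, equal to, or above the median; it does not test the mirror-image condition, so an equal mean and median is consistent with symmetry but does not prove it.

```python
from statistics import mean, median

def describe_shape(values, tol=1e-9):
    """Classify shape by comparing the mean with the median."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tol:
        return "symmetrical"
    return "right-skewed" if m > md else "left-skewed"

print(describe_shape([1, 2, 3, 4, 5]))    # symmetrical (mean 3 = median 3)
print(describe_shape([1, 2, 3, 4, 10]))   # right-skewed (mean 4 > median 3)
print(describe_shape([1, 7, 8, 9, 10]))   # left-skewed (mean 7 < median 8)
```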

WORKED-OUT PROBLEM 14 The statistics exam results are given as follows. Find the mean and median for each distribution, and determine whether each graph is left-skewed, right-skewed, or symmetrical.

WORKED-OUT PROBLEM 15 The Superfund Act was passed by Congress to encourage state participation in the implementation of laws relating to the release and cleanup of hazardous substances. Hazardous waste sites financed by the Superfund Act are called Superfund sites. A total of 395 Superfund sites are operated by waste management companies in Arkansas (Tabor and Stanwick, Arkansas Business and Economic Review, Summer 1995). The number of these Superfund sites in each of Arkansas' 75 counties is shown in the table. Numerical descriptive measures for the data set are provided in the EXCEL printout.

SITES

Mean 5.24

Standard Error 0.836517879

Median 3

Mode 2

Standard Deviation 7.244457341

Sample Variance 52.48216216

Kurtosis 16.176573

Skewness 3.468289878

Range 48

Minimum 0

Maximum 48

Sum 393

Count 75

Confidence Level(95.000%) 1.639542488

a. Locate the measures of central tendency on the printout and interpret their values. b. Note that the data set contains at least one county with an unusually large number of Superfund sites. Find the largest of these measurements, called an outlier.
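As an illustration for part b, the Z score of the largest count can be computed from the printout values alone. This is a minimal Python sketch; it assumes the sample mean and standard deviation shown are the right basis for the |Z| > 3 outlier rule given earlier.

```python
mean_sites = 5.24       # "Mean" row of the printout
median_sites = 3        # "Median" row
s = 7.244457341         # "Standard Deviation" row
largest = 48            # "Maximum" row

z = (largest - mean_sites) / s
print(round(z, 2))                                   # 5.9
print("outlier" if abs(z) > 3 else "not an outlier") # outlier
print("mean > median:", mean_sites > median_sites)   # mean > median: True
```

A mean above the median is also consistent with the large positive skewness (about 3.47) reported in the printout.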

[Figure: three graphs of exam-score distributions (score axis from 30 to 100), each to be labeled Median = ____ and Mean = ____]


The Box-and-Whisker Plot

For a set of data values, the five numbers that correspond to the smallest value, the first quartile Q1, the median, the third quartile Q3, and the largest value concisely summarize the shape of a set of data values; a box-and-whisker plot displays these five numbers graphically.

A box-and-whisker plot shows a right-skewed shape if the distance from the line that represents the median to the line that represents the largest value is greater than the distance from the line that represents the smallest value to the line that represents the median. A box-and-whisker plot shows a left-skewed shape if the distance from the line that represents the smallest value to the line that represents the median is greater than the distance from the line that represents the median to the line that represents the largest value.

WORKED-OUT PROBLEM 16 The following figure represents a box-and-whisker plot of the times to get ready in the morning. What is the skewness of the plot? Answer: The box-and-whisker plot seems to indicate an approximately symmetric distribution of the time to get ready.

[Figure: box-and-whisker plot of the get-ready times on an axis from 25 to 55 minutes, with the smallest value, first quartile, median, third quartile, and largest value labeled]
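The five numbers behind this plot can be computed with a short Python sketch. It assumes Python 3.8+ and the (n + 1) quartile rule used in WORKED-OUT PROBLEM 7 (the default "exclusive" method of `statistics.quantiles`).

```python
from statistics import quantiles

times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

q1, med, q3 = quantiles(times, n=4)   # (n + 1) quartile rule
summary = (min(times), q1, med, q3, max(times))
print(summary)                              # (29, 34.0, 39.5, 44.0, 52)

# Distances compared by the skewness rule above
print(med - min(times), max(times) - med)   # 10.5 12.5
print(med - q1, q3 - med)                   # 5.5 4.5
```

The two halves differ by only a minute or two on each side, which fits the "approximately symmetric" reading.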


WORKED-OUT PROBLEM 17 You seek to better understand the shape of the restaurant meal cost study data used in an earlier worked-out problem. You create box-and-whisker plots for the meal cost of both the city and suburban groups.

Compare the mean, range, and skewness of the costs.

WORKED-OUT PROBLEM 18 Find the skewness of the given box-and-whisker plot.

WORKED-OUT PROBLEM 19 The time series of a stock exchange market is represented with a box-and-whisker plot. For the given segment of the graph, describe the skewness.
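For the restaurant data in WORKED-OUT PROBLEM 17, the numbers behind the two box-and-whisker plots can be computed with this sketch. It assumes Python 3.8+ and the (n + 1) quartile rule; `five_number_summary` is a hypothetical helper. Under the rule stated earlier, a longer distance from the median to the largest value than from the smallest value to the median points to right skew.

```python
from statistics import mean, quantiles

city = [13, 21, 22, 22, 24, 25, 26, 26, 26, 26,
        30, 32, 33, 34, 34, 35, 35, 35, 35, 36,
        37, 37, 39, 39, 39, 40, 41, 41, 41, 42,
        43, 44, 45, 46, 50, 50, 51, 51, 53, 53,
        53, 55, 57, 61, 62, 62, 62, 66, 68, 75]
suburban = [21, 22, 25, 25, 26, 26, 27, 27, 28, 28,
            28, 29, 31, 32, 32, 35, 35, 36, 37, 37,
            37, 38, 38, 38, 39, 40, 40, 41, 41, 41,
            42, 42, 43, 44, 47, 47, 47, 48, 50, 50,
            50, 50, 50, 51, 52, 53, 58, 62, 65, 67]

def five_number_summary(data):
    """Smallest value, Q1, median, Q3, largest value."""
    q1, med, q3 = quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)

for name, data in (("City", city), ("Suburban", suburban)):
    lo, q1, med, q3, hi = five_number_summary(data)
    print(name, (lo, q1, med, q3, hi))
    print("  mean:", mean(data), "range:", hi - lo)
    print("  median - smallest:", med - lo, "largest - median:", hi - med)
```

For both groups the upper side is longer (35.5 vs 26.5 for the city, 27.5 vs 18.5 for the suburbs), and each mean exceeds its median of 39.5.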


Test Yourself

Short Answers

1. Which of the following statistics are measures of central tendency?

(a) median (b) range (c) standard deviation (d) all of these (e) none of these

2. Which of the following statistics is not a measure of central tendency?

(a) mean (b) median (c) mode (d) range

3. Which of the following statements about the median is not true?

(a) It is less affected by extreme values than the mean. (b) It is a measure of central tendency. (c) It is equal to the range. (d) It is equal to the mode in bell-shaped “normal” distributions.

4. Which of the following statements about the mean is not true?

(a) It is more affected by extreme values than the median. (b) It is a measure of central tendency. (c) It is equal to the median in skewed distributions. (d) It is equal to the median in symmetric distributions.

5. Which of the following measures of variability is dependent on every value in a set of data?

(a) range (b) standard deviation (c) each of these (d) neither of these

6. Which of the following statistics cannot be determined from a box-and-whisker plot?

(a) standard deviation (b) median (c) range (d) the first quartile

7. In a symmetric distribution:

(a) the median equals the mean (b) the mean is less than the median (c) the mean is greater than the median (d) the median is less than the mode

8. The shape of a distribution is given by the:

(a) mean (b) first quartile (c) skewness (d) variance

9. In a five-number summary, the following is not included:

(a) median (b) third quartile (c) mean (d) minimum (smallest) value

10. In a right-skewed distribution:

(a) the median equals the mean (b) the mean is less than the median (c) the mean is greater than the median (d) the median equals the mode

Answer True or False:

11. In a box-and-whisker plot, the box portion represents the data between the first and third quartile values.

12. The line drawn within the box of the box-and-whisker plot represents the mean.

Fill in the blanks:

13. The _______ is found as the middle value in a set of values placed in order from lowest to highest for an odd-sized sample of numerical data.

14. The standard deviation is a measure of _______.

15. If all the values in a data set are the same, the standard deviation will be _______.

16. A distribution that is negative-skewed is also called ______-skewed.

17. If each half of a distribution is a mirror image of the other half of the distribution, the distribution is called _______.

18. The median is a measure of ______.

19, 20, 21. The three characteristics that describe a set of numerical data are _______, ________, and _______.

For Questions 22 through 30, the number of days absent by a sample of nine students during a semester was as follows:

9 1 1 10 7 11 5 8 2

22. The mean is equal to ________.

23. The median is equal to _______.

24. The mode is equal to ________.


25. The first quartile is equal to ________.

26. The third quartile is equal to ________.

27. The range is equal to _________.

28. The variance is approximately equal to ________.

29. The standard deviation is approximately equal to ________.

30. The data are: (a) right-skewed (b) left-skewed (c) symmetrical

31. In a left-skewed distribution:

(a) the median equals the mean (b) the mean is less than the median (c) the mean is greater than the median (d) the median equals the mode

32. Which of the statements about the standard deviation is true?

(a) It is a measure of variation around the mean. (b) It is the square of the variance. (c) It is a measure of variation around the median. (d) It is a measure of central tendency.

33. The smallest possible value of the standard deviation is _______.

Problems

1. The price for two tickets (including online service charges), a large popcorn, and two medium soft drinks at a sample of six theatre chains is as follows:

$36.15 $31.00 $35.05 $40.25 $33.75 $43.00

(a) Compute the mean and median. (b) Compute the variance, standard deviation, and range. (c) Are the data skewed? If so, how? (d) Based on the results of (a) through (c), what conclusions can you reach concerning the cost of going to the movies?

2. Tuna sushi was purchased from 13 Manhattan restaurants and tested for mercury. For each restaurant, the number of pieces needed to reach the maximum acceptable level of mercury, as defined by the Environmental Protection Agency, was determined to be:

8.6 2.6 1.6 5.2 7.7 4.7 6.4 6.2 3.6 4.9 9.9 3.3 4.1

(a) Compute the mean and median. (b) Compute the first quartile and the third quartile. (c) Compute the variance, standard deviation, and range. (d) Construct a box-and-whisker plot. (e) Are the data skewed? If so, how? (f) Based on the results of (a) through (d), what conclusions can you reach concerning the number of pieces it would take to reach what the Environmental Protection Agency considers to be an acceptable level to be regularly consumed?

3. As player salaries have increased, the cost of attending NBA professional basketball games has increased dramatically. The following data represent the Fan Cost Index, which is the cost of four tickets, two beers, four soft drinks, four hot dogs, two game programs, two caps, and the parking fee for one car, for each of the 29 teams.

244.48 358.72 196.90 335.00 317.90 339.23 271.16 282.00 206.52 270.94 250.57 317.00 453.95 228.28 339.20 231.38 328.90 182.30 394.52 229.82 269.48 302.04 251.86 318.30 303.79 229.50 320.47 235.75 194.56

(a) Compute the mean and median. (b) Compute the first quartile and the third quartile. (c) Compute the variance, standard deviation, and range. (d) Construct a box-and-whisker plot. (e) Are the data skewed? If so, how? (f) Based on the results of (a) through (d), what conclusions can you reach concerning the Fan Cost Index of NBA games?


4. The following data represent the viscosity (friction, as in automobile oil) taken from 120 manufacturing batches (ordered from lowest viscosity to highest viscosity).

12.6 12.8 13.0 13.1 13.3 13.3 13.4 13.5 13.6 13.7

13.7 13.7 13.8 13.8 13.9 13.9 14.0 14.0 14.0 14.1 14.1 14.1 14.2 14.2 14.2 14.3 14.3 14.3 14.3 14.3

14.3 14.4 14.4 14.4 14.4 14.4 14.4 14.4 14.4 14.5

14.5 14.5 14.5 14.5 14.5 14.6 14.6 14.6 14.7 14.7

14.8 14.8 14.8 14.8 14.9 14.9 14.9 14.9 14.9 14.9 14.9 15.0 15.0 15.0 15.0 15.1 15.1 15.1 15.1 15.2

15.2 15.2 15.2 15.2 15.2 15.2 15.2 15.3 15.3 15.3

15.3 15.3 15.4 15.4 15.4 15.4 15.5 15.5 15.6 15.6

15.6 15.6 15.6 15.7 15.7 15.7 15.8 15.8 15.9 15.9 16.0 16.0 16.0 16.0 16.1 16.1 16.1 16.2 16.3 16.4

16.4 16.5 16.5 16.6 16.8 16.9 16.9 17.0 17.6 18.6

(a) Compute the mean and median. (b) Compute the first quartile and the third quartile. (c) Compute the variance, standard deviation, and range. (d) Construct a box-and-whisker plot. (e) Are the data skewed? If so, how? (f) Based on the results of (a) through (d), what conclusions can you reach concerning the viscosity?

Additional Questions

5. (2.26/McClave) A data set contains the observations 5, 1, 3, 2, 1. Find:

a. $\sum x$ b. $\sum x^2$ c. $\sum (x - 1)$ d. $\sum (x - 1)^2$ e. $\left(\sum x\right)^2$
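The summation notation can be made concrete with a few lines of Python (a sketch assuming Python 3). Note in particular that $\sum x^2$ (square first, then add) differs from $\left(\sum x\right)^2$ (add first, then square).

```python
x = [5, 1, 3, 2, 1]

a = sum(x)                          # Σx
b = sum(v ** 2 for v in x)          # Σx²: square each value, then add
c = sum(v - 1 for v in x)           # Σ(x - 1)
d = sum((v - 1) ** 2 for v in x)    # Σ(x - 1)²
e = sum(x) ** 2                     # (Σx)²: add first, then square
print(a, b, c, d, e)                # 12 40 7 21 144
```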

6. (2.6/McClave) Consider the following sample of n = 7 measurements:

5 7 4 5 20 6 2

a. Calculate the median of this sample. b. Eliminate the last measurement (the 2) and calculate the median of the remaining n = 6 measurements.

7. (2.30/McClave) Calculate the mode, mean, and median of the following data:

18 10 15 13 17 15 12 15 18 16 11

8. (2.31/McClave) Calculate the mean and median of the following grade point averages:

3.2 2.5 2.1 3.7 2.8 2.0

9. (2.32/McClave) Calculate the mean, median, and mode for each of the following samples:

a. 7, -2, 3, 3, 0, 4 b. 2, 3, 5, 3, 2, 3, 4, 3, 5, 1, 2, 3, 4 c. 51, 50, 47, 50, 48, 41, 59, 68, 45, 3


Applying the Concepts

10. (2.35/McClave) The total numbers of passengers handled in 1998 by eight cruise ships based in Port Canaveral (Florida) are listed in the table below. Find and interpret the mean and median of the data set.

Cruise Line (Ship) Number of Passengers

Canaveral (Dolphin) 152,240

Carnival (Fantasy) 480,924

Disney (Magic) 73,504

Premier (Oceanic) 270,361

Royal Caribbean (Nordic Empress) 106,161

Sun Cruz Casinos 453,806

Sterling Cruises (New Yorker) 15,782

Topaz Int'l. Shipping (Topaz) 28,280

11. (2.9/McClave) Calculate the variance and standard deviation of the following sample: 2, 3, 3, 3, 4.

12. (2.43/McClave) Answer the following questions about the variability of data sets: a. What is the primary disadvantage of using the range to compare the variability of data sets? b. Describe the sample variance using words rather than a formula. Do the same with the population variance. c. Can the variance of a data set ever be negative? Explain. d. Can the variance ever be smaller than the standard deviation? Explain.

12. (2.44/McClave) Calculate the variance and standard deviation for samples where:

a. n = 10, $\sum x^2 = 84$, $\sum x = 20$

b. n = 40, $\sum x^2 = 380$, $\sum x = 100$

c. n = 20, $\sum x^2 = 18$, $\sum x = 17$
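When only the sums are given, the sample variance can be found with the computational form of the variance formula, $s^2 = \left(\sum x^2 - \frac{(\sum x)^2}{n}\right) / (n - 1)$, which is algebraically equivalent to dividing the sum of squared deviations by n - 1. The Python sketch below assumes Python 3; `variance_from_sums` is a hypothetical helper, checked against the definitional formula on the sample from exercise 11.

```python
import math

def variance_from_sums(n, sum_x2, sum_x):
    """Sample variance from summary sums:
    s^2 = (Σx² - (Σx)² / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

for n, sum_x2, sum_x in ((10, 84, 20), (40, 380, 100), (20, 18, 17)):
    s2 = variance_from_sums(n, sum_x2, sum_x)
    print(n, round(s2, 4), round(math.sqrt(s2), 4))

# Cross-check with the definitional formula on the sample 2, 3, 3, 3, 4
sample = [2, 3, 3, 3, 4]
x_bar = sum(sample) / len(sample)
direct = sum((v - x_bar) ** 2 for v in sample) / (len(sample) - 1)
print(variance_from_sums(5, sum(v * v for v in sample), sum(sample)), direct)
# 0.5 0.5
```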

13. (2.45/McClave) Calculate the range, variance, and standard deviation for the following samples: a. 4, 2, 1, 0, 1 b. 1, 6, 2, 2, 3, 0, 3 c. 8, -2, 1, 3, 5, 4, 4, 1, 3, 3 d. 0, 2, 0, 0, -1, 1, -2, 1, 0, -1, 1, -1, 0, -3, -2, -1, 0, 1

14. (2.46/McClave) Calculate the range, variance, and standard deviation for the following samples: a. 39, 42, 40, 37, 41 b. 100, 4, 7, 96, 80, 3, 1, 10, 2 c. 100, 4, 7, 30, 80, 30, 42, 2

14. (2.49/McClave) The table below lists the 1999 base prices for automobiles manufactured by Buick and Cadillac.

a. Calculate the range of the Buick prices and the range of the Cadillac prices. b. The lowest and highest priced Chevrolet cars are $9,373 (Metro) and $45,575 (Corvette), respectively. Calculate Chevrolet's range.

c. Using only the three ranges you computed in parts a and b and no other information, is it possible to determine which manufacturer produces only luxury cars? Explain.

Buick Price

Century Custom $19,335

Century Limited 20,705

Regal LS 22,255

Regal GS 24,955

LeSabre Custom 23,340

LeSabre Limited 26,605

Park Avenue 31,800

Park Avenue Ultra 36,695

Riviera 34,490


15. (6/Brase) The following data represent weights in kilograms of maize harvest from a random sample of 72 experimental plots.

7.8 9.1 9.5 10 10.2 10.5 11.1 11.5 11.7 11.8

12.2 12.2 12.5 13.1 13.5 13.7 13.7 14 14.4 14.5

14.6 15.2 15.5 16 16 16.1 16.5 17.2 17.8 18.2

19 19.1 19.3 19.8 20 20.2 20.3 20.5 20.9 21.1

21.4 21.8 22 22 22.4 22.5 22.5 22.8 22.8 23.1

23.1 23.2 23.7 23.8 23.8 23.8 23.8 24 24.1 24.1

24.5 24.5 24.9 25.1 25.2 25.5 26.1 26.4 26.5 26.7

27.1 29.5

a) Calculate the min, max, range, mean, median, and mode.

b) Compute Q1 and Q3.

c) Make a box and whisker plot.

d) Describe the skewness of the distribution.

16. (9/Brase) Consumer Reports rated automobile insurance companies and gave annual premiums for top-rated companies in several states.

a) Which state has the lowest premium? the highest?

b) Which state has the highest median premium?

c) Which state has the smallest range of premiums? The smallest interquartile range?

d) Match each of the given summaries to its box plot.


17. (10/Brase)

18. (2.108/McClave)

19. (2.108/McClave)

