Chapter 3, Part 1 Descriptive Statistics II: Numerical Methods

– Measures of Location
– Measures of Variability

Measures of Location

– Mean
– Median
– Mode
– Percentiles

Example: Apartment Rents

Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data are a sample of 70 apartments in a particular city, presented in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Mean

The mean of a data set is the average of all the data values.

If the data are from a sample, the mean is denoted by \( \bar{x} \) (x-bar):

\[ \bar{x} = \frac{\sum x_i}{n} \]

If the data are from a population, the mean is denoted by \( \mu \) (mu):

\[ \mu = \frac{\sum x_i}{N} \]

Example: Apartment Rents Mean

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]
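The sample mean can be checked with a short Python sketch. The variable names (`rents`, `total`, `sample_mean`) are illustrative choices and do not come from the slides.

```python
# Monthly rents ($) for the 70 one-bedroom apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)            # sample size
total = sum(rents)        # sum of all x_i
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))
```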

Example: Apartment Rents Trimmed Mean

With n = 70, a 5% trimmed mean removes 0.05(70) = 3.5, rounded up to 4, values from each end of the sorted data set. The remaining 62 values sum to 30,206.

\[ \text{5\% trimmed mean} = \frac{30{,}206}{62} = 487.19 \]
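A minimal sketch of the trimming step, assuming the percentage is given as a whole number and the count to drop is rounded up as in the example. The function name `trimmed_mean` is hypothetical.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def trimmed_mean(sorted_values, percent):
    """Mean after dropping `percent`% of the values (rounded up) from each end."""
    n = len(sorted_values)
    k = -(-percent * n // 100)        # integer ceiling of (percent/100) * n
    kept = sorted_values[k:n - k]     # drop k smallest and k largest values
    return sum(kept) / len(kept)

print(round(trimmed_mean(rents, 5), 2))
```

Working in integer arithmetic for the count avoids floating-point surprises when (percent/100)n should come out to a whole number.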

Median

The median of a data set is the value in the middle when the data items are arranged in ascending order.

If there is an odd number of items, the median is the value of the middle item.

If there is an even number of items, the median is the average of the values for the middle two items.


Example: Apartment Rents Median

Median = 50th percentile

i = (p/100)n = (50/100)70 = 35

Because i is an integer, the median is the average of the 35th and 36th data values:

Median = (475 + 475)/2 = 475
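The median can be confirmed with Python's standard `statistics` module. This is only a checking sketch; the variable names are illustrative.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
# With an even n, the two middle items are the (n/2)th and (n/2 + 1)th values.
middle_pair = (rents[n // 2 - 1], rents[n // 2])
median = statistics.median(rents)   # averages the middle pair when n is even

print(middle_pair, median)
```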

Mode

The mode of a data set is the value that occurs with greatest frequency.

Example: Apartment Rents

Mode

450 occurred most frequently (7 times)

Mode = 450
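A frequency count makes the mode easy to verify. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(rents)                              # value -> frequency
mode_value, mode_count = counts.most_common(1)[0]    # most frequent value

print(mode_value, mode_count)
```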

Percentiles

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

– Arrange the data in ascending order.
– Compute index i, the position of the pth percentile: i = (p/100)n
– If i is not an integer, round up. The pth percentile is the value in the ith position.
– If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

Example: Apartment Rents 90th Percentile

i = (p/100)n = (90/100)70 = 63

Averaging the 63rd and 64th data values:

90th Percentile = (580 + 590)/2 = 585
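The index rule for percentiles can be sketched as a small function. This follows the rule stated on these slides, which differs from the interpolation methods many libraries use by default. The function name `percentile` is hypothetical, and p is assumed to be a whole number strictly between 0 and 100.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(sorted_values, p):
    """pth percentile by the index rule; p is a whole number with 0 < p < 100."""
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)   # i = (p/100)n in integer arithmetic
    if remainder:
        # i is not an integer: round up to position whole + 1 (0-based index whole).
        return sorted_values[whole]
    # i is an integer: average positions i and i + 1 (0-based whole - 1 and whole).
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

print(percentile(rents, 90), percentile(rents, 75), percentile(rents, 50))
```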

Quartiles

Quartiles are specific percentiles.

– First Quartile = 25th Percentile
– Second Quartile = 50th Percentile = Median
– Third Quartile = 75th Percentile


Third Quartile

Third quartile = 75th percentile

i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53

Third quartile = 53rd data value = 525

Measures of Variability

– Range
– Variance
– Standard Deviation
– Coefficient of Variation

Range

The range of a data set is the difference between the largest and smallest data values.

It is the simplest measure of dispersion.

It is very sensitive to the smallest and largest data values.

Example: Apartment Rents Range

Range = largest value - smallest value

Range = 615 - 425 = 190

Variance

The variance is the average of the squared differences between each data value and the mean.

If the data set is a sample, the variance is denoted by \( s^2 \):

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

If the data set is a population, the variance is denoted by \( \sigma^2 \):

\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily comparable to the mean.

If the data set is a sample, the standard deviation is denoted by \( s \):

\[ s = \sqrt{s^2} \]

If the data set is a population, the standard deviation is denoted by \( \sigma \) (sigma):

\[ \sigma = \sqrt{\sigma^2} \]

Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

If the data set is a sample, the coefficient of variation is computed as follows:

\[ \left( \frac{s}{\bar{x}} \times 100 \right)\% \]

If the data set is a population, the coefficient of variation is computed as follows:

\[ \left( \frac{\sigma}{\mu} \times 100 \right)\% \]


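Variance

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \]

Standard Deviation

\[ s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74 \]

Coefficient of Variation

\[ \left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\% \]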
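The three variability measures for the apartment rents can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor. The variable names are illustrative.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

sample_variance = statistics.variance(rents)   # sum of squared deviations / (n - 1)
sample_std = statistics.stdev(rents)           # positive square root of the variance
cv_percent = sample_std / statistics.mean(rents) * 100   # std as a % of the mean

print(round(sample_variance, 2), round(sample_std, 2), round(cv_percent, 2))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.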


Numerical Methods

Measures of Relative Location and Locating Outliers

– z-Scores
– Chebyshev's Theorem
– The Empirical Rule
– Detecting Outliers

z -Scores

The z-score is often called the standardized value. It denotes the number of standard deviations a data value \( x_i \) is from the mean.

\[ z_i = \frac{x_i - \bar{x}}{s} \]

A data value less than the sample mean will have a z-score less than zero.

A data value greater than the sample mean will have a z-score greater than zero.

A data value equal to the sample mean will have a z-score of zero.

z-Score of Smallest Value (425)

\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
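The standardized values can be generated with a short sketch that applies z = (x - mean)/s to every rent. The variable names are illustrative, and the unrounded sample standard deviation is used, so results agree with the slide values after rounding to two decimals.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)                     # sample standard deviation
z_scores = [(x - x_bar) / s for x in rents]     # one z-score per rent

# Smallest and largest rents give the most extreme z-scores.
print(round(z_scores[0], 2), round(z_scores[-1], 2))
```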


Chebyshev's Theorem

At least \( 1 - 1/k^2 \) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.

– At least 75% of the items must be within k = 2 standard deviations of the mean.
– At least 89% of the items must be within k = 3 standard deviations of the mean.
– At least 94% of the items must be within k = 4 standard deviations of the mean.


Chebyshev’s Theorem

Let k = 1.5 with \( \bar{x} \) = 490.80 and s = 54.74.

At least \( 1 - 1/(1.5)^2 \) = 1 - 0.44 = 0.56 or 56% of the rent values must be between

\( \bar{x} \) - k(s) = 490.80 - 1.5(54.74) = 409

and

\( \bar{x} \) + k(s) = 490.80 + 1.5(54.74) = 573

Chebyshev’s Theorem (continued)

In fact, 60 of the 70 rent values (86%) are between 409 and 573, well above the guaranteed minimum of 56%.
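A sketch comparing the Chebyshev lower bound with the share of rents actually inside the interval. The function name `chebyshev_bound` is hypothetical; the count uses the unrounded mean and standard deviation.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def chebyshev_bound(k):
    """Minimum proportion of items within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

k = 1.5
x_bar = statistics.mean(rents)
s = statistics.stdev(rents)
lower, upper = x_bar - k * s, x_bar + k * s
within = sum(lower <= x <= upper for x in rents)   # rents inside the interval

print(round(chebyshev_bound(k), 2), within, round(within / len(rents), 2))
```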


Empirical Rule

For data having a bell-shaped distribution:

Approximately 68% of the data values will be within one standard deviation of the mean.

Approximately 95% of the data values will be within two standard deviations of the mean.

Almost all of the items (about 99.7%) will be within three standard deviations of the mean.

Example: Apartment Rents Empirical Rule

Interval          Rent range ($)       % in Interval
Within +/- 1s     436.06 to 545.54     48/70 = 69%
Within +/- 2s     381.32 to 600.28     68/70 = 97%
Within +/- 3s     326.58 to 655.02     70/70 = 100%
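The interval counts can be reproduced by counting how many rents lie within 1, 2, and 3 standard deviations of the mean. This is a checking sketch with illustrative names.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)

# Number of rents whose distance from the mean is at most k standard deviations.
counts_within = {k: sum(abs(x - x_bar) <= k * s for x in rents) for k in (1, 2, 3)}

for k, count in counts_within.items():
    print(f"within +/- {k}s: {count}/{len(rents)} = {count / len(rents):.0%}")
```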

Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

– It might be an incorrectly recorded data value.
– It might be a data value that was incorrectly included in the data set.
– It might be a correctly recorded data value that belongs in the data set!

Example: Apartment Rents Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.

Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
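The |z| > 3 screen can be written as a small helper. The function name `find_outliers` and its `threshold` parameter are hypothetical, and a flagged value is only a candidate for review, not automatically an error.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds `threshold` in absolute value."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]

outliers = find_outliers(rents)
print(outliers)   # an empty list means no value has |z| > 3
```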


Numerical Methods

– Measures of Association Between Two Variables
– Working with Grouped Data

Measures of Association Between Two Variables

– Covariance
– Correlation Coefficient

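Covariance

The covariance is a measure of the linear association between two variables.

– Positive values indicate a positive relationship.
– Negative values indicate a negative relationship.

If the data sets are samples, the covariance is denoted by \( s_{xy} \):

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

If the data sets are populations, the covariance is denoted by \( \sigma_{xy} \):

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]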

Correlation Coefficient

– The coefficient can take on values between -1 and +1.
– Values near -1 indicate a strong negative linear relationship.
– Values near +1 indicate a strong positive linear relationship.

If the data sets are samples, the coefficient is denoted by \( r_{xy} \):

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

If the data sets are populations, the coefficient is denoted by \( \rho_{xy} \):

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]

where \( s_x \) and \( s_y \) (or \( \sigma_x \) and \( \sigma_y \)) are the standard deviations of each variable.
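The slides give no paired data for this topic, so the sketch below uses a small hypothetical sample, invented purely to exercise the sample covariance and correlation formulas. The function names are also hypothetical.

```python
import math

# Hypothetical paired sample (not from the slides), for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    s_x = math.sqrt(sample_covariance(xs, xs))   # covariance of x with itself is s_x^2
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
print(s_xy, round(r_xy, 4))
```

Unlike the covariance, the correlation coefficient does not depend on the units of x and y, which is why it is bounded by -1 and +1.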

Mean for Grouped Data

Sample Data

\[ \bar{x} = \frac{\sum f_i M_i}{n} \]

Population Data

\[ \mu = \frac{\sum f_i M_i}{N} \]

where \( f_i \) = frequency of class i and \( M_i \) = midpoint of class i

Example: Apartment Rents

Given below is the previous sample of monthly rents for one-bedroom apartments presented as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6

Example: Apartment Rent Mean for Grouped Data

Rent ($)    f_i    M_i      f_i M_i
420-439      8    429.5     3436.0
440-459     17    449.5     7641.5
460-479     12    469.5     5634.0
480-499      8    489.5     3916.0
500-519      7    509.5     3566.5
520-539      4    529.5     2118.0
540-559      2    549.5     1099.0
560-579      4    569.5     2278.0
580-599      2    589.5     1179.0
600-619      6    609.5     3657.0
Total                      34525.0

\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]

This approximation differs by $2.41 from the actual sample mean of $490.80.
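The grouped-data mean can be recomputed from the frequency distribution. In this sketch each class is stored as a hypothetical (lower limit, upper limit, frequency) tuple, and the midpoint is taken as the average of the two class limits.

```python
# (lower limit, upper limit, frequency) for each rent class
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in classes)                          # total frequency
weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of f_i * M_i
grouped_mean = weighted_sum / n

print(n, weighted_sum, round(grouped_mean, 2))
```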

Variance for Grouped Data

Sample Data

\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]

Population Data

\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]


Sample Variance for Grouped Data

\[ s^2 = 3{,}017.89 \]

Sample Standard Deviation for Grouped Data

\[ s = \sqrt{3{,}017.89} = 54.94 \]

This approximation differs by only $0.20 from the actual standard deviation of $54.74.
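The grouped-data variance and standard deviation follow the same pattern as the grouped mean. As before, the (lower limit, upper limit, frequency) tuples and the variable names are illustrative choices.

```python
import math

# (lower limit, upper limit, frequency) for each rent class
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # M_i
freqs = [f for _, _, f in classes]                      # f_i
n = sum(freqs)

grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
# Sample variance for grouped data: sum of f_i (M_i - x_bar)^2, divided by n - 1.
grouped_variance = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
grouped_std = math.sqrt(grouped_variance)

print(round(grouped_variance, 2), round(grouped_std, 2))
```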