Chapter 3, Part 1 Descriptive Statistics II:
Numerical Methods
Measures of Location
Measures of Variability
Measures of Location
– Mean
– Median
– Mode
– Percentiles
Example: Apartment Rents
Given below is a sample of monthly rent values ($) for 70 one-bedroom apartments in a particular city. The data are presented in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Mean
The mean of a data set is the average of all the data values.
If the data are from a sample, the mean is denoted by \( \bar{x} \) (x-bar):
\[ \bar{x} = \frac{\sum x_i}{n} \]
If the data are from a population, the mean is denoted by \( \mu \) (mu):
\[ \mu = \frac{\sum x_i}{N} \]
Example: Apartment Rents Mean
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34,356}{70} = 490.80 \]
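The sum and mean can be checked with a short Python sketch. The variable names (rents, total, sample_mean) are illustrative and not part of the original slides; the values are the 70 rents from the sample.

```python
# Monthly rents ($) for the 70 sampled apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)            # sample size
total = sum(rents)        # sum of all data values
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))
```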
Example: Apartment Rents Trimmed Mean
With n = 70, a 5% trimmed mean removes 0.05(70) = 3.5, rounded up to 4, values from each end of the ordered data set, leaving 62 values.
\[ \text{5\% trimmed mean} = \frac{30,206}{62} = 487.19 \]
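A minimal sketch of the trimmed mean, assuming the rounding-up convention used in this example (other tools may round the number of trimmed values differently). The function name trimmed_mean and its integer percent parameter are hypothetical choices made for this illustration.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def trimmed_mean(values, percent):
    """Average the values left after dropping ceil(percent% of n) from each end."""
    ordered = sorted(values)
    k = math.ceil(percent * len(ordered) / 100)  # 5% of 70 = 3.5, rounded up to 4
    kept = ordered[k:len(ordered) - k]           # drop k smallest and k largest
    return sum(kept) / len(kept)


result = trimmed_mean(rents, 5)
print(round(result, 2))
```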
Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
If there is an odd number of items, the median is the value of the middle item.
If there is an even number of items, the median is the average of the values for the middle two items.
Example: Apartment Rents Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Because i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475
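As a quick check, the sketch below averages the two middle values by hand and compares the result with Python's standard statistics.median. The manual step assumes an even number of items, as in this sample; the variable names are illustrative.

```python
from statistics import median

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

ordered = sorted(rents)
n = len(ordered)

# With n = 70 (even), the middle items are the 35th and 36th values,
# which sit at zero-based indexes 34 and 35.
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
manual_median = sum(middle_pair) / 2

library_median = median(rents)  # handles both odd and even n
print(middle_pair, manual_median, library_median)
```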
Mode
The mode of a data set is the value that occurs with greatest frequency.
Example: Apartment Rents Mode
450 occurred most frequently (7 times).
Mode = 450
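One way to find the mode is to tally the values, sketched here with collections.Counter from the Python standard library. The variable names are illustrative.

```python
from collections import Counter

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(rents)                              # value -> frequency
mode_value, mode_count = counts.most_common(1)[0]    # most frequent value

print(mode_value, mode_count)
```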
Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
– Arrange the data in ascending order.
– Compute index i, the position of the pth percentile:
i = (p/100)n
– If i is not an integer, round up. The pth percentile is the value in the ith position.
– If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
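The steps above can be sketched as a small Python function. The name percentile and the restriction to whole-number p strictly between 0 and 100 are assumptions made for this illustration. Statistical software often uses other interpolation rules, so library results may differ slightly from this procedure.

```python
def percentile(values, p):
    """p-th percentile (integer p, 0 < p < 100) using the index rule i = (p/100)n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                 # arrange the data in ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # i = (p/100)n, kept exact with integers
    if remainder:
        # i is not an integer: round up to position whole + 1 (zero-based index whole)
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

for p in (25, 50, 75, 90):
    print(p, percentile(rents, p))
```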
Example: Apartment Rents 90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
Quartiles
Quartiles are specific percentiles.
– First Quartile = 25th Percentile
– Second Quartile = 50th Percentile = Median
– Third Quartile = 75th Percentile
Example: Apartment Rents Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 53rd data value = 525
Measures of Variability
– Range
– Variance
– Standard Deviation
– Coefficient of Variation
Range
The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of dispersion.
It is very sensitive to the smallest and largest data values.
Example: Apartment Rents Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Variance
The variance is the average of the squared differences between each data value and the mean.
If the data set is a sample, the variance is denoted by \( s^2 \):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
If the data set is a population, the variance is denoted by \( \sigma^2 \):
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]
Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily comparable to the mean.
If the data set is a sample, the standard deviation is denoted by \( s \):
\[ s = \sqrt{s^2} \]
If the data set is a population, the standard deviation is denoted by \( \sigma \) (sigma):
\[ \sigma = \sqrt{\sigma^2} \]
Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
If the data set is a sample, the coefficient of variation is computed as follows:
\[ \left( \frac{s}{\bar{x}} \right)(100) \]
If the data set is a population, the coefficient of variation is computed as follows:
\[ \left( \frac{\sigma}{\mu} \right)(100) \]
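Example: Apartment Rents
Variance
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16 \]
Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{2,996.16} = 54.74 \]
Coefficient of Variation
\[ \left( \frac{s}{\bar{x}} \right)(100) = \left( \frac{54.74}{490.80} \right)(100) = 11.15 \]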
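The measures of variability for the rent sample can be reproduced with the sketch below, which applies the sample formulas directly (dividing by n - 1). The variable names are illustrative and not taken from the slides.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
x_bar = sum(rents) / n

data_range = max(rents) - min(rents)                    # largest - smallest
squared_deviations = [(x - x_bar) ** 2 for x in rents]  # (x_i - x_bar)^2
sample_variance = sum(squared_deviations) / (n - 1)     # s^2
sample_std = math.sqrt(sample_variance)                 # s
cv_percent = sample_std / x_bar * 100                   # (s / x_bar)(100)

print(data_range, round(sample_variance, 2), round(sample_std, 2), round(cv_percent, 2))
```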
Chapter 3, Part 2 Descriptive Statistics II:
Numerical Methods
Measures of Relative Location and Locating Outliers
– z-Scores
– Chebyshev’s Theorem
– The Empirical Rule
– Detecting Outliers
z-Scores
The z-score is often called the standardized value. It denotes the number of standard deviations a data value \( x_i \) is from the mean:
\[ z_i = \frac{x_i - \bar{x}}{s} \]
A data value less than the sample mean will have a z-score less than zero.
A data value greater than the sample mean will have a z-score greater than zero.
A data value equal to the sample mean will have a z-score of zero.
Example: Apartment Rents
z-Score of Smallest Value (425)
\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
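The standardized values can be regenerated with a few lines of Python. This sketch uses statistics.mean and statistics.stdev (the sample standard deviation) from the standard library; the variable names are illustrative.

```python
from statistics import mean, stdev

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)   # sample mean
s = stdev(rents)      # sample standard deviation (n - 1 in the denominator)

# z_i = (x_i - x_bar) / s for every data value
z_scores = [(x - x_bar) / s for x in rents]

print(round(z_scores[0], 2), round(z_scores[-1], 2))
```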
Chebyshev’s Theorem
At least \( (1 - 1/k^2) \) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.
– At least 75% of the items must be within k = 2 standard deviations of the mean.
– At least 89% of the items must be within k = 3 standard deviations of the mean.
– At least 94% of the items must be within k = 4 standard deviations of the mean.
Example: Apartment Rents
Chebyshev’s Theorem
Let k = 1.5 with \( \bar{x} \) = 490.80 and s = 54.74.
At least \( 1 - 1/(1.5)^2 \) = 1 - 0.44 = 0.56 or 56% of the rent values must be between
\( \bar{x} \) - k(s) = 490.80 - 1.5(54.74) = 409
and
\( \bar{x} \) + k(s) = 490.80 + 1.5(54.74) = 573
Example: Apartment Rents
Chebyshev’s Theorem (continued)
Actually, 86% of the rent values (60 of 70) are between 409 and 573.
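The comparison between the guaranteed minimum and the observed share can be sketched as follows. The mean and standard deviation are recomputed from the data rather than taken as rounded values, and the variable names are illustrative.

```python
from statistics import mean, stdev

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

k = 1.5
x_bar, s = mean(rents), stdev(rents)

bound = 1 - 1 / k ** 2                        # Chebyshev's guaranteed minimum share
lower, upper = x_bar - k * s, x_bar + k * s   # x_bar -/+ k(s)
inside = sum(lower <= x <= upper for x in rents)
observed = inside / len(rents)                # share actually inside the interval

print(round(bound, 2), round(lower), round(upper), inside, round(observed, 2))
```

The observed share is well above the bound, which is expected: the theorem gives a minimum that holds for any data set, not an estimate.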
Empirical Rule
For data having a bell-shaped distribution:
Approximately 68% of the data values will be within one standard deviation of the mean.
Approximately 95% of the data values will be within two standard deviations of the mean.
Almost all of the items (about 99.7%) will be within three standard deviations of the mean.
Example: Apartment Rents Empirical Rule
                  Interval            % in Interval
Within +/- 1s     436.06 to 545.54    48/70 = 69%
Within +/- 2s     381.32 to 600.28    68/70 = 97%
Within +/- 3s     326.58 to 655.02    70/70 = 100%
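The counts in the table can be reproduced with a short loop. This is a sketch with illustrative names; it recomputes the mean and standard deviation from the data, so the interval endpoints match the table only after rounding.

```python
from statistics import mean, stdev

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar, s = mean(rents), stdev(rents)

within = {}  # k -> number of rents within k standard deviations of the mean
for k in (1, 2, 3):
    lower, upper = x_bar - k * s, x_bar + k * s
    within[k] = sum(lower <= x <= upper for x in rents)
    share = within[k] / len(rents)
    print(f"+/- {k}s: {lower:.2f} to {upper:.2f}, {within[k]}/70 = {share:.0%}")
```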
Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
– It might be an incorrectly recorded data value.
– It might be a data value that was incorrectly included in the data set.
– It might be a correctly recorded data value that belongs in the data set.
Example: Apartment Rents Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
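A minimal sketch of the |z| > 3 screen follows. The function name find_outliers and the extra 2000 rent used in the second call are hypothetical; the 2000 value is invented only to show what a flagged entry looks like and is not part of the sample.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is beyond +/- threshold."""
    x_bar = mean(values)
    s = stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]


rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

outliers = find_outliers(rents)                    # the actual sample
with_bad_entry = find_outliers(rents + [2000])     # sample plus an invented value

print(outliers, with_bad_entry)
```

A flagged value is only a candidate for review; as noted above, it may still be a correctly recorded value that belongs in the data set.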
Chapter 3, Part 3 Descriptive Statistics II:
Numerical Methods
Measures of Association Between Two Variables
Working with Grouped Data
Measures of Association Between Two Variables
– Covariance
– Correlation Coefficient
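Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
If the data sets are samples, the covariance is denoted by \( s_{xy} \):
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
If the data sets are populations, the covariance is denoted by \( \sigma_{xy} \):
\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]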
Correlation Coefficient
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
If the data sets are samples, the coefficient is denoted by \( r_{xy} \):
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
If the data sets are populations, the coefficient is denoted by \( \rho_{xy} \):
\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]
where \( s_x \) and \( s_y \) (or \( \sigma_x \) and \( \sigma_y \)) are the standard deviations of each variable.
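The slides give no paired data for these measures, so the sketch below uses a small made-up data set (the lists x and y are hypothetical) purely to show how the sample formulas fit together.

```python
import math

# Hypothetical paired observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient: r_xy = s_xy / (s_x * s_y)
r_xy = s_xy / (s_x * s_y)

print(s_xy, round(r_xy, 4))
```

For this invented data the covariance is positive and the correlation is about 0.77, indicating a fairly strong positive linear relationship.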
Mean for Grouped Data
Sample Data
\[ \bar{x} = \frac{\sum f_i M_i}{n} \]
Population Data
\[ \mu = \frac{\sum f_i M_i}{N} \]
where \( f_i \) = frequency of class i
\( M_i \) = midpoint of class i
Example: Apartment Rents
Given below is the previous sample of monthly rents for one-bedroom apartments presented as grouped data in the form of a frequency distribution.
Rent ($)    Frequency
420-439     8
440-459     17
460-479     12
480-499     8
500-519     7
520-539     4
540-559     2
560-579     4
580-599     2
600-619     6
Example: Apartment Rents Mean for Grouped Data
Rent ($)    f_i    M_i      f_i M_i
420-439     8      429.5    3436.0
440-459     17     449.5    7641.5
460-479     12     469.5    5634.0
480-499     8      489.5    3916.0
500-519     7      509.5    3566.5
520-539     4      529.5    2118.0
540-559     2      549.5    1099.0
560-579     4      569.5    2278.0
580-599     2      589.5    1179.0
600-619     6      609.5    3657.0
Total       70              34525.0
\[ \bar{x} = \frac{\sum f_i M_i}{n} = \frac{34,525}{70} = 493.21 \]
This approximation differs by $2.41 from the actual sample mean of $490.80.
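The grouped-data mean can be sketched by weighting each class midpoint by its frequency. The tuple layout (lower limit, upper limit, frequency) and the variable names are choices made for this illustration.

```python
# (lower class limit, upper class limit, frequency) for each rent class
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in classes)                 # total number of observations

# f_i * M_i, where the midpoint M_i is the average of the class limits
weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

grouped_mean = weighted_sum / n                   # x_bar = sum(f_i M_i) / n

print(n, weighted_sum, round(grouped_mean, 2))
```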
Variance for Grouped Data
Sample Data
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
Population Data
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]
Example: Apartment Rents
Sample Variance for Grouped Data
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} = 3,017.89 \]
Sample Standard Deviation for Grouped Data
\[ s = \sqrt{3,017.89} = 54.94 \]
This approximation differs by only $0.20 from the actual standard deviation of $54.74.
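A sketch of the grouped-data variance and standard deviation, using the same frequency distribution and the sample formula (n - 1 in the denominator). The tuple layout and the variable names are illustrative.

```python
import math

# (lower class limit, upper class limit, frequency) for each rent class
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in classes)
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Sum of f_i * (M_i - x_bar)^2 over all classes
sum_sq = sum(f * ((lo + hi) / 2 - grouped_mean) ** 2 for lo, hi, f in classes)

grouped_variance = sum_sq / (n - 1)          # s^2 for grouped sample data
grouped_std = math.sqrt(grouped_variance)    # s

print(round(grouped_variance, 2), round(grouped_std, 2))
```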