SESSION 17 & 18
Last Update: 16th March 2011
Measures of Dispersion
Measures of Variability
Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fund-analysis.net/pages/vega.php
Grouped Data – Investment B

Intervals         x     f   f(<)    xf
-25 to < -15    -20     2     2    -40
-15 to <  -5    -10     5     7    -50
 -5 to <   5      0     5    12      0
  5 to <  15     10     4    16     40
 15 to <  25     20     6    22    120
 25 to <  35     30     3    25     90
Total                  25          160
Total / 2            12.5

            Grouped    Actual
Mean          6.400     7.072
Median        6.250     4.700
Mode         19.000     multimodal

Median class: Ome = 5, f(<) = 12, fme = 4
Modal class:  Omo = 15, fm = 6, fm-1 = 4, fm+1 = 3
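
The grouped values in the table can be checked with a short sketch. It assumes the usual interpolation formulas for the grouped median and mode (the slide shows only their inputs and results), and the variable names are illustrative rather than taken from the slides.

```python
# Grouped frequency table for Investment B: (lower bound, upper bound, frequency)
classes = [(-25, -15, 2), (-15, -5, 5), (-5, 5, 5),
           (5, 15, 4), (15, 25, 6), (25, 35, 3)]

n = sum(f for _, _, f in classes)

# Mean: sum of (class midpoint * frequency) divided by the total frequency
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

# Median: find the class where the cumulative frequency first reaches n / 2,
# then interpolate within that class (assumed standard formula)
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= n / 2:
        grouped_median = lo + (n / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Mode: interpolate within the class with the highest frequency,
# using the frequencies of the neighbouring classes (assumed standard formula)
m = max(range(len(classes)), key=lambda i: classes[i][2])
lo, hi, fm = classes[m]
f_prev = classes[m - 1][2] if m > 0 else 0
f_next = classes[m + 1][2] if m < len(classes) - 1 else 0
grouped_mode = lo + (fm - f_prev) / (2 * fm - f_prev - f_next) * (hi - lo)

print(grouped_mean, grouped_median, grouped_mode)
```

Under these assumptions the sketch reproduces the grouped mean, median and mode shown in the table. The "Actual" values cannot be recomputed here because the raw observations are not listed.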
Learning Objectives
1. Measures of relative standing: Median, Quartiles, Deciles and Percentiles
2. Measures of dispersion: Range
3. Measures of variability: Variance and Standard Deviation
Percentiles
The Pth percentile is the value for which P percent of the observations are less than that value and (100 – P) percent are greater than that value.
Some special percentiles commonly used include the median and the quartiles.
Percentiles are measures of relative standing.
Terminology
Fraction   Number   Name        Percentiles                  Notation
½          1        Median      50th                         Q2
¼          4        Quartiles   25th, 50th, 75th, 100th      Q1, Q2, Q3, Q4
1/5        5        Quintiles   20th, 40th, …, 100th
1/10       10       Deciles     10th, 20th, …, 100th

General notation for the location of the Pth percentile: Lp
Location of a Percentile
The location L of a percentile is a function of the required percentile P and the sample size n:
Lp = (n + 1) * (P / 100)
As with the median, all observations must be placed in ascending or descending order first.
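
As a minimal sketch, the location formula can be written as a one-line function. The function name `percentile_location` is illustrative and does not come from the slides.

```python
def percentile_location(n, p):
    """Location Lp = (n + 1) * (P / 100) of the Pth percentile
    among n ordered observations (1-based, may be fractional)."""
    return (n + 1) * (p / 100)


# Locations of the 25th, 50th and 75th percentiles for a sample of ten
for p in (25, 50, 75):
    print(p, percentile_location(10, p))
```

A fractional result such as 2.75 means the percentile lies three quarters of the way between the second and third ordered observations.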
Calculation of Percentile
1. Place all observations in order
2. Calculate the location of the percentile
3. Since the location will often be fractional (e.g. 5.5), the distance between the two neighbouring observations must be multiplied by the fractional part of the location
4. The result of 3. is added to the lower of the two observations to yield the percentile
Percentile: An example
The following denotes the number of hours spent on the internet:
0 0 5 7 8 9 12 14 22 23
The values are already placed in order. The sample size is n = 10. We wish to determine L25, L50 and L75 (this is analogous to the quartiles Q1, Q2 and Q3).
Solution – Step 1
Use the formula to calculate the location for each percentile / quartile (n = 10).

Obs    1  2  3  4  5  6   7   8   9  10
Data   0  0  5  7  8  9  12  14  22  23

Quartile     Lp
25         2.75   =(10 + 1) * (25 / 100)
50         5.50   =(10 + 1) * (50 / 100)
75         8.25   =(10 + 1) * (75 / 100)
Solution – Step 2
Determine the fractional part of the location (n = 10).

Obs    1  2  3  4  5  6   7   8   9  10
Data   0  0  5  7  8  9  12  14  22  23

Quartile     Lp   Fraction
25         2.75       0.75   =2.75 - 2
50         5.50       0.50   =5.5 - 5
75         8.25       0.25   =8.25 - 8
Solution – Step 3
Determine the next lower and next higher observation associated with the location. For 2.75, the two observations are observation 2 (value 0) and observation 3 (value 5).

Obs    1  2  3  4  5  6   7   8   9  10
Data   0  0  5  7  8  9  12  14  22  23

Quartile     Lp   Fraction   Lower   Upper
25         2.75       0.75       0       5
50         5.50       0.50       8       9
75         8.25       0.25      14      22
Solution – Step 4
In order to determine the quartile associated with a given location, you need to calculate the following:
Solution = Lower + (Upper – Lower) * Fraction

Obs    1  2  3  4  5  6   7   8   9  10
Data   0  0  5  7  8  9  12  14  22  23

Quartile     Lp   Fraction   Lower   Upper   Solution
25         2.75       0.75       0       5       3.75   =0 + (5 - 0) * 0.75
50         5.50       0.50       8       9       8.50   =8 + (9 - 8) * 0.5
75         8.25       0.25      14      22      16.00   =14 + (22 - 14) * 0.25
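
The four steps worked through above can be sketched as a small function. The name `percentile` and the handling of locations that fall outside the data (clamping to the smallest or largest observation) are assumptions for illustration; the slides do not cover that edge case.

```python
def percentile(data, p):
    """Pth percentile using Lp = (n + 1) * (P / 100) with linear interpolation."""
    ordered = sorted(data)                 # Step 1: place observations in order
    n = len(ordered)
    location = (n + 1) * (p / 100)         # Step 2: location of the percentile
    position = int(location)               # whole part of the location (1-based)
    fraction = location - position         # fractional part of the location

    # Assumption: clamp when the location falls before the first
    # or at/after the last observation.
    if position < 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]

    lower = ordered[position - 1]          # next lower observation
    upper = ordered[position]              # next higher observation
    # Steps 3 and 4: Solution = Lower + (Upper - Lower) * Fraction
    return lower + (upper - lower) * fraction


hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]
for p in (25, 50, 75):
    print(p, percentile(hours, p))
```

For the internet usage data this gives 3.75, 8.5 and 16.0, matching the Solution column.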
Exercises
You may use shortcuts if you want!
1. Determine the first, second and third quartiles:
   5 8 2 9 5 3 7 4 2 7 4 10 4 3 5
2. Determine the third and eighth deciles (30th and 80th percentile):
   10.5 14.7 15.3 17.7 15.9 12.2 10.0 14.1 13.9 18.5 13.9 15.1 15.7
Range
The range is the difference between the maximum and the minimum observation. It is a measure of dispersion.
The interquartile range is the difference between the third and the first quartile:
Interquartile Range = Q3 – Q1
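
A brief sketch of both measures for the internet usage data from the percentile example (0 0 5 7 8 9 12 14 22 23). It relies on Python's standard `statistics.quantiles` (Python 3.8 or later), whose `"exclusive"` method interpolates at position (n + 1) * P / 100 and should therefore agree with the location rule used in these slides. Using this library function is a choice made for this example, not something the slides prescribe.

```python
import statistics

hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]

# Range: maximum observation minus minimum observation
data_range = max(hours) - min(hours)

# Quartiles Q1, Q2, Q3 (n=4 splits the data into four parts)
q1, q2, q3 = statistics.quantiles(hours, n=4, method="exclusive")

# Interquartile Range = Q3 - Q1
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

With Q1 = 3.75 and Q3 = 16.0 from the percentile example, the interquartile range is 12.25, compared with a range of 23.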
Variance
The variance is based on the sum of the squared deviations of every single observation from the sample / population mean, divided by the number of observations (or, for a sample, by n – 1). All differences are squared so that positive and negative deviations from the mean do not cancel out.
The variance is a measure of variability.
Population and Sample Variance
We need to differentiate between population variance and sample variance. Because the sample mean is itself calculated from the data, the sample variance loses one degree of freedom and is therefore divided by n – 1. For the hypothetically infinite population of size N this is not the case, and the sum of squared deviations is divided by N.
Formulas
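Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample                           Population
n          Sample size           N          Total population size
x_i        Observation           x_i        Observation
\bar{x}    Sample Mean           \mu        Population Mean
s^2        Sample Statistic      \sigma^2   Population Parameter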
Calculation of Variance
1. Calculate the average: Sum of observations / number of observations
2. Subtract the average from every observation
3. Square the difference
4. Sum the squared differences
5. Divide the result from 4. by either N (population) or n-1 (sample)
Variance: An example
The following denotes the number of hours spent on the internet for a sample of n = 10 adults:
0 7 12 5 3 14 8 0 9 22
Calculate the variance.
Solution – Step 1
Use the mean to calculate the differences between every observation and the mean.

Obs     Data   Difference
1          0           -8   =(0 - 8)
2          7           -1   =(7 - 8)
3         12            4   =(12 - 8)
4          5           -3   =(5 - 8)
5          3           -5   =(3 - 8)
6         14            6   =(14 - 8)
7          8            0   =(8 - 8)
8          0           -8   =(0 - 8)
9          9            1   =(9 - 8)
10        22           14   =(22 - 8)
Total     80
n         10
n-1        9
Average    8
Solution – Step 2
Square all differences. Next, sum the squared differences and divide the sum by n – 1 (sample only).

Obs     Data   Difference   Sqr Diff
1          0           -8         64   =(-8)^2
2          7           -1          1   =(-1)^2
3         12            4         16   =(4)^2
4          5           -3          9   =(-3)^2
5          3           -5         25   =(-5)^2
6         14            6         36   =(6)^2
7          8            0          0   =(0)^2
8          0           -8         64   =(-8)^2
9          9            1          1   =(1)^2
10        22           14        196   =(14)^2
Total     80                     412
n         10
n-1        9
Average    8                  45.778   =412 / 9

For a sample, the sum of squares is divided by n-1; for a population, it is divided by N.
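
The two solution steps can be retraced in a few lines. This is a sketch with illustrative variable names. The population figure is included only to show the effect of the divisor, since the slides treat the data as a sample.

```python
hours = [0, 7, 12, 5, 3, 14, 8, 0, 9, 22]

n = len(hours)
mean = sum(hours) / n                       # 80 / 10

# Step 1: difference between every observation and the mean
differences = [x - mean for x in hours]

# Step 2: square the differences and sum them
squared = [d ** 2 for d in differences]
sum_of_squares = sum(squared)

# Divide by n - 1 for a sample, or by N if the data were a whole population
sample_variance = sum_of_squares / (n - 1)
population_variance = sum_of_squares / n

print(mean, sum_of_squares, round(sample_variance, 3))
```

The sum of squares comes to 412, and dividing by n - 1 = 9 gives the sample variance of 45.778 shown in the table. Dividing by N = 10 instead would give 41.2.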
Interpretation Variance
The variance may be difficult to interpret because it is expressed in squared units. Remember that all differences are squared to prevent positive and negative differences from cancelling out. Taking the square root of the variance returns the statistic to the original units of the data. This statistic is called the standard deviation.
However, the variances of two datasets may still be compared directly to determine the more volatile dataset.
Example – Standard Deviation
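The population standard deviation:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Similarly, the sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Thus, for the internet usage example: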
Solution – Step 3

Obs     Data   Difference   Sqr Diff
1          0           -8         64
2          7           -1          1
3         12            4         16
4          5           -3          9
5          3           -5         25
6         14            6         36
7          8            0          0
8          0           -8         64
9          9            1          1
10        22           14        196
Total     80                     412
n         10
n-1        9
Average    8                  45.778
Sqrt                           6.766
Interpretation: On average, the observations of internet usage within the sample of ten people deviate by 6.766 h from the sample mean.
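
As a cross-check, the sample standard deviation can be computed by hand and compared with Python's standard `statistics` module. Using that module is a choice made for this sketch and is not part of the slides. `statistics.stdev` divides by n - 1, while `statistics.pstdev` divides by N.

```python
import math
import statistics

hours = [0, 7, 12, 5, 3, 14, 8, 0, 9, 22]

n = len(hours)
mean = sum(hours) / n

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum((x - mean) ** 2 for x in hours) / (n - 1)

# Standard deviation: square root of the variance
sample_sd = math.sqrt(sample_variance)

print(round(sample_sd, 3))                  # 6.766
print(statistics.stdev(hours))              # library value, sample (n - 1)
print(statistics.pstdev(hours))             # library value, population (N)
```

The hand calculation and `statistics.stdev` should agree up to floating-point rounding. The population version is slightly smaller because it divides by 10 rather than 9.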
Exercises
1. Calculate the variance and standard deviation for the following data:
   2 8 9 4 1 7 5 4
2. Calculate the variance and standard deviation for the following data:
   7 -5 -3 8 4 -4 1 -5 9 3