14-04-2012
Research Methodology
Dr. Nimit Chowdhary, Professor
Saturday, April 14, 2012 © Dr. Nimit Chowdhary, Research Methodology Workshop
To be able to compute four common measures of variability:
- Range
- Interquartile range
- Standard deviation
- Variance
The range is the difference between the largest and the smallest values in a set of values.
Example: 2 4 9 5 7 3
Smallest = 2, largest = 9
Range = Largest – Smallest = 9 – 2 = 7
(+) Easy to calculate
(–) Relies on only two values
(–) Ignores the variability of all the middle values
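Data set A: 1 2 3 4 5 6 7 8 9
Range = 9 – 1 = 8
Data set B: 1 1 1 1 1 1 1 1 9
Range = 9 – 1 = 8
Both data sets have the same range, even though the values in B are far less spread out than those in A.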
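Here is a minimal Python sketch of the range calculation. It uses the example values and data sets A and B from above. The function name `data_range` is a hypothetical helper, not part of any library. It simply applies Range = Largest – Smallest.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


example = [2, 4, 9, 5, 7, 3]
set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
set_b = [1, 1, 1, 1, 1, 1, 1, 1, 9]

print(data_range(example))                   # 7
# Same range, although set B is far less spread out than set A
print(data_range(set_a), data_range(set_b))  # 8 8
```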
The interquartile range is a measure of variability, based on dividing the dataset into quartiles.
Quartiles divide an ordered data set into four equal parts.
The values that divide each part are called the first, second and third quartiles.
First, second and third quartiles are denoted by Q1, Q2 and Q3 respectively.
Steps:
1. Arrange the data set in numerical order.
2. Define the quartiles:
   - the second quartile Q2 is the median of the entire data set;
   - Q1 is the median of the data below Q2;
   - Q3 is the median of the data above Q2.
3. The interquartile range is IQR = Q3 – Q1.
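The steps above can be sketched in Python. The sketch assumes the median-of-halves convention described here: when the count is odd, the middle value is left out of both halves. `quartiles` and `iqr` are hypothetical helper names. Other tools use different quartile conventions (for example, interpolating between values), so their Q1 and Q3 may differ slightly from this sketch.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Needs at least two values so that each half is non-empty.
    """
    data = sorted(values)              # step 1: numerical order
    n = len(data)
    q2 = median(data)                  # Q2: median of the whole data set
    lower = data[: n // 2]             # values below Q2 (middle value excluded if n is odd)
    upper = data[(n + 1) // 2 :]       # values above Q2
    return median(lower), q2, median(upper)


def iqr(values):
    """Interquartile range: IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print(quartiles(range(10)))  # (2, 4.5, 7)
print(iqr(range(10)))        # 5
```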
Ordered data set: 0 1 2 3 4 5 6 7 8 9
Q2 is the median: Q2 = (4 + 5)/2 = 4.5
Q1 = 2 (median of 0 1 2 3 4), Q3 = 7 (median of 5 6 7 8 9)
Interquartile range = Q3 – Q1 = 7 – 2 = 5
IQR ignores outliers! Ordered data set: 0 1 2 3 4 5 6 7 8 999
Q2 is the median: Q2 = (4 + 5)/2 = 4.5
Q1 = 2, Q3 = 7
Interquartile range = Q3 – Q1 = 7 – 2 = 5
The range is strongly influenced by outliers: here it rises from 9 to 999. The IQR is not influenced by them.
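This sketch checks the claim that an outlier changes the range but not the IQR. It recomputes both measures for the data set 0 to 9, with and without the outlier 999. It uses the same median-of-halves convention as above, and the helper names are hypothetical.

```python
from statistics import median


def iqr(values):
    """IQR = Q3 - Q1, with Q1 and Q3 as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    return median(data[(n + 1) // 2 :]) - median(data[: n // 2])


def data_range(values):
    """Range = largest - smallest."""
    return max(values) - min(values)


clean = list(range(10))            # 0 1 2 ... 9
with_outlier = clean[:-1] + [999]  # largest value 9 replaced by the outlier 999

for label, d in [("without outlier", clean), ("with outlier", with_outlier)]:
    print(f"{label}: range = {data_range(d)}, IQR = {iqr(d)}")
# without outlier: range = 9, IQR = 5
# with outlier: range = 999, IQR = 5
```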
Variance is the average squared deviation from the mean:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where $\sigma^2$ = variance, $\sum$ = summation symbol, $X_i$ = element $i$ of the data set, $\mu$ = mean of the data set, and $N$ = number of elements in the data set.
Find the variance of the following: 0, 1, 5, 6
Number of entries: $N = 4$
Mean: $\mu = \frac{\sum X}{N}$
Deviation sum of squares: $SS = \sum (X - \mu)^2$
Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Solution:
Mean: $\mu = \frac{\sum X}{N} = \frac{0 + 1 + 5 + 6}{4} = \frac{12}{4} = 3$
Deviation sum of squares: $SS = \sum (X - \mu)^2 = (0 - 3)^2 + (1 - 3)^2 + (5 - 3)^2 + (6 - 3)^2 = 9 + 4 + 4 + 9 = 26$
Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{26}{4} = 6.5$
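A short Python sketch can check the worked example. It follows the same three steps: find the mean, form the deviation sum of squares, and divide by N. `population_variance` is a hypothetical helper. `statistics.pvariance` from Python's standard library is used only as a cross-check, because it also divides by N.

```python
from statistics import pvariance


def population_variance(values):
    """sigma^2 = sum((X_i - mu)^2) / N"""
    n = len(values)
    mu = sum(values) / n                     # mean
    ss = sum((x - mu) ** 2 for x in values)  # deviation sum of squares
    return ss / n


data = [0, 1, 5, 6]
print(population_variance(data))  # 6.5
print(pvariance(data))            # standard-library check, also divides by N
```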
The standard deviation is the square root of the variance:
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
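This sketch continues the same example by taking the square root of the variance. For the data 0, 1, 5, 6 this gives $\sigma = \sqrt{6.5} \approx 2.55$. `statistics.pstdev` is the standard-library population standard deviation and serves only as a cross-check.

```python
import math
from statistics import pstdev

data = [0, 1, 5, 6]
mu = sum(data) / len(data)                                # 3.0
variance = sum((x - mu) ** 2 for x in data) / len(data)   # 6.5
sigma = math.sqrt(variance)                               # standard deviation

print(round(sigma, 4))          # 2.5495
print(round(pstdev(data), 4))   # standard-library check
```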
What happens to variability when you add a constant to each value in the data set?
All measures of variability (range, interquartile range, variance and standard deviation) stay the same. Every value and the mean shift by the same amount, so the distances between values are unchanged.
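This small sketch illustrates the point. It adds a constant (10, chosen arbitrarily) to every value of the data set 0, 1, 5, 6 and compares all four measures before and after. The helper names are hypothetical. The IQR uses the median-of-halves convention, and the variance and standard deviation divide by N.

```python
import math
from statistics import median, pstdev, pvariance


def iqr(values):
    """IQR = Q3 - Q1 (median-of-halves convention)."""
    data = sorted(values)
    n = len(data)
    return median(data[(n + 1) // 2 :]) - median(data[: n // 2])


def variability(values):
    """Return (range, IQR, variance, standard deviation), dividing by N."""
    return (max(values) - min(values), iqr(values), pvariance(values), pstdev(values))


data = [0, 1, 5, 6]
shifted = [x + 10 for x in data]  # add the constant 10 to every value

before, after = variability(data), variability(shifted)
print(before)
print(after)
print(all(math.isclose(a, b) for a, b in zip(before, after)))  # True
```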
The variance and standard deviation are the most common and useful measures of variability.
These two measures provide information about how the data vary about the mean.
When the data are clustered about the mean, the variance and standard deviation will be relatively small.
When the data are widely scattered about the mean, the variance and standard deviation will be relatively large.
The sample variance is an approximate average of the squared deviations of the data values from the sample mean.
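The sample variance is denoted by $s^2$ and is computed from the following formula:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where $\bar{X}$ is the sample mean and $n$ is the sample size.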
What is the variance for the following sample values?
3 8 6 14 0 11
NOTE: Do not let the formula intimidate you. We will build a table to help with the computations.
The mean is (3 + 8 + 6 + 14 + 0 + 11)/6 = 42/6 = 7.

X      X – 7    (X – 7)²
3       –4        16
8        1         1
6       –1         1
14       7        49
0       –7        49
11       4        16
Sum      0       132

$s^2 = \frac{132}{6 - 1} = \frac{132}{5} = 26.4$
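This Python sketch generates the table and divides by $n - 1$, as the sample formula requires. `statistics.variance` also divides by $n - 1$ and is used only as a cross-check.

```python
from statistics import variance

values = [3, 8, 6, 14, 0, 11]
n = len(values)
mean = sum(values) / n  # 42 / 6 = 7.0

print(f"{'X':>4} {'X - mean':>9} {'(X - mean)^2':>13}")
ss = 0.0
for x in values:
    dev = x - mean
    ss += dev ** 2  # accumulate the squared deviations
    print(f"{x:>4} {dev:>9.1f} {dev ** 2:>13.1f}")

s2 = ss / (n - 1)  # sample variance divides by n - 1, not n
print(f"SS = {ss}, s^2 = {s2}")  # SS = 132.0, s^2 = 26.4
print(variance(values))          # standard-library check (also uses n - 1)
```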
In the previous example, observe that the variance is large relative to the size of the data values.
This can be seen in the plot, which shows the data values widely spread about the mean value of 7.
The sample standard deviation is the positive square root of the variance.
NOTE: the standard deviation has the same unit as the variable.
Example: The sample standard deviation for the previous example is $s = \sqrt{26.4} \approx 5.14$.
If all of the observations have the same value, the sample variance (standard deviation) will be zero. That is, there is no variability in the data set.
The variance (standard deviation) is influenced by outliers in the data set.
The unit for the standard deviation is the same as that for the raw data.
Thus it is preferred to use the standard deviation rather than the variance as the measure of variability.
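This sketch computes the sample standard deviation for the same six values. It also checks the zero-variability property on a set of identical values. `statistics.stdev` uses the $n - 1$ divisor, which matches the sample formula.

```python
import math
from statistics import stdev

values = [3, 8, 6, 14, 0, 11]
n = len(values)
mean = sum(values) / n
s2 = sum((x - mean) ** 2 for x in values) / (n - 1)  # 26.4
s = math.sqrt(s2)  # positive square root of the sample variance

print(round(s, 2))              # 5.14
print(round(stdev(values), 2))  # standard-library check

# If all observations are equal, there is no variability
print(stdev([4, 4, 4, 4]))
```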
The population variance is the average of the squared deviations of the data values from the population mean.
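The population variance is denoted by $\sigma^2$ and is computed from the following formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where $\mu$ is the population mean and $N$ is the population size.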
The population standard deviation is the positive square root of the population variance.
The population standard deviation is denoted by $\sigma$ and is computed from the following formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
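This sketch contrasts the two divisors. It applies the population formulas (divide by $N$) and the sample formulas (divide by $n - 1$) to the six values used earlier. Treating these values as a complete population is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math


def population_variance(values):
    """sigma^2 = sum((X_i - mu)^2) / N"""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2 = sum((X_i - xbar)^2) / (n - 1)"""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)


values = [3, 8, 6, 14, 0, 11]  # treated here as a whole population, for illustration only
pop_var, samp_var = population_variance(values), sample_variance(values)

print(pop_var, math.sqrt(pop_var))    # sigma^2 = 132 / 6 = 22.0, then sigma
print(samp_var, math.sqrt(samp_var))  # s^2 = 132 / 5 = 26.4, then s
```

Dividing by the smaller $n - 1$ always gives a slightly larger value than dividing by $N$ for the same data.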
The coefficient of variation (CV) allows us to compare the variation of two (or more) different variables.
Sample coefficient of variation: the sample standard deviation divided by the sample mean of the data set, $CV = \frac{s}{\bar{X}}$.
Usually, the result is expressed as a percentage: $CV = \frac{s}{\bar{X}} \times 100\%$.
NOTE: The sample coefficient of variation standardizes the variation by dividing it by the sample mean.
The coefficient of variation has no units since the standard deviation and the mean have the same units, and thus cancel out each other.
Because of this property, we can use this measure to compare the variations for different variables with different units.
The mean number of tourists arriving at a monument over a four-month period was 90, and the standard deviation was 5. The average expenditure made at the site was Rs. 5,400, and the standard deviation was Rs. 775. Compare the variations of the two variables.
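CV for the number of tourists: $CV = \frac{5}{90} \times 100\% \approx 5.56\%$
CV for expenditure: $CV = \frac{775}{5{,}400} \times 100\% \approx 14.35\%$
Since the CV is larger for expenditure, there is more variability in the recorded expenditure than in the number of tourists.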
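A one-line helper reproduces the comparison. `coefficient_of_variation` is a hypothetical name, and the figures are the ones given in the example.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation / mean * 100."""
    return sd / mean * 100


cv_tourists = coefficient_of_variation(5, 90)          # number of tourists
cv_expenditure = coefficient_of_variation(775, 5400)   # expenditure in Rs.

print(f"Tourists:    CV = {cv_tourists:.2f}%")     # 5.56%
print(f"Expenditure: CV = {cv_expenditure:.2f}%")  # 14.35%
```

Because the units (tourists, rupees) cancel in each ratio, the two percentages can be compared directly.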
Population coefficient of variation: the population standard deviation divided by the population mean of the data set, $CV = \frac{\sigma}{\mu}$.
NOTE: The population CV has the same properties as the sample CV.
- Different measures of dispersion: range, interquartile range, variance and standard deviation
- Concept of coefficient of variation