
Topic 5 Measures of Dispersion

Date post: 07-Apr-2018
Upload: isye11

Transcript
  • 8/3/2019 Topic 5 Measures of Dispersion


    INTRODUCTION

    We have discussed earlier the position quantities, such as the mean and quartiles, which can be used to summarise a distribution. However, these quantities are ordered numbers located on the horizontal axis of the distribution graph. As numbers along that line, they cannot describe quantitatively, for example, the shape of the distribution.

    In this topic we will learn about quantities that measure the shape of a distribution. For example, the variance is usually used to measure the dispersion of observations around their mean. The range is used to describe the coverage of a given data set. The coefficient of skewness is used to measure the asymmetry of a distribution curve. The coefficient of kurtosis is used to measure the peakedness of a distribution curve.


    LEARNING OUTCOMES

    By the end of this topic, you should be able to:

    1. Describe the concept of dispersion measures;

    2. Explain the concept of range as a dispersion measure;

    3. Categorise the distribution curve by its symmetry and non-symmetry; and

    4. Analyse variance and standard deviation.


    5.1 MEASURE OF DISPERSION

    The mean of a distribution is termed a location parameter. The locations of any two distributions can be compared by looking at their respective means. The range tells us about the coverage of a distribution, whilst the variance measures the spread of observations around their mean and hence the shape of the distribution curve. A small variance means the distribution curve is more pointed, and a larger variance indicates that the curve is flatter. Thus, the variance is sometimes called a shape parameter.

    Figure 5.1(a) shows two distribution curves with different location centres but possibly the same dispersion measure (they may have the same range of coverage but different variances). Curve 1 could represent the distribution of mathematics marks of male students from School A, and Curve 2 the distribution of mathematics marks of female students in the same examination from the same school.

    Figure 5.1 (a): Two distribution curves with different location centres but possibly of

    same dispersion measure

    Figure 5.1(b) shows two distribution curves with the same location centre but possibly different dispersion measures (they may have different ranges of coverage as well as different variances). Curve 3 could represent the distribution of physics marks of male students from School A, and Curve 4 the distribution of physics marks of male students in the same examination but from School B.

    ACTIVITY 5.1

    Is it important to comprehend quantities like mean and quartiles to prepare you to study this topic? Give your reasons.

    Figure 5.1 (b): Two distribution curves with same location centre but possibly of

    different dispersion measures

    Figure 5.1(c) shows two distribution curves with different location centres but possibly the same dispersion measure (they may have the same range of coverage and possibly the same variance). However, Curve 5 is slightly skewed to the right and Curve 6 is slightly skewed to the left. Curve 5 could represent the distribution of mathematics marks of students from School A, and Curve 6 the distribution of mathematics marks of students in the same trial examination but from a different school.

    Figure 5.1 (c): Two distribution curves with different location centres but possibly of

    same dispersion measure

    Looking at Figures 5.1(a), (b) and (c), we see that, besides the mean, we need to know other quantities such as the variance, range and coefficient of skewness in order to describe or summarise a given distribution completely.


    The following are examples of dispersion measures:

    (a) Dispersion Measure Around Mean of Distribution

    It measures the deviation of observations from their mean. There are two

    types that can be considered:

    (i) Mean Deviation; and

    (ii) Standard Deviation.

    However, in this module we will consider only Standard Deviation. You can refer to any statistics book for Mean Deviation.

    (b) Central Percentage Dispersion Measure

    This measure has some relationship with median. There are two types that

    can be considered:

    (i) Central Percentage Range 10-90; and

    (ii) Semi Inter Quartile Range.

    (c) Distribution Coverage

    This quantity measures the range of the whole distribution, which shows the overall coverage of observations in the data set.

    Dispersion measure involves measuring the degree of scatter of observations around their mean centre.

    5.2 THE RANGE

    The range is defined as the difference between the maximum value and the minimum value of observations.

    Thus,

    $$\text{Range} = \text{maximum value} - \text{minimum value} \tag{5.1}$$


    As can be seen from formula (5.1), the range is easy to calculate. However, it depends only on the two extreme values to measure the overall data coverage. It does not explain anything about the variation of observations between the two extreme values.

    Example 5.1

    Comment on the scatter of observations in each of the following data sets:

    Set 1: 12 6 7 3 15 5 10 18 5

    Set 2: 9 3 8 8 9 7 8 9 18

    Solution

    (a) Arrange the observations in ascending order of value, and draw a scatter-point plot for each set of data (Figure 5.2).

    Set 1: 3 5 5 6 7 10 12 15 18

    Set 2: 3 7 8 8 8 9 9 9 18

    Figure 5.2

    (b) Both data sets have the same range, which is 18 - 3 = 15.


    (c) Observations in Set 1 are scattered almost evenly throughout the range.

    However, for Set 2, most of the observations are concentrated around

    numbers 8 and 9.

    (d) We can consider numbers 3 and 18 as outliers to the main body of the data

    Set 2.

    (e) From this exercise we learn that it is not good enough to compare only

    overall data coverage using range. Some other dispersion measures have to

    be considered too.

    To conclude, Figure 5.2 shows that two distributions can have the same range but be of different shapes, which cannot be explained by the range.
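The comparison in Example 5.1 can be checked with a short Python sketch. The helper names `data_range` and `count_between` are illustrative choices, not part of the module; the band 7 to 9 is picked only to show where Set 2 is concentrated.

```python
# Data sets from Example 5.1.
set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]
set_2 = [9, 3, 8, 8, 9, 7, 8, 9, 18]


def data_range(values):
    """Range = maximum value - minimum value, as in formula (5.1)."""
    return max(values) - min(values)


def count_between(values, low, high):
    """Count observations with low <= value <= high (illustrative helper)."""
    return sum(1 for v in values if low <= v <= high)


# Both sets have the same range ...
print(data_range(set_1), data_range(set_2))

# ... but very different numbers of observations in the band 7..9.
print(count_between(set_1, 7, 9), count_between(set_2, 7, 9))
```

The two ranges agree, while the counts inside the band differ sharply, which is the point made in parts (b) to (e) of the solution.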

    ACTIVITY 5.2

    Range does not explain the density of a data set. What do you understand by this statement? Discuss it with your coursemates.

    EXERCISE 5.1

    1. The following are two sets of mathematics marks from an examination:

    Set A: 45, 48, 52, 54, 55, 55, 57, 59, 60, 65

    Set B: 25, 32, 40, 45, 53, 60, 61, 71, 78, 85

    (i) Calculate the mean and the range of both data sets.

    (ii) Comment on the scatter of observations in both sets.

    2. Below are two sets of physics marks in an examination.

    Set C: 35 62 42 75 26 50 57 8 88 80 18 83

    Set D: 50 42 60 62 57 43 46 56 53 88 8 59

    (i) Calculate the mean and the range of both data sets.

    (ii) Comment on the scatter of observations in both sets.


    5.3 INTER QUARTILE RANGE

    Inter quartile range is the difference between $Q_3$ and $Q_1$. It is used to measure the range of the central 50% main body of the data distribution.

    A longer inter quartile range indicates that the observations in the central main body are more scattered. This measure can be used to complement the overall range of the data, as the latter fails to explain the variation of observations between the two extreme values.

    Besides, the inter quartile range does not depend on the two extreme values. Thus it can be used to measure the dispersion of the main body of the data. It is also recommended as a complement to the overall data range when we compare two sets of data.

    For example, let us consider question 2 in Exercise 5.1, where Set C is compared with Set D. Although they have the same overall data range (88 - 8 = 80), they have different distributions. The inter quartile range for Set C is larger than for Set D. This indicates that the main body of data in Set D is less scattered than the main body of data in Set C.

    Inter quartile range is given by:

    $$\mathrm{IQR} = |Q_3 - Q_1| \tag{5.2}$$

    where the bars | | mean absolute value. Some reference books prefer to use the Semi Inter Quartile Range, which is given by:

    $$Q = \frac{\mathrm{IQR}}{2} = \frac{|Q_3 - Q_1|}{2} \tag{5.3}$$


    Example 5.2

    By using the inter quartile range, compare the spread of data between Set C and Set D in question 2 in Exercise 5.1.

    Solution

    (a) Number of observations, n = 12 for both data sets.

    (b) $Q_1$ is at the position $\frac{1}{4}(n + 1) = 3.25$. In ascending order, the data are:

    Set C: 8, 18, 26, 35, 42, 50, 57, 62, 75, 80, 83, 88

    Set D: 8, 42, 43, 46, 50, 53, 56, 57, 59, 60, 62, 88

    Set C: $Q_1 = 26 + 0.25\,(35 - 26) = 28.25$;

    Set D: $Q_1 = 43 + 0.25\,(46 - 43) = 43.75$.

    (c) $Q_3$ is at the position $\frac{3}{4}(n + 1) = 9.75$.

    Set C: $Q_3 = 75 + 0.75\,(80 - 75) = 78.75$;

    Set D: $Q_3 = 59 + 0.75\,(60 - 59) = 59.75$.

    (d) Then the inter quartile range for each data set is given by

    IQR(C) = 78.75 - 28.25 = 50.5; and

    IQR(D) = 59.75 - 43.75 = 16.0.

    (e) Since IQR(D) < IQR(C), data Set D is considered less spread out than Set C.
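The quartile positions used in Example 5.2 can be reproduced with a minimal Python sketch. It assumes the position rule of the example, fraction × (n + 1) with linear interpolation between neighbouring observations; the function names `quartile_at` and `iqr` are illustrative. Other software may use a different quartile convention and give slightly different values.

```python
def quartile_at(values, fraction):
    """Value at position fraction * (n + 1) in the sorted data (1-based),
    interpolating linearly between the two neighbouring observations."""
    data = sorted(values)
    n = len(data)
    position = fraction * (n + 1)
    lower = int(position)        # whole part of the position
    weight = position - lower    # fractional part of the position
    if lower < 1:                # position falls before the first value
        return data[0]
    if lower >= n:               # position falls at or after the last value
        return data[-1]
    return data[lower - 1] + weight * (data[lower] - data[lower - 1])


def iqr(values):
    """Inter quartile range |Q3 - Q1|, formula (5.2)."""
    return abs(quartile_at(values, 0.75) - quartile_at(values, 0.25))


set_c = [35, 62, 42, 75, 26, 50, 57, 8, 88, 80, 18, 83]
set_d = [50, 42, 60, 62, 57, 43, 46, 56, 53, 88, 8, 59]

print(quartile_at(set_c, 0.25), quartile_at(set_c, 0.75), iqr(set_c))
print(quartile_at(set_d, 0.25), quartile_at(set_d, 0.75), iqr(set_d))
```

Dividing the result of `iqr` by 2 gives the semi inter quartile range of formula (5.3).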

    Coefficient of Variation, $V_Q$

    The inter quartile range (IQR) and the semi inter quartile range (Q) are two quantities which have dimensions. Therefore they become meaningless when used to compare two data sets with different units, for instance data on age (years) and weight (kg). To avoid this problem, we can use the coefficient of quartile variation, which has no dimension and is given by:

    $$V_Q = \frac{Q}{\mathrm{TTQ}} = \frac{|Q_3 - Q_1|/2}{(Q_3 + Q_1)/2} = \frac{|Q_3 - Q_1|}{Q_3 + Q_1} \tag{5.4}$$

    In formula (5.4), TTQ is the midpoint between $Q_1$ and $Q_3$, and the two bars | | mean absolute value.
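As a rough illustration of formula (5.4), the sketch below applies it to the quartiles obtained for Sets C and D in Example 5.2. The function name `quartile_variation` is an assumption made for this example, and the code assumes $Q_1 + Q_3$ is positive.

```python
def quartile_variation(q1, q3):
    """Coefficient of quartile variation |Q3 - Q1| / (Q3 + Q1), formula (5.4).

    Equivalent to the semi inter quartile range divided by the
    midpoint (Q1 + Q3) / 2; assumes Q1 + Q3 > 0.
    """
    return abs(q3 - q1) / (q3 + q1)


# Quartiles taken from Example 5.2.
v_c = quartile_variation(28.25, 78.75)   # Set C
v_d = quartile_variation(43.75, 59.75)   # Set D

print(round(v_c, 3), round(v_d, 3))
```

Because the result has no unit, the same function can be used to compare data sets measured in different units.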

    EXERCISE 5.2

    Given the following three sets of data:

    Set E:

    Age (Yrs)    Number of Residents (f)
    5-14         35
    15-24        90
    25-34        120
    35-44        98
    45-54        130
    55-64        52
    65-74        25
    Σf           550

    Set F:

    Value of Products (RM) x 100    Number of Products (f)
    10-14.99                        2
    15-19.99                        6
    20-24.99                        15
    25-29.99                        22
    30-34.99                        35
    35-39.99                        15
    40-44.99                        5
    Σf                              100

    Set G:

    Extra Charges (RM)    Number of Shops (f)
    1-1.02                3
    1.03-1.05             15
    1.06-1.08             28
    1.09-1.11             30
    1.12-1.14             25
    1.15-1.17             14
    1.18-1.20             5
    Σf                    120

    1. Calculate $Q_1$, $Q_2$ and $Q_3$ for each data set.

    2. Obtain the inter quartile range (IQR) for each data set.

    3. Then make a comparison of the spread of the above data sets.


    5.4 VARIANCE AND STANDARD DEVIATION

    Variance is defined as the average of the squared distance of each score (or observation) from the mean. It is used to measure the spread of data.

    If we have two distributions, the one with the larger variance is more spread out and hence its frequency curve is flatter. The variance of a population is denoted by $\sigma^2$. Variance is never negative. The standard deviation is obtained by taking the square root of the variance. In this module, we will treat the given data as a population.

    5.4.1 Standard Deviation and the Variance of Ungrouped Data

    Suppose we have n numbers $x_1, x_2, \ldots, x_n$, with their mean (given or calculated) as $\mu$. Then the standard deviation is given by:

    $$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \tag{5.5}$$

    In words, it is the square root of the average of the squared distance of each score (or observation) from the mean. It is never negative. The population variance ($\sigma^2$) is the square of the standard deviation.

    Table 5.1: Steps of Obtaining Population Standard Deviation

    Steps                                                      Symbols Used
    (a) Calculate the population mean                          $\mu$
    (b) Obtain the deviation of each score from the mean       $x_i - \mu$, i = 1, 2, ..., n
    (c) Obtain the square of each deviation in step (b)        $(x_i - \mu)^2$, i = 1, 2, ..., n
    (d) Obtain the average of the squared deviations           $\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
    (e) Obtain the square root of the average in step (d)      $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$


    Example 5.3

    Obtain the standard deviation of the data set 20, 30, 40, 50, 60.

    Solution

    Variable (x)    Deviation from Mean (x - μ)    Squared Deviation (x - μ)^2
    20              -20                            400
    30              -10                            100
    40              0                              0
    50              10                             100
    60              20                             400
    Sum = 200                                      Sum = 1000

    $\mu = 200/5 = 40$; mean of the squared deviations = 1000/5 = 200.

    Using formula (5.5), the standard deviation of the population is $\sigma = \sqrt{200} \approx 14.14$.
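The steps of Table 5.1 translate almost line by line into code. The following is a minimal Python sketch applied to the data of Example 5.3; the function name `population_std` is an illustrative choice, and the data are treated as a population, as in this module.

```python
import math


def population_std(values):
    """Population standard deviation following steps (a) to (e) of Table 5.1."""
    n = len(values)
    mu = sum(values) / n                       # (a) population mean
    deviations = [x - mu for x in values]      # (b) deviation of each score
    squared = [d ** 2 for d in deviations]     # (c) squared deviations
    variance = sum(squared) / n                # (d) average squared deviation
    return math.sqrt(variance)                 # (e) square root of the average


data = [20, 30, 40, 50, 60]
sigma = population_std(data)
print(round(sigma, 2))  # 14.14
```

Python's standard library offers `statistics.pstdev` for the same population formula, which can serve as a cross-check.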

    EXERCISE 5.3

    1. Obtain the standard deviation of data Sets 1 and 2 in Example 5.1.

    2. Obtain the standard deviation of data Sets A, B, C and D in Exercise 5.1.

    5.4.2 Alternative Formulas to Enhance Hand Calculations

    (a) To avoid subtracting $\mu$ from each score, the equivalent formula (5.6) can be used to calculate the standard deviation.

    $$\sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2} \tag{5.6}$$

    Example 5.4

    Obtain the standard deviation of data Set 2 in Example 5.1.

    Solution

    For Set 2, n = 9, $\sum x_i = 79$ and $\sum x_i^2 = 817$. From formula (5.6), the standard deviation is

    $$\sigma = \sqrt{\frac{817}{9} - \left(\frac{79}{9}\right)^2} \approx 3.71$$

    (You can compare this with the answer obtained in Exercise 5.3.)

    (b) Sometimes the population mean $\mu$ is not needed and we are only required to find the standard deviation. Formula (5.7) below, which does not involve $\mu$, can be used instead. In this formula, A is an assumed mean, which is an arbitrary number. You can select A either from the given numbers in the set or as any convenient number you like.

    $$\sigma = \sqrt{\frac{\sum (x_i - A)^2}{n} - \left(\frac{\sum (x_i - A)}{n}\right)^2} \tag{5.7}$$
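The shortcut formula (5.6) and the figures of Example 5.4 can be verified with a small Python sketch. The function name `std_shortcut` is illustrative only.

```python
import math


def std_shortcut(values):
    """Population standard deviation from the sum and the sum of squares,
    following formula (5.6)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


# Set 2 from Example 5.1.
set_2 = [9, 3, 8, 8, 9, 7, 8, 9, 18]

print(sum(set_2), sum(x * x for x in set_2))   # the two sums used in Example 5.4
print(round(std_shortcut(set_2), 2))
```

For hand calculation this form needs only two running totals. In floating-point arithmetic the subtraction can lose precision when the mean is large relative to the spread, so the definition in formula (5.5) is usually the safer choice in software.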


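    Example 5.5

    Obtain the standard deviation of data Set 1 in Example 5.1.

    Solution

    Let us select the number 10 in the data set as the assumed mean A. Then $\sum (x_i - 10) = -9$ and $\sum (x_i - 10)^2 = 217$. By using formula (5.7), the standard deviation is

    $$\sigma = \sqrt{\frac{217}{9} - \left(\frac{-9}{9}\right)^2} \approx 4.807 \approx 4.81$$

    For comparison, suppose we choose an arbitrary number A = 5. Then $\sum (x_i - 5) = 36$ and $\sum (x_i - 5)^2 = 352$, and the standard deviation is given by

    $$\sigma = \sqrt{\frac{352}{9} - \left(\frac{36}{9}\right)^2} \approx 4.807$$

    We notice that the two values of the assumed mean A give the same standard deviation.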
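The assumed-mean calculation of Example 5.5 can be sketched in Python as follows. The function name `std_assumed_mean` and its parameter `a` are illustrative; the code simply evaluates formula (5.7) for any chosen A.

```python
import math


def std_assumed_mean(values, a):
    """Population standard deviation using an assumed mean a, formula (5.7)."""
    n = len(values)
    shifted = [x - a for x in values]            # x_i - A
    sum_d = sum(shifted)                         # sum of (x_i - A)
    sum_d2 = sum(d * d for d in shifted)         # sum of (x_i - A)^2
    return math.sqrt(sum_d2 / n - (sum_d / n) ** 2)


# Set 1 from Example 5.1.
set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]

sigma_a10 = std_assumed_mean(set_1, 10)
sigma_a5 = std_assumed_mean(set_1, 5)
print(round(sigma_a10, 3), round(sigma_a5, 3))
```

Both choices of A give the same result because shifting every observation by a constant does not change the spread of the data.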

    Standard Deviation and Variance of Grouped Data

    The standard deviation can be calculated through the following formula:

    $$\sigma = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2} \tag{5.8}$$

    where $x_i$ is the class mid-point of the ith class, whose frequency is $f_i$, and $n = \sum f_i$.


    Example 5.6

    Obtain the standard deviation of the weekly sales of books given in Table 2.6, presented in Topic 2.

    Solution

    We need to include a new column for the product f × x² as follows:

    Class       Class Mid-point (x)    Frequency (f)    f × x     f × x²
    34 - 43     38.5                   2                77        2964.5
    44 - 53     48.5                   5                242.5     11761.25
    54 - 63     58.5                   12               702       41067
    64 - 73     68.5                   18               1233      84460.5
    74 - 83     78.5                   10               785       61622.5
    84 - 93     88.5                   2                177       15664.5
    94 - 103    98.5                   1                98.5      9702.25
    Sum                                50               3315      227242.5

    The standard deviation is:

    $$\sigma = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2} = \sqrt{\frac{227242.5}{50} - \left(\frac{3315}{50}\right)^2} \approx 12.21 \approx 12 \text{ books}$$

    The variance is $\sigma^2 = 149.16 \approx 149$ (in squared units, books²).
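The table and the final figures of Example 5.6 can be recomputed with a short Python sketch of formula (5.8). The function name `grouped_std` is illustrative; the mid-points and frequencies are those listed in the example.

```python
import math


def grouped_std(midpoints, frequencies):
    """Population standard deviation of grouped data, formula (5.8)."""
    n = sum(frequencies)                                              # n = sum of f
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))       # sum of f * x
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))  # sum of f * x^2
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


# Class mid-points and frequencies from Example 5.6.
midpoints = [38.5, 48.5, 58.5, 68.5, 78.5, 88.5, 98.5]
frequencies = [2, 5, 12, 18, 10, 2, 1]

sigma = grouped_std(midpoints, frequencies)
print(round(sigma, 2), round(sigma ** 2, 2))
```

The result is an approximation to the spread of the raw data, since every observation in a class is represented by the class mid-point.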

    ACTIVITY 5.3

    Do you think obtaining the standard deviation for grouped data is easier than for ungrouped data? Justify your answer.


    Coefficient of Variation

    When we want to compare the dispersion of two data sets with different units, such as data for age and weight, the variance is not appropriate simply because this quantity has a unit. The coefficient of variation, V, given below is dimensionless and therefore more appropriate.

    $$V = \frac{\text{Standard Deviation}}{\text{Mean}} = \frac{\sigma}{\mu} \tag{5.9}$$

    The comparison is more meaningful because we compare each standard deviation relative to its respective mean.
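A minimal Python sketch of the coefficient of variation is given below. It reuses two data sets already seen in this topic (the data of Example 5.3 and Set 1 of Example 5.1) purely as illustrations; the function name `coefficient_of_variation` is an assumption, the data are treated as populations, and the mean is assumed to be non-zero.

```python
import math


def coefficient_of_variation(values):
    """Coefficient of variation V = sigma / mu for population data.

    Assumes the mean is non-zero.
    """
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sigma / mu


example_5_3 = [20, 30, 40, 50, 60]
set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]

v_example = coefficient_of_variation(example_5_3)
v_set_1 = coefficient_of_variation(set_1)
print(round(v_example, 3), round(v_set_1, 3))
```

Although the first data set has the larger standard deviation, the second has the larger coefficient of variation, that is, more spread relative to its own mean.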

    EXERCISE 5.4

    Referring to data Sets E, F and G in Exercise 5.2:

    (a) Calculate the standard deviation and coefficient of variation; and

    (b) Compare their data spread.

    5.5 SKEWNESS

    In a real situation we may have a distribution which is symmetric, such as in Figure 5.1, case (a), negatively skewed, such as in Figure 5.1, case (b), or positively skewed, such as in Figure 5.1, case (c). Sometimes we need to measure the degree of skewness. For that, we will use the coefficient of skewness given in the following section.

    Coefficient of Skewness

    Pearson's Coefficient of Skewness

    For a skewed distribution, the mean tends to lie on the same side of the mode as the longer tail [see Figure 5.1, case (b) and case (c)]. Thus, a measure of the asymmetry is supplied by the difference (Mean - Mode). We have the following dimensionless coefficients of skewness:


    • Pearson's First Coefficient of Skewness

    $$\mathrm{PCS}(1) = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{x} - \text{Mode}}{s} \tag{5.10}$$

    • Pearson's Second Coefficient of Skewness

    If we do not have the value of the mode then, by using formula (4.6) in Topic 4, we have the following second measure of skewness:

    $$\mathrm{PCS}(2) = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(\bar{x} - \text{Median})}{s} \tag{5.11}$$

    EXERCISE 5.5

    Given the frequency tables of two distributions as follows:

    Distribution A:

    Weight (kg)        20-29  30-39  40-49  50-59  60-69  70-79  80-89
    No. of Students    4      10     20     30     25     10     1

    Distribution B:

    Marks              20-29  30-39  40-49  50-59  60-69  70-79  80-89
    No. of Students    10     20     30     20     15     4      1

    Make a comparison of the above distributions based on the following statistics:

    (a) Obtain the mean, mode, median, $Q_1$, $Q_3$, standard deviation and the coefficient of variation.

    (b) Obtain Pearson's coefficient of skewness and comment on the values obtained.
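Formulas (5.10) and (5.11) can be tried out on ungrouped data with Python's standard `statistics` module. The sketch below uses Set 1 of Example 5.1 only as an illustration; it assumes the data are a population (so `statistics.pstdev` is used for s) and that the data have a single mode. The function names `pcs_1` and `pcs_2` are hypothetical.

```python
import statistics


def pcs_1(values):
    """Pearson's first coefficient: (mean - mode) / standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.pstdev(values)


def pcs_2(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.pstdev(values)


# Set 1 from Example 5.1: mean 9, median 7, single mode 5.
set_1 = [12, 6, 7, 3, 15, 5, 10, 18, 5]

first = pcs_1(set_1)
second = pcs_2(set_1)
print(round(first, 3), round(second, 3))
```

Both coefficients are positive here, consistent with a mean lying to the right of the mode and the median, that is, a longer right tail. For grouped data such as Exercise 5.5, the mean, mode, median and standard deviation would first have to be obtained with the grouped-data formulas.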


    • In this topic, you have studied various measures of dispersion which can be used to describe the shape of a frequency curve.

    • It has been mentioned earlier that the overall range cannot explain the pattern of observations lying between the minimum and the maximum values.

    • Thus we introduce the inter quartile range (IQR) to measure the dispersion of the data in the middle 50%, or the main body.

    • The variance, which is called the shape parameter, is also given to measure the dispersion.

    • However, for comparison of two sets of data which have different units, the coefficient of variation is used.

    • This coefficient is preferred because it is dimensionless.

    • Finally, Pearson's coefficient of skewness is given to measure the degree of skewness of a non-symmetric distribution.

