Date post: | 28-Dec-2015 |
Upload: | virginia-leonard |
Where are we?
Measure of central tendency
FETP India
Competency to be gained from this lecture
Calculate a measure of central tendency that is adapted to the sample studied
Key issues
• Measures of central tendency
  - Mode
  - Median
  - Mean
  - Geometric mean
• Appropriate applications
Summary statistics
• A single value that summarizes the observed values of a variable
  - Part of the data reduction process
• Two types:
  - Measures of location / central tendency / average
  - Measures of dispersion / variability / spread
• Describe the shape of the distribution of a set of observations
• Necessary for precise and efficient comparisons of different sets of data
  - The location (average) and shape (variability) of different distributions may be different
[Figure: distributions of Factor X (x-axis) against number of people (y-axis) for Population A and Population B. Panels: different variability, same location; different location, same variability.]
Quick definitions of measures of central tendency
• Mode
  - The most frequently occurring observation
• Median
  - The mid-point of a set of ordered observations
• Arithmetic mean
  - The sum of the observations divided by the number of observations
The mode
• Definition
  - The mode of a distribution is the value that is observed most frequently in a given set of data
• How to obtain it?
  - Arrange the data in sequence from low to high
  - Count the number of times each value occurs
  - The most frequently occurring value is the mode
Mode
The mode
[Figure: histogram of frequencies (y-axis: N, 0 to 20) with the mode labelled.]
Examples of mode (1/2): Annual salary (in 100,000 rupees)
• 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
• Arranging the values in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
  - The mode is 3, which occurs four times
Examples of mode (2/2): Incubation period for hepatitis-affected persons (in days)
• 29, 31, 24, 29, 30, 25
• Arranging the values in order: 24, 25, 29, 29, 30, 31
  - The mode is 29
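The counting procedure used in the two examples can be sketched in Python. This is only an illustration: the helper name `modes` is hypothetical, and returning an empty list when every value is unique is a convention assumed here, not something the lecture prescribes.

```python
from collections import Counter

def modes(values):
    """Return (list of most frequent values, their frequency) for a non-empty list.

    Assumed convention: if every value occurs exactly once, the list is empty.
    """
    counts = Counter(values)          # count how many times each value occurs
    top = max(counts.values())        # highest frequency observed
    if top == 1:
        return [], top
    return sorted(v for v, c in counts.items() if c == top), top

salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]     # annual salary, in 100,000 rupees
incubation = [29, 31, 24, 29, 30, 25]         # incubation period, in days

salary_modes, salary_freq = modes(salaries)
incubation_modes, incubation_freq = modes(incubation)
print(salary_modes, salary_freq)
print(incubation_modes, incubation_freq)
```

Returning a list rather than a single value leaves room for distributions with more than one mode.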
The mode is the only location statistic that can be used when the characteristic itself cannot be measured

Colour preference of people for their cars

Colour preference    Number of people
Green                354
Blue                 852
Gray                 310
Red                  474
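For a categorical variable such as colour, the mode is simply the category with the largest count. A minimal Python sketch using the counts from the table (the variable names are illustrative):

```python
# Number of people preferring each car colour (from the table above)
colour_counts = {"Green": 354, "Blue": 852, "Gray": 310, "Red": 474}

# The modal category is the key with the largest count
modal_colour = max(colour_counts, key=colour_counts.get)
print(modal_colour, colour_counts[modal_colour])
```

No mean or median can be computed here, because the colours have neither numeric values nor a natural order.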
Specific features of the mode
• There may be no mode
  - When each value is unique
• There may be more than one mode
  - When more than one peak occurs
  - Bimodal distribution
• The mode can be misinterpreted
  - Is a distribution skewed or bimodal?
• The mode is not amenable to statistical tests
• The mode is not based upon all observations
The median
• The median describes literally the middle value of the data
• It is defined as the value above or below which half (50%) of the observations fall
Median
Computing the median
• Arrange the observations in order from smallest to largest (ascending order) or vice versa
• Count the number of observations, n
  - If n is an odd number: median = value of the ((n+1)/2)th observation
  - If n is an even number: median = average of the (n/2)th and ((n/2)+1)th observations
Example of median calculation
• What is the median of the following values: 10, 20, 12, 3, 18, 16, 14, 25, 2
  - Arrange the numbers in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
  - n = 9 (odd), so the median is the 5th value: median = 14
• Suppose there is one more observation (8)
  - 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
  - n = 10 (even), so the median is the mean of the 5th and 6th values: median = mean of 12 and 14 = 13
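The odd/even rule can be sketched in Python as follows. The function name `median_by_rank` is hypothetical; the standard library's `statistics.median` applies the same rule and is used only as a cross-check.

```python
import statistics

def median_by_rank(values):
    """Median via the rank rule: middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)              # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # the ((n+1)/2)th observation (1-based)
    # average of the (n/2)th and ((n/2)+1)th observations (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [10, 20, 12, 3, 18, 16, 14, 25, 2]
median_odd = median_by_rank(values)            # n = 9
median_even = median_by_rank(values + [8])     # n = 10 after adding the observation 8

print(median_odd, median_even)
print(statistics.median(values), statistics.median(values + [8]))
```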
Advantages and disadvantages of the median
• Advantages
  - The median is unaffected by extreme values
• Disadvantages
  - The median does not contain information on the other values of the distribution
    • Only selected by its rank
    • You can change 50% of the values without affecting the median
  - The median is less amenable to statistical tests
The median is not sensitive to extreme values
[Figure: two histograms (x-axis: class of the variable; y-axis: 0 to 14) with the same median.]
Mean (Arithmetic mean / Average)
• Most commonly used measure of location
• Definition
  - Calculated by adding all observed values and dividing by the total number of observations
• Notation
  - Each observation is denoted $x_1, x_2, \dots, x_n$
  - The total number of observations: $n$
  - Summation process: $\sum$ (sigma)
  - The mean: $\bar{x}$

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Mean
Computation of the mean
• Duration of stay in days in a hospital: 8, 25, 7, 5, 8, 3, 10, 12, 9
  - 9 observations (n = 9)
  - Sum of all observations = 87
  - Mean duration of stay = 87 / 9 = 9.67
• Incubation period in days of a disease: 8, 45, 7, 5, 8, 3, 10, 12, 9
  - 9 observations (n = 9)
  - Sum of all observations = 107
  - Mean incubation period = 107 / 9 = 11.89
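Both computations can be reproduced with a few lines of Python. The helper `arithmetic_mean` is an illustrative name; it does the same sum-and-divide as the slide.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]          # duration of hospital stay
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]    # incubation period

mean_stay = arithmetic_mean(stay_days)
mean_incubation = arithmetic_mean(incubation_days)

print(sum(stay_days), round(mean_stay, 2))
print(sum(incubation_days), round(mean_incubation, 2))
```

The two series differ in a single observation (25 versus 45), yet the mean moves from 9.67 to 11.89, which illustrates how one large value shifts the mean.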
Advantages and disadvantages of the mean
• Advantages
  - Has a lot of good theoretical properties
  - Used as the basis of many statistical tests
  - Good summary statistic for a symmetrical distribution
• Disadvantages
  - Less useful for an asymmetric distribution
    • Can be distorted by outliers, therefore giving a less "typical" value
Mean of several groups combined
Group (i)    Size (n_i)    Mean (x_i)    Sum (n_i x_i)
1            10            41            410
2            15            36            540
3            25            42            1050
Total        50            --            2000

Mean of all groups = 2000 / 50 = 40
Crude average (unweighted mean of the three group means) = 39.7
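The table's calculation is a weighted mean: each group mean is weighted by its group size. A short Python sketch, with the `(size, mean)` tuple layout being an illustrative choice:

```python
# (size n_i, mean x_i) for each group, taken from the table
groups = [(10, 41), (15, 36), (25, 42)]

total_n = sum(n for n, _ in groups)                  # total number of observations
total_sum = sum(n * mean for n, mean in groups)      # sum of n_i * x_i
combined_mean = total_sum / total_n                  # weighted (overall) mean

# Crude average: ignores group sizes and simply averages the group means
crude_average = sum(mean for _, mean in groups) / len(groups)

print(total_n, total_sum, combined_mean, round(crude_average, 1))
```

The crude average differs from the combined mean because the groups are of unequal size.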
The geometric mean
• Background
  - Some distributions appear symmetric after log transformation (e.g., neutrophil counts)
  - A log transformation may help in describing the central tendency
• Definition
  - The geometric mean is the antilog of the mean of the log values
Geometric mean
Calculating a geometric mean
• Observe the set of observations
  - 5, 10, 20, 25, 40
• Take the logarithm (base 10) of these values
  - 0.70, 1.00, 1.30, 1.40 and 1.60
• Calculate the mean of the log values
  - 0.70 + 1.00 + 1.30 + 1.40 + 1.60 = 6.00
  - 6.00 / 5 = 1.20
• Take the antilog of the mean of the log values
  - Antilog (1.20) = 15.85
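The same steps can be sketched in Python, assuming base-10 logarithms as in the worked values. Unrounded logs are used here, so intermediate values differ slightly from the two-decimal figures on the slide.

```python
import math

values = [5, 10, 20, 25, 40]

log_values = [math.log10(v) for v in values]      # step 1: take the logarithms
mean_log = sum(log_values) / len(log_values)      # step 2: mean of the log values
geometric_mean = 10 ** mean_log                   # step 3: antilog of that mean

print([round(x, 2) for x in log_values])
print(round(mean_log, 2), round(geometric_mean, 2))
```

From Python 3.8 onwards, `statistics.geometric_mean(values)` computes the same quantity directly.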
Geometric mean of several groups combined
Group (i)    Number of patients (n_i)    Geometric mean (GM)    log GM    n_i * log GM
A            20                          8.5                    0.93      18.60
B            18                          10.2                   1.01      18.18
C            12                          9.4                    0.97      11.64
Total        50                          --                     --        48.42

Overall GM = antilog (48.42 / 50) = antilog (0.9684) = 9.3
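Combining geometric means follows the same weighting idea as combining arithmetic means, but on the log scale. A Python sketch under the assumption of base-10 logs; it keeps unrounded logs, so the intermediate total is close to, but not exactly, the 48.42 obtained with two-decimal logs.

```python
import math

# (group, number of patients n_i, geometric mean GM_i) from the table
groups = [("A", 20, 8.5), ("B", 18, 10.2), ("C", 12, 9.4)]

total_n = sum(n for _, n, _ in groups)
# Weighted sum of log GM values: sum of n_i * log10(GM_i)
weighted_log_sum = sum(n * math.log10(gm) for _, n, gm in groups)
overall_gm = 10 ** (weighted_log_sum / total_n)      # antilog of the weighted mean log

print(total_n, round(weighted_log_sum, 2), round(overall_gm, 1))
```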
[Figure: histogram of frequencies (y-axis: N, 0 to 14) for values 0 to 19, annotated with mean = 10.8, median = 10 and mode = 13.5.]
Choosing
What measure of location to use?
• Consider the duration (days) of absence from work of 21 labourers owing to sickness
  - 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80
• Mean = 11 days
  - Not typical of the series, as 19 of the 21 labourers were absent for less than 11 days
  - Distorted by extreme values
• Median = 5 days
  - Better measure
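The comparison can be checked with Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# Days of absence from work for 21 labourers
absence_days = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10,
                10, 59, 80]

mean_days = statistics.mean(absence_days)        # pulled upwards by 59 and 80
median_days = statistics.median(absence_days)    # 11th of the 21 ordered values

# How many labourers were absent for fewer days than the mean?
below_mean = sum(1 for d in absence_days if d < mean_days)

print(round(mean_days, 1), median_days, below_mean)
```

The count of observations below the mean makes the distortion concrete: nearly the whole series lies under the value that is supposed to summarize it.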
Choice of measure of central tendency for symmetric distributions
• Any one of the central/location measures can be used
• The mean has definite advantages if subsequent computations are needed
Choice of measure of central tendency for asymmetric distributions
• For skewed distributions, the mean is not suitable
  - Positively skewed: the mean gives a higher value
  - Negatively skewed: the mean gives a lower value
• If some observations deviate much more than others in the series, then the median is the appropriate measure
• If the log-transformed distribution is symmetric, the geometric mean may be used
Key messages
• The mode is the most common value
• The median is suited to distributions with extreme values
• The mean is suited to symmetric distributions
• The geometric mean may be useful when log-transformed data are symmetric
• The type of the distribution determines the measure of central tendency to use