
2.Central Tendency and Dispersion (1)

Date post: 27-Dec-2015
Upload: mohi-sharma
Transcript
Page 1: 2.Central Tendency and Dispersion (1)

Frequency Distribution

• Convert raw data into a data array.

• Construct:
– a frequency distribution.
– a relative frequency distribution.
– a cumulative relative frequency distribution.

• Construct different types of diagrams.

• Visually represent data by using graphs and charts.

Page 2: 2.Central Tendency and Dispersion (1)

• Data array
– An orderly presentation of data in either ascending or descending numerical order.

• Frequency Distribution
– A table that represents the data in classes and that shows the number of observations in each class.

Page 3: 2.Central Tendency and Dispersion (1)

• Frequency Distribution
– Class - The category
– Frequency - Number of observations in each class
– Cumulative frequency - Number of observations up to and including the class
– Class limits - Boundaries for each class
– Class interval - Width of each class
– Class mark - Midpoint of each class

Page 4: 2.Central Tendency and Dispersion (1)

Two methods of grouping

• Exclusive: example - 0-9.99, 10-19.99, 20-29.99, …

• Inclusive: example - 0-10, 10.01-20, 20.01-30, and so on.

• Both are correct.

Page 5: 2.Central Tendency and Dispersion (1)

Sturges’ rule

• There is no rigid rule for the number of classes.
• Classes normally number 5 to 15, depending on the data.
• Sturges’ rule sets the approximate number of classes with which to begin constructing a frequency distribution:

\[ K = 1 + 3.322 \log_{10} N \]

\[ \text{Class interval} = \frac{\text{Range}}{K} \]

where K = approximate number of classes to use and N = the number of observations in the data set.
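The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the logarithm is base 10 (the constant 3.322 is approximately 1/log10(2)); the function names `sturges_classes` and `class_interval` are made up for this example.

```python
import math

def sturges_classes(n_obs: int) -> float:
    """Approximate number of classes: K = 1 + 3.322 * log10(N)."""
    if n_obs < 1:
        raise ValueError("need at least one observation")
    return 1 + 3.322 * math.log10(n_obs)

def class_interval(data_range: float, k: float) -> float:
    """Approximate class interval: Range / K."""
    return data_range / k

k = sturges_classes(50)      # a data set with N = 50 observations
print(round(k, 2))           # about 6.64, i.e. roughly 7 classes
```

The result is only a starting point; the interval is normally rounded to a convenient value afterwards.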

Page 6: 2.Central Tendency and Dispersion (1)

Precautions

• No uneven classes

• Avoid odd upper limits

• Avoid odd interval

• All values should have a unique class

Page 7: 2.Central Tendency and Dispersion (1)

Test

• Which type of data is better?
– Grouped or ungrouped
– WHY?

Page 8: 2.Central Tendency and Dispersion (1)

How to Construct a Frequency Distribution

1. Number of classes
Choose an approximate number of classes for your data. Sturges’ rule can help.

2. Estimate the class interval
Divide the range of your data by the approximate number of classes (from Step 1) to find the approximate class interval, where the range is defined as the largest data value minus the smallest data value.

3. Determine the class interval
Round the estimate (from Step 2) to a convenient value.

Page 9: 2.Central Tendency and Dispersion (1)

4. Lower class limit
Determine the lower class limit for the first class by selecting a convenient number that is smaller than the lowest data value.

5. Class limits
Determine the other class limits by repeatedly adding the class width (from Step 3) to the prior class limit, starting with the lower class limit (from Step 4).

6. Define the classes
Use the sequence of class limits to define the classes.

Page 10: 2.Central Tendency and Dispersion (1)

Converting to a Relative Frequency Distribution

1. Retain the same classes defined in the frequency distribution.

2. Sum the total number of observations across all classes of the frequency distribution.

3. Divide the frequency for each class by the total number of observations, giving the proportion of data values in each class (multiply by 100 for a percentage).

Page 11: 2.Central Tendency and Dispersion (1)

Example: Problem

• The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states is given in the next table.
– a) Arrange these into a data array.
– b) Approximately how many classes would be appropriate for these data?
– c & d) Construct a frequency distribution. State interval width and class mark.
– e) Construct a histogram, a relative frequency distribution, and a cumulative relative frequency distribution.

Page 12: 2.Central Tendency and Dispersion (1)

Data (average daily cost, in dollars)

AL   775    HI   823    MA 1,036    NM 1,046    SD   506
AK 1,136    ID   659    MI   902    NY   784    TN   859
AZ 1,091    IL   917    MN   652    NC   763    TX 1,010
AR   678    IN   898    MS   555    ND   507    UT 1,081
CA 1,221    IA   612    MO   863    OH   940    VT   676
CO   961    KS   666    MT   482    OK   797    VA   830
CT 1,058    KY   703    NE   626    OR 1,052    WA 1,143
DE 1,024    LA   875    NV   900    PA   861    WV   701
FL   960    ME   738    NH   976    RI   885    WI   744
GA   775    MD   889    NJ   829    SC   838    WY   537

Page 13: 2.Central Tendency and Dispersion (1)

• Step 1. Number of classes
– Sturges’ rule: approximately 7 classes.
– The range is $1,221 – $482 = $739.
– $739/7 ≈ $106 and $739/8 ≈ $92.

• Steps 2 & 3. The class interval
– So, if we use 8 classes, we can make each class $100 wide.

Page 14: 2.Central Tendency and Dispersion (1)

• Step 4. The lower class limit
– If we start at $450, we can cover the range in 8 classes, each class $100 in width.
– The first class: $450 up to $550

• Steps 5 & 6. Setting class limits

$450 up to $550      $850 up to $950
$550 up to $650      $950 up to $1,050
$650 up to $750      $1,050 up to $1,150
$750 up to $850      $1,150 up to $1,250

Page 15: 2.Central Tendency and Dispersion (1)

Average daily cost          Number    Mark

$450 – under $550              4      $500
$550 – under $650              3      $600
$650 – under $750              9      $700
$750 – under $850              9      $800
$850 – under $950             11      $900
$950 – under $1,050            7      $1,000
$1,050 – under $1,150          6      $1,100
$1,150 – under $1,250          1      $1,200

Interval width: $100
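Parts (c) to (e) of the problem can be checked with a short script. The sketch below tallies the 50 state values into the eight $100-wide classes starting at $450 and then derives the relative and cumulative relative frequencies. The variable names are illustrative, and the integer-division binning assumes equal-width classes of the "lower up to, but not including, upper" kind.

```python
# Average daily hospital cost (dollars) for the 50 states, read row by row
# from the data slide.
costs = [
    775, 823, 1036, 1046, 506, 1136, 659, 902, 784, 859,
    1091, 917, 652, 763, 1010, 678, 898, 555, 507, 1081,
    1221, 612, 863, 940, 676, 961, 666, 482, 797, 830,
    1058, 703, 626, 1052, 1143, 1024, 875, 900, 861, 701,
    960, 738, 976, 885, 744, 775, 889, 829, 838, 537,
]

lower, width, n_classes = 450, 100, 8

# Frequency distribution: class i covers [lower + i*width, lower + (i+1)*width).
freq = [0] * n_classes
for x in costs:
    freq[(x - lower) // width] += 1

# Relative frequency: class frequency divided by the total count.
total = sum(freq)
rel = [f / total for f in freq]

# Cumulative relative frequency: running sum of the relative frequencies.
cum_rel = []
running = 0.0
for r in rel:
    running += r
    cum_rel.append(running)

for i, f in enumerate(freq):
    lo = lower + i * width
    print(f"{lo} - under {lo + width}: f={f}, rel={rel[i]:.2f}, cum={cum_rel[i]:.2f}")
```

The tallied frequencies agree with the Number column of the table: 4, 3, 9, 9, 11, 7, 6, 1.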

Page 16: 2.Central Tendency and Dispersion (1)

Measures of Central Tendency and Dispersion

Page 17: 2.Central Tendency and Dispersion (1)

Introduction

Raw data are the raw materials that have to be converted into finished products (information). In a voluminous database of raw data, it is impossible to see any pattern until the data are converted into information by data reduction. The reduction can be achieved with summary measures, which are concise and yet give a reasonably accurate view of the original data. Here we cover the important summary measures of central tendency and dispersion (variation).

Page 18: 2.Central Tendency and Dispersion (1)

Outline

1) What is Central Tendency?

2) Measures of Central Tendency

3) Measures of Dispersion

Page 19: 2.Central Tendency and Dispersion (1)

1) What is Central Tendency?

Whenever you measure things of the same kind, a fairly large number of the measurements will tend to cluster around a middle value. The question that arises is: "Is it possible to define one typical, representative average such that the remaining items in the data set cluster around it, that is, tend to be close to it?" Such a value is called a measure of "Central Tendency". Other terms used synonymously are "Measures of Location" and "Statistical Averages".

Page 20: 2.Central Tendency and Dispersion (1)

2) Measures of Central Tendency

Quantitative specialists, statisticians, and information analysts rely heavily on summary measures when a large mass of data has to be analyzed to help decision-makers. As a manager, you need these summary measures of central tendency to draw meaningful conclusions in your functional area of operation. The most widely used measures of central tendency are the Arithmetic Mean, the Median, and the Mode.

Page 21: 2.Central Tendency and Dispersion (1)

Arithmetic Mean

Arithmetic Mean (called mean) is the most common measure of central tendency used by managers in their sphere of activities. It is defined as the sum of all observations in a data set divided by the total number of observations. For example, consider a data set containing the following observations:

4, 3, 6, 5, 3, 3. The arithmetic mean = (4+3+6+5+3+3)/6 = 4. In symbolic form the mean is given by

\[ \bar{X} = \frac{\sum X}{n} \]

where
\(\bar{X}\) = arithmetic mean
\(\sum X\) = sum of all X values in the data set
\(n\) = total number of observations (sample size)

Page 22: 2.Central Tendency and Dispersion (1)

Arithmetic Mean for Raw Data Example

The inner diameters of a particular grade of tire, based on 5 sample measurements, are as follows (figures in millimeters):

565, 570, 572, 568, 585

Applying the formula

\[ \bar{X} = \frac{\sum X}{n} \]

we get mean = (565+570+572+568+585)/5 = 572.

Caution: Arithmetic Mean is affected by extreme values or fluctuations in sampling. It is not the best average to use when the data set contains extreme values (very high or very low values).
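The formula and the caution about extreme values can be illustrated together. In this small sketch the function name `arithmetic_mean` and the second data set, in which one reading is replaced by an extreme value, are hypothetical and not taken from the slides.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

diameters = [565, 570, 572, 568, 585]       # tire data from the slide (mm)
print(arithmetic_mean(diameters))            # 572.0

# Hypothetical: replace the last reading with an extreme value.
with_outlier = [565, 570, 572, 568, 850]
print(arithmetic_mean(with_outlier))         # 625.0, pulled up by one value
```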

Page 23: 2.Central Tendency and Dispersion (1)

Median

Median is the middle-most observation when you arrange the data in ascending or descending order of magnitude. That is, the data are ranked and the middle value is picked. The median is such that 50% of the observations are above it and 50% of the observations are below it.

Median is a very useful measure for ranked data in the context of consumer preferences and rating. It is not affected by extreme values but is affected by the number of observations.

\[ \text{Median} = \left(\frac{n+1}{2}\right)\text{th value of ranked data} \]

n = number of observations in the sample

Note: If the sample size is an odd number, then the median is the (n+1)/2 th value in the ranked data. If the sample size is even, then the median lies between the two middle values. You take the average of these two middle values.

Page 24: 2.Central Tendency and Dispersion (1)

Median for Raw Data Example - Odd Sample Size

Marks obtained by 7 students in a Computer Science exam are given below. Compute the median.

45  40  60  80  90  65  55

Arranging the data after ranking gives

90  80  65  60  55  45  40

Median = (n+1)/2 th value in this set = (7+1)/2 th observation = 4th observation = 60

Hence Median = 60 for this problem.

Page 25: 2.Central Tendency and Dispersion (1)

Median for Raw Data Example - Even Sample Size

The diameter of a shaft in millimeters in a manufacturing unit is given below for 10 samples. Calculate the median value.

2.50  2.45  2.55  2.60  2.46  2.43  2.56  2.58  2.66  2.65

Arranging the data in ascending order, you will get

2.43  2.45  2.46  2.50  2.55  2.56  2.58  2.60  2.65  2.66

The median falls between the 5th and 6th observations, that is, between 2.55 and 2.56. Hence median = (2.55+2.56)/2 = 2.555
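Both the odd and the even case can be handled by one routine. A minimal sketch, with an illustrative function name `median`; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle
    values when the number of observations is even."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # the (n+1)/2 th value
    return (ranked[mid - 1] + ranked[mid]) / 2   # mean of the two middle values

marks = [45, 40, 60, 80, 90, 65, 55]             # odd sample size (n = 7)
shaft = [2.50, 2.45, 2.55, 2.60, 2.46, 2.43, 2.56, 2.58, 2.66, 2.65]  # n = 10

print(median(marks))               # 60
print(round(median(shaft), 3))     # 2.555
```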

Page 26: 2.Central Tendency and Dispersion (1)

Mode

Mode is the value that occurs most often. It has the maximum frequency of occurrence. Mode is not affected by extreme values.

Mode is a very useful measure when you want to keep in inventory the most popular shirt, in terms of collar size, during the festival season. Median and mean will not be helpful in this type of situation. Another example where mode is the only answer is in determining the most typical shoe size to be kept in stock in a shop selling shoes.

Caution: In a few real-life problems there will be more than one mode, as with bimodal and multi-modal data. In these cases the mode cannot be uniquely determined.

Page 27: 2.Central Tendency and Dispersion (1)

Mode for Raw Data Example

The lives, in number of hours, of 10 flashlight batteries are as follows. Find the mode.

340  350  340  340  320  340  330  330  340  350

340 occurs five times. Hence, mode = 340.
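A frequency count makes the mode, and the multi-modal caution, easy to see. In this sketch the function name `modes` and the bimodal data set are hypothetical; returning every value tied for the highest frequency is a design choice made here so that non-unique modes are visible.

```python
from collections import Counter

def modes(values):
    """All values that share the maximum frequency of occurrence."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

battery_life = [340, 350, 340, 340, 320, 340, 330, 330, 340, 350]
print(modes(battery_life))        # [340]

# Hypothetical bimodal data: two values tie, so the mode is not unique.
print(modes([1, 1, 2, 2, 3]))     # [1, 2]
```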

Page 28: 2.Central Tendency and Dispersion (1)

Mean for Grouped Data

The formula for the mean is given by

\[ \bar{X} = \frac{\sum fX}{n} \]

where
\(\bar{X}\) = mean
\(\sum fX\) = sum of cross products of the frequency in each class with the midpoint X of each class
\(n\) = total number of observations (total frequency) = \(\sum f\)

Page 29: 2.Central Tendency and Dispersion (1)

Mean for Grouped Data Example

Find the arithmetic mean for the following continuous frequency distribution:

Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency     1     4     8     7     3     2

Page 30: 2.Central Tendency and Dispersion (1)

Solution for the Example

Class     X      f     fX
0-1      0.5     1     0.5
1-2      1.5     4     6.0
2-3      2.5     8    20.0
3-4      3.5     7    24.5
4-5      4.5     3    13.5
5-6      5.5     2    11.0
Totals          25    75.5

Applying the formula

\[ \bar{X} = \frac{\sum fX}{n} = \frac{75.5}{25} = 3.02 \]
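The same table can be computed programmatically. A minimal sketch, assuming each class is represented by its midpoint as in the solution above; `grouped_mean` is an illustrative name.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * X) / sum(f), X being the class midpoint."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    sum_fx = sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs))
    return sum_fx / n

classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
freqs = [1, 4, 8, 7, 3, 2]

print(grouped_mean(classes, freqs))   # 3.02
```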

Page 31: 2.Central Tendency and Dispersion (1)

Mean by short cut method

\[ \bar{X} = A + \frac{\sum fd}{n} \]

• where A is an assumed value (one can assume any value)

• d is the deviation of each mid-value from A. If d = (X − A)/c, where c is the class interval, then the second term in the formula is multiplied by c:

\[ \bar{X} = A + \frac{\sum fd}{n} \times c \]

Assignment: find the mean using the short-cut method

Page 32: 2.Central Tendency and Dispersion (1)

Example of short cut method

• The table here presents the profit of 1400 companies. Find the mean using two different methods.

Profit        No. of cos.
200-400           500
400-600           300
600-800           280
800-1000          120
1000-1200         100
1200-1400          80
1400-1600          20
Total            1400

Page 33: 2.Central Tendency and Dispersion (1)

Profit        (f) Freq.   (X) Mid Point   (f X)      d = (X-A)/c    f d
200-400          500           300        150,000        -3        -1500
400-600          300           500        150,000        -2         -600
600-800          280           700        196,000        -1         -280
800-1000         120           900        108,000         0            0
1000-1200        100          1100        110,000         1          100
1200-1400         80          1300        104,000         2          160
1400-1600         20          1500         30,000         3           60
Total           1400                      848,000         0        -2060

Page 34: 2.Central Tendency and Dispersion (1)

• Direct method

\[ \bar{X} = \frac{\sum fX}{n} = \frac{848{,}000}{1400} = 605.714 \]

• Short cut method (A = 900, c = 200)

\[ \bar{X} = A + \frac{\sum fd}{n} \times c = 900 + \frac{(-2060)(200)}{1400} = 900 - 294.286 = 605.714 \]
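The agreement between the two methods can be confirmed numerically. In the sketch below the function names are illustrative; the second call uses a different assumed value (A = 500, a choice made only for this example) to show that the result does not depend on A.

```python
mids = [300, 500, 700, 900, 1100, 1300, 1500]   # class midpoints X
freqs = [500, 300, 280, 120, 100, 80, 20]       # number of companies f

def direct_mean(mids, freqs):
    """Direct method: sum(f * X) / n."""
    return sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)

def shortcut_mean(mids, freqs, assumed, width):
    """Short cut method: A + (sum(f * d) / n) * c, with d = (X - A) / c."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed) / width for x, f in zip(mids, freqs))
    return assumed + (sum_fd / n) * width

print(round(direct_mean(mids, freqs), 3))                # 605.714
print(round(shortcut_mean(mids, freqs, 900, 200), 3))    # 605.714
print(round(shortcut_mean(mids, freqs, 500, 200), 3))    # 605.714 for any A
```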

Page 35: 2.Central Tendency and Dispersion (1)

Properties of Mean

• The sum of deviations from the mean is always zero.

• The sum of squared deviations is minimum when taken from the mean.

• If X = X1 + X2, then the mean of X is equal to the sum of the means of X1 and X2 (if the numbers of observations are equal).

• From two or more groups a “pooled” mean can be calculated.
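Three of these properties are easy to check numerically. The sketch below assumes the usual definition of the pooled mean as the size-weighted average of the group means, which the slide does not spell out; `group_b` is a hypothetical second sample and the function names are illustrative.

```python
x = [4, 3, 6, 5, 3, 3]                   # data set from the arithmetic mean slide
mean = sum(x) / len(x)                   # 4.0

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(v - mean for v in x)

# Property 2: the sum of squared deviations is smallest about the mean.
def sum_sq(center):
    return sum((v - center) ** 2 for v in x)

# Property 4 (assumed form): pooled mean = sum(n_i * mean_i) / sum(n_i).
def pooled_mean(means, sizes):
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

group_a = [565, 570, 572, 568, 585]      # tire diameters from the slides
group_b = [560, 575, 580]                # hypothetical second sample
pooled = pooled_mean(
    [sum(group_a) / len(group_a), sum(group_b) / len(group_b)],
    [len(group_a), len(group_b)],
)

print(sum_dev, sum_sq(mean), sum_sq(3.5), pooled)
```

The pooled value equals the mean of the two samples combined, which is what makes the mean algebraically convenient.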

Page 36: 2.Central Tendency and Dispersion (1)

Median for Grouped Data

The formula for the median is given by

\[ \text{Median} = L + \frac{(n/2) - m}{f} \times c \]

where
L = lower limit of the median class
n = total number of observations = \(\sum f\)
m = cumulative frequency preceding the median class
f = frequency of the median class
c = class interval of the median class

Page 37: 2.Central Tendency and Dispersion (1)

Median for Grouped Data Example

Find the median for the following continuous frequency distribution:

Class        0-10   11-20   21-30   31-40   41-50
Frequency      5      8      13       7       7

Page 38: 2.Central Tendency and Dispersion (1)

Solution for the Example

Class     Frequency   Cumulative Frequency
0-10          5               5
11-20         8              13
21-30        13              26
31-40         7              33
41-50         7              40
Total        40

Substituting the relevant values in the formula

\[ \text{Median} = L + \frac{(n/2) - m}{f} \times c \]

we have

\[ \text{Median} = 21 + \frac{(40/2) - 13}{13} \times 10 = 21 + \frac{70}{13} = 21 + 5.38 = 26.38 \]
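The search for the median class and the substitution can be written as one loop. A minimal sketch, assuming equal class widths and using the lower limits exactly as the slide does (L = 21 for the class 21-30); `grouped_median` is an illustrative name.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((n/2 - m) / f) * c for the class containing the n/2-th item."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cum = 0                                  # m: cumulative frequency so far
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= n / 2:                 # this is the median class
            return lower + (n / 2 - cum) / f * width
        cum += f
    raise ValueError("median class not found")

lower_limits = [0, 11, 21, 31, 41]
freqs = [5, 8, 13, 7, 7]

print(round(grouped_median(lower_limits, freqs, 10), 2))   # 26.38
```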

Page 39: 2.Central Tendency and Dispersion (1)

Mode for Grouped Data

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c \]

where
L = lower limit of the modal class
\(d_1 = f_1 - f_0\)
\(d_2 = f_1 - f_2\)
\(f_1\) = frequency of the modal class
\(f_0\) = frequency preceding the modal class
\(f_2\) = frequency succeeding the modal class
c = class interval of the modal class

Page 40: 2.Central Tendency and Dispersion (1)

Mode for Grouped Data Example

Find the mode for the following continuous frequency distribution:

Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency     1     4     8     7     3     2

Page 41: 2.Central Tendency and Dispersion (1)

Solution for the Example

Class    Frequency
0-1          1
1-2          4
2-3          8
3-4          7
4-5          3
5-6          2
Total       25

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c \]

L = 2
\(d_1 = f_1 - f_0\) = 8 - 4 = 4
\(d_2 = f_1 - f_2\) = 8 - 7 = 1
c = 1

Hence

\[ \text{Mode} = 2 + \frac{4}{5} \times 1 = 2.8 \]
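The grouped-mode formula can be sketched as follows. Two choices here are assumptions of this example, not rules stated on the slides: a missing neighbouring class is given frequency 0, and when several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = L + d1 / (d1 + d2) * c, with d1 = f1 - f0 and d2 = f1 - f2."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # first modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # class before (assumed 0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0      # class after (assumed 0 if none)
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("mode is not determined by this formula")
    return lower_limits[i] + d1 / (d1 + d2) * width

lower_limits = [0, 1, 2, 3, 4, 5]
freqs = [1, 4, 8, 7, 3, 2]

print(round(grouped_mode(lower_limits, freqs, 1), 2))   # 2.8
```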

Page 42: 2.Central Tendency and Dispersion (1)

Class assignment

• Find the Median and Mode for the following data (salary structure of 1500 employees)

• (Answer: Median = 33.46, Mode = 29.5)

Age     18-22   22-26   26-30   30-34   34-38   38-42   42-46   46-50   50-54   54-58
Freq     120     125     280     260     155     184     162      86      75      53

Page 43: 2.Central Tendency and Dispersion (1)

Comparison of Mean, Median, Mode

Mean
– Defined as the arithmetic average of all observations in the data set.
– Requires measurement on all observations.
– Uniquely and comprehensively defined.

Median
– Defined as the middle value in the data set arranged in ascending or descending order.
– Does not require measurement on all observations.
– Cannot be determined under all conditions.

Mode
– Defined as the most frequently occurring value in the distribution; it has the largest frequency.
– Does not require measurement on all observations.
– Not uniquely defined for multi-modal situations.

Page 44: 2.Central Tendency and Dispersion (1)

Comparison of Mean, Median, Mode (Cont.)

Mean
– Affected by extreme values.
– Can be treated algebraically. That is, means of several groups can be combined.

Median
– Not affected by extreme values.
– Cannot be treated algebraically. That is, medians of several groups cannot be combined.

Mode
– Not affected by extreme values.
– Cannot be treated algebraically. That is, modes of several groups cannot be combined.

Page 45: 2.Central Tendency and Dispersion (1)

Which central tendency to use…

• Type of data:
– If data is badly skewed: Avoid the Mean
– If gaps in the data: Avoid Median
– If uneven frequencies: Avoid Mode

• Purpose of analysis:
– Representative value: Mean
– Qualitative/nominal variable: Mode
– Partition point: Median

Page 46: 2.Central Tendency and Dispersion (1)

Which central tendency to use… (Cont.)

• Frequency distribution:
– Open-ended classes: Median or Mode (except certain situations)
– Others: Mean

• Nature of data:
– Time series data: Avoid Mean
– Ratios/rates: Avoid Mean

Page 47: 2.Central Tendency and Dispersion (1)

Relationship

• For moderately skewed distributions, Mean, Median and Mode are approximately related as follows (an empirical relationship):

• (Mean – Mode) = 3 (Mean – Median)

• For a completely symmetric, single-peaked distribution (for example, the Normal distribution), the three measures coincide with each other.

Page 48: 2.Central Tendency and Dispersion (1)

Fractiles / Quantiles

• A FRACTILE is the value of an observation which is located at a specified place in a series of data. For example: the Median, which is located in the middle.

• Various fractiles used are: Quartiles, Deciles, Percentiles.

• Median is the 50th percentile, or 5th decile, or 2nd quartile.

Page 49: 2.Central Tendency and Dispersion (1)

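How to calculate fractile values

• n-th quartile:

\[ Q_n = P_{25n} = L + \frac{(nN/4) - m}{f} \times c \]

• n-th decile:

\[ D_n = P_{10n} = L + \frac{(nN/10) - m}{f} \times c \]

Here N is the total number of observations, and L, m, f and c are defined as in the grouped median formula, taken for the class that contains the fractile.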
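Quartiles, deciles and percentiles differ only in the number of equal parts, so one routine can cover all three. The sketch below is an illustration under that reading of the formulas; `grouped_fractile` and its parameter names are made up for this example, and the data are the age classes from the class assignment on median and mode, whose stated median of 33.46 serves as a check.

```python
def grouped_fractile(lower_limits, freqs, width, k, parts):
    """k-th fractile when the data are split into `parts` equal parts
    (parts=4 for quartiles, 10 for deciles, 100 for percentiles)."""
    total = sum(freqs)
    target = k * total / parts               # k*N/4, k*N/10, ...
    cum = 0                                  # m: cumulative frequency so far
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:      # class containing the fractile
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("fractile class not found")

age_lower = list(range(18, 58, 4))           # 18, 22, ..., 54
age_freq = [120, 125, 280, 260, 155, 184, 162, 86, 75, 53]

q1 = grouped_fractile(age_lower, age_freq, 4, 1, 4)
q2 = grouped_fractile(age_lower, age_freq, 4, 2, 4)
d5 = grouped_fractile(age_lower, age_freq, 4, 5, 10)
p50 = grouped_fractile(age_lower, age_freq, 4, 50, 100)

print(round(q1, 2), round(q2, 2), round(d5, 2), round(p50, 2))
```

The second quartile, fifth decile and fiftieth percentile all return the median, as stated on the fractiles slide.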

Page 50: 2.Central Tendency and Dispersion (1)

Class assignment: Calculate the fractiles from the data

Age     18-22   22-26   26-30   30-34   34-38   38-42   42-46   46-50   50-54   54-58
Freq     120     125     280     260     155     184     162      86      75      53

Page 51: 2.Central Tendency and Dispersion (1)

3) Measures of Dispersion

In simple terms, measures of dispersion indicate how large the spread of the distribution is around the central tendency. They answer the question "What is the magnitude of departure from the average value for different groups having identical averages?". It is important to study the central tendency along with dispersion to throw light on the shape of the curve, and to gauge whether there is distortion of the bell-shaped, symmetrical normal distribution curve that forms the foundation on which statistical inference is built.

Page 52: 2.Central Tendency and Dispersion (1)

Range

Range is the simplest of all measures of dispersion. It is calculated as the difference between the maximum and minimum values in the data set.

Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$

Example for Computing Range

The following data represent the percentage return on investment for 10 mutual funds per annum. Calculate the range.

12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$ = 18 − 9 = 9

Page 53: 2.Central Tendency and Dispersion (1)

• Range is an absolute measure and is defined for a particular data set. It cannot be used for comparison of two data sets.

• Coefficient of Range is a relative measure.

• Coefft. of Range = (L − S)/(L + S), where L is the largest and S the smallest value
• If it is a small value, dispersion is less
• Coefft. of Range is not a consistent measure, thus it is not always used.
– Example: take two samples, the first with extreme values 1 and 2, the second with extreme values 11 and 12. Both have a range of 1, but their coefficients of range are: first sample (2 − 1)/(2 + 1) = 1/3, second sample (12 − 11)/(12 + 11) = 1/23.
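The range and the coefficient of range are simple enough to verify directly. The following is a minimal sketch; the function names are hypothetical, chosen only for illustration.

```python
def data_range(values):
    """Absolute measure: largest value minus smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
print(data_range(returns))             # mutual fund example

# Two samples with the same range but different coefficients of range
print(coefficient_of_range([1, 2]))    # 1/3
print(coefficient_of_range([11, 12]))  # 1/23
```

The last two lines reproduce the inconsistency noted in the example: shifting both extremes by 10 leaves the range unchanged but shrinks the coefficient.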

Page 54: 2.Central Tendency and Dispersion (1)

Range

Caution: Range is a good measure of spread in the distribution only when a data set shows a stable pattern of variation without extreme values. If one of the components of range, namely the maximum value or minimum value, becomes an extreme value, then range should not be used.

Page 55: 2.Central Tendency and Dispersion (1)

Interquartile Range

Range is entirely dependent on the maximum and minimum values in the data set and is highly misleading when one of them is an extreme value. To overcome this deficiency, you can resort to the interquartile range. It is computed as the range after eliminating the highest and lowest 25% of observations in a data set that is arranged in ascending order. Thus this measure is not sensitive to extreme values.

Interquartile range = Range computed on the middle 50% of the observations

Page 56: 2.Central Tendency and Dispersion (1)

Interquartile Range: Example

The following data represent the percentage return on investment for 9 mutual funds per annum. Calculate the interquartile range.

Data Set: 12, 14, 11, 18, 10.5, 12, 14, 11, 9

Arranging in ascending order, the data set becomes

9, 10.5, 11, 11, 12, 12, 14, 14, 18

Ignore the first two (9, 10.5) and last two (14, 18) observations in this data set. The remaining observations contain roughly the middle 50% of the data. They are 11, 11, 12, 12 and 14. If you calculate the range for these, you get the interquartile range.

Interquartile range = 14 − 11 = 3.
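The trimming procedure used in this example can be sketched as below. The function name is an illustrative assumption, and the sketch follows the slide's approach of dropping whole observations from each end; library quantile routines usually interpolate between observations and may give a slightly different figure.

```python
def trimmed_interquartile_range(values):
    """Range of what is left after dropping the lowest and highest quarter."""
    ordered = sorted(values)
    drop = len(ordered) // 4        # whole observations in each 25% tail
    middle = ordered[drop:len(ordered) - drop]
    return max(middle) - min(middle)

returns = [12, 14, 11, 18, 10.5, 12, 14, 11, 9]
iqr = trimmed_interquartile_range(returns)
print(iqr)
```

For the nine returns, two observations are dropped from each end, which leaves 11, 11, 12, 12, 14.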

Page 57: 2.Central Tendency and Dispersion (1)

Quartile Deviation

• Quartile deviation = IQR/2

• This is an absolute measure of dispersion, not to be used for comparison

• For comparison we use the "Coefficient of Quartile deviation"

• Coefft. of QD = (Q3 − Q1)/(Q3 + Q1)

Page 58: 2.Central Tendency and Dispersion (1)

Mean Absolute Deviation (MAD)

Mean Absolute Deviation (MAD) is defined as the average based on the deviations measured from the arithmetic mean, in which all deviations are treated as positive, ignoring the actual sign. Unlike range, MAD is based on all observations. Hence it reflects the dispersion of every item in the distribution. In symbolic form, it is defined by the following formula.

MAD = $\dfrac{\sum |X - \bar{X}|}{n}$

Where
$\sum |X - \bar{X}|$ represents the sum of all deviations from the arithmetic mean after ignoring sign
$\bar{X}$ = Arithmetic Mean
n = Number of observations in the sample (sample size)

Caution: Mean Absolute Deviation (MAD) has two weaknesses. 1) It cannot be combined for several groups. 2) Ignoring the sign has serious implications for a business manager attempting to measure the spread of the distribution in a scientific manner.

Page 59: 2.Central Tendency and Dispersion (1)

Example for MAD

The following data represent the percentage return on investment for 10 mutual funds per annum. Calculate MAD. (Please note that this is the same example used for computing Range.)

12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

$\bar{X} = \dfrac{\sum X}{n}$ = (12 + 14 + 11 + 18 + 10.5 + 11.3 + 12 + 14 + 11 + 9)/10 = 12.28

$\sum |X - \bar{X}|$ = |12 − 12.28| + |14 − 12.28| + |11 − 12.28| + |18 − 12.28| + |10.5 − 12.28| + |11.3 − 12.28| + |12 − 12.28| + |14 − 12.28| + |11 − 12.28| + |9 − 12.28| = 18.32

MAD = $\dfrac{\sum |X - \bar{X}|}{n}$ = 18.32/10 = 1.832
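The MAD arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `mean_absolute_deviation` is an assumption made for illustration.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
mean = sum(returns) / len(returns)
total_abs_dev = sum(abs(x - mean) for x in returns)
mad = mean_absolute_deviation(returns)
print(round(mean, 2), round(total_abs_dev, 2), round(mad, 3))
```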

Page 60: 2.Central Tendency and Dispersion (1)

Standard Deviation

Standard deviation forms the basis for the discussion on Inferential Statistics. It is a classic measure of dispersion. It has many advantages over the rest of the measures of variation. It is based on all observations. It is capable of being algebraically treated, which implies that you can combine standard deviations of many groups. It plays a vital role in testing hypotheses and forming confidence intervals.

To define standard deviation, you need to define another term called variance. In simple terms, standard deviation is the square root of variance.

Page 61: 2.Central Tendency and Dispersion (1)

Important Terms with Notations

Important terms with notations:

Sample Variance: $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$

Sample Standard Deviation: $S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

Population Variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$

Population Standard Deviation: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$

Where $\bar{X} = \dfrac{\sum X}{n}$ (Sample Mean) and $\mu = \dfrac{\sum X}{N}$ (Population Mean)

n = Number of observations in the sample (Sample size)
N = Number of observations in the Population (Population Size)

Key Remarks:

1. $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ is an unbiased estimator of $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$

2. $\bar{X} = \dfrac{\sum X}{n}$ is an unbiased estimator of $\mu = \dfrac{\sum X}{N}$

3. The divisor n − 1 is always used while calculating the sample variance, to ensure the property of being unbiased

4. Standard deviation is always the square root of variance

Page 62: 2.Central Tendency and Dispersion (1)

Example for Standard Deviation

The following data represent the percentage return on investment for 10 mutual funds per annum. Calculate the sample standard deviation.

12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

Page 63: 2.Central Tendency and Dispersion (1)

Solution for the Example

Page 64: 2.Central Tendency and Dispersion (1)

Solution for the Example Cont.

From the spreadsheet of Microsoft Excel in the previous slide, it is easy to see that

Mean = $\bar{X} = \dfrac{\sum X}{n}$ = 12.28 (in column A and row 14, 12.28 is seen).

Sample Variance = $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ = 6.33 (in column D and row 14, 6.33 is seen).

Sample Standard Deviation = S = $\sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$ = 2.52 (in column D and row 15, 2.52 is seen).
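The spreadsheet itself did not survive in this transcript, so the figures can be rechecked with a short Python sketch instead. It applies the sample formulas (divisor n − 1) to the ten returns; the variable names are illustrative assumptions.

```python
import math
import statistics

returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]

n = len(returns)
mean = sum(returns) / n
# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum((x - mean) ** 2 for x in returns) / (n - 1)
sample_sd = math.sqrt(sample_variance)
print(round(mean, 2), round(sample_variance, 2), round(sample_sd, 2))

# Cross-check with the standard library, which also uses the n - 1 divisor
library_sd = statistics.stdev(returns)
```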

Page 65: 2.Central Tendency and Dispersion (1)

Standard Deviation for Grouped Data

The standard deviation for sample data, based on a frequency distribution, is given by

S = $\sqrt{\dfrac{\sum f (X - \bar{X})^2}{n - 1}}$, which is used to estimate the Population Standard Deviation σ.

Here $\bar{X} = \dfrac{\sum fX}{n}$, n is the Sample Size = $\sum f$, X = Mid Point of each class

Page 66: 2.Central Tendency and Dispersion (1)

Standard Deviation for Grouped Data: Example

Frequency Distribution of Return on Investment of Mutual Funds

Return on Investment   Number of Mutual Funds
5-10                   10
10-15                  12
15-20                  16
20-25                  14
25-30                  8
Total                  60

Page 67: 2.Central Tendency and Dispersion (1)

Solution for the Example

Page 68: 2.Central Tendency and Dispersion (1)

Solution for the Example

From the spreadsheet of Microsoft Excel in the previous slide, it is easy to see

Mean = $\bar{X} = \dfrac{\sum fX}{n}$ = 1040/60 = 17.333 (cell F10),

Standard Deviation = S = $\sqrt{\dfrac{\sum f (X - \bar{X})^2}{n - 1}}$ = $\sqrt{\dfrac{2448.33}{59}}$ = 6.44 (cell H12)
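Since the spreadsheet is not reproduced here, the grouped-data figures can be rechecked with a short Python sketch. It assumes class midpoints stand in for every observation in a class and uses the n − 1 divisor, which is what the quoted figure 2448.33/59 implies; the variable names are illustrative.

```python
import math

# Midpoints of the classes 5-10, 10-15, 15-20, 20-25, 25-30
midpoints = [7.5, 12.5, 17.5, 22.5, 27.5]
freqs = [10, 12, 16, 14, 8]

n = sum(freqs)                                             # sample size
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n    # sum(fX) / n
sum_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
sd = math.sqrt(sum_sq_dev / (n - 1))
print(n, round(mean, 3), round(sum_sq_dev, 2), round(sd, 2))
```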

Page 69: 2.Central Tendency and Dispersion (1)

Calculation of SD: Raw data

• First the direct method: $\sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

• Without deviation method: $\sqrt{\dfrac{\sum X^2}{n} - \bar{X}^2}$

• Assumed mean method: $\sqrt{\dfrac{\sum d^2}{n} - \left[\dfrac{\sum d}{n}\right]^2}$, where d = X − A is the deviation from an assumed mean A

Page 70: 2.Central Tendency and Dispersion (1)

Calculation of SD: Grouped

• First the direct method: $\sqrt{\dfrac{\sum f (X - \bar{X})^2}{n - 1}}$

• Without deviation method: $\sqrt{\dfrac{\sum fX^2}{n} - \bar{X}^2}$

• Assumed mean method: $\sqrt{\dfrac{\sum fd^2}{n} - \left[\dfrac{\sum fd}{n}\right]^2}$, multiplied by the class interval c when d = (X − A)/c

Page 71: 2.Central Tendency and Dispersion (1)

Class assignment

• Find the average deviation and standard deviation of the following data:

Sales    No. of shops
10-20    3
20-30    6
30-40    11
40-50    3
50-60    2

Page 72: 2.Central Tendency and Dispersion (1)

Solution: Mean = 825 / 25 = 33

Sales    f    X    fX    |X−33|   f|X−33|   Sqr   f(Sqr)
10-20    3    15   45    18       54        324   972
20-30    6    25   150   8        48        64    384
30-40    11   35   385   2        22        4     44
40-50    3    45   135   12       36        144   432
50-60    2    55   110   22       44        484   968
Total    25        825            204             2800

AD = 204/25 = 8.16, Variance = 2800/25 = 112, SD = 10.58
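The column totals in this solution can be recomputed with a short Python sketch. It follows the slide in dividing by n (25) rather than n − 1; the variable names are assumptions made for illustration.

```python
import math

# Midpoints of the sales classes 10-20 ... 50-60 and the number of shops
midpoints = [15, 25, 35, 45, 55]
freqs = [3, 6, 11, 3, 2]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# Average (mean absolute) deviation about the mean
avg_dev = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n
# Variance with divisor n, as in the solution table
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = math.sqrt(variance)
print(mean, round(avg_dev, 2), variance, round(sd, 2))
```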

Page 73: 2.Central Tendency and Dispersion (1)

Class Assignment: SD from Assumed mean

• Use the above method to find the SD of the following data of 79 students

Marks    No. of students
0-10     18
10-20    16
20-30    15
30-40    12
40-50    10
50-60    5
60-70    2
70-80    1

Page 74: 2.Central Tendency and Dispersion (1)

Deviation d = (X − A)/c, A = 25, c = 10

Class   X    f    fX     X^2     fX^2    d    fd    d^2   fd^2
0-10    5    18   90     25      450     -2   -36   4     72
10-20   15   16   240    225     3600    -1   -16   1     16
20-30   25   15   375    625     9375    0    0     0     0
30-40   35   12   420    1225    14700   1    12    1     12
40-50   45   10   450    2025    20250   2    20    4     40
50-60   55   5    275    3025    15125   3    15    9     45
60-70   65   2    130    4225    8450    4    8     16    32
70-80   75   1    75     5625    5625    5    5     25    25
Total        79   2055   17000   77575        8           242

Page 75: 2.Central Tendency and Dispersion (1)

• Deviation method

• SD = 10 [(242/79) − (8/79)(8/79)]^1/2

• SD = 10 (1.747) = 17.47

• Direct method

• V = [(77575/79) − (2055/79)(2055/79)]

• SD = {981.96 − 676.66}^1/2 = {305.30}^1/2

• = 17.47
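Both routes to the SD can be compared with a small Python sketch. It assumes the same assumed mean A = 25 and class interval c = 10 as the table, and the divisor n used on this slide; the variable names are illustrative.

```python
import math

midpoints = [5, 15, 25, 35, 45, 55, 65, 75]   # class marks for 0-10 ... 70-80
freqs = [18, 16, 15, 12, 10, 5, 2, 1]
A, c = 25, 10                                  # assumed mean and class interval

n = sum(freqs)

# Assumed mean (step deviation) method: d = (X - A) / c
d = [(x - A) / c for x in midpoints]
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
sd_assumed = c * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Direct (without deviation) method: sqrt(sum(fX^2)/n - mean^2)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sd_direct = math.sqrt(sum(f * x * x for f, x in zip(freqs, midpoints)) / n - mean ** 2)

print(sum_fd, sum_fd2, round(sd_assumed, 2), round(sd_direct, 2))
```

The two methods are algebraically identical, so any difference between the hand-worked answers comes from rounding intermediate values.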

Page 76: 2.Central Tendency and Dispersion (1)

Relationship between S.D. and other measures of dispersion

Q.D. = (2/3) S.D.
A.D. = (4/5) S.D.

In a symmetric distribution:
$\bar{X} \pm$ Q.D. covers 50% of the observations
$\bar{X} \pm$ A.D. covers 57.5% of the observations
$\bar{X} \pm$ S.D. covers 68.26% of the observations

Page 77: 2.Central Tendency and Dispersion (1)

Coefficient of Variation (Relative Dispersion)

Coefficient of Variation (CV) is defined as the ratio of Standard Deviation to Mean. In symbolic form

CV = $S / \bar{X}$ for the sample data and CV = $\sigma / \mu$ for the population data.

CV is the measure to use when you want to see the relative spread across groups or segments. It also measures the extent of spread in a distribution as a percentage of the mean. The larger the CV, the greater the percentage spread. As a manager, you would like to have a small CV so that your assessment of a situation is robust. The percentage risk is minimized.

Page 78: 2.Central Tendency and Dispersion (1)

Coefficient of Variation: Example

Consider two Sales Persons working in the same territory. The sales performance of these two in the context of selling PCs is given below. Comment on the results.

                                 Sales Person 1   Sales Person 2
Mean Sales (one-year average)    50 units         75 units
Standard Deviation               5 units          25 units

Page 79: 2.Central Tendency and Dispersion (1)

Interpretation for the Example

The CV is 5/50 = 0.10 or 10% for Sales Person 1 and 25/75 = 0.33 or 33% for Sales Person 2. It seems Sales Person 1 performs better than Sales Person 2, with less relative dispersion or scattering. Sales Person 2 has a very high departure or standard deviation from his average sales achievement. The moral of the story is "don't get carried away by absolute numbers". Look at the scatter. Even though Sales Person 2 has achieved a higher average, his performance is not consistent and seems erratic.

Page 80: 2.Central Tendency and Dispersion (1)

Example: Coefficient of Variation

• Since the mean and variance are enough to compare two groups of data, CV is used to measure the relative spread of the data

• Two factories which have 50 and 100 employees have average wages of Rs. 120 per day and Rs. 85 per day. The variances of wages in the two factories are 9 and 16 respectively. Which factory has more uniformity in wages?

Page 81: 2.Central Tendency and Dispersion (1)

• CV for factory A = 3/120 × 100 = 2.5

• CV for factory B = 4/85 × 100 = 4.7

• Factory A has more uniform wages
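The comparison can be sketched in a few lines of Python. The function name `coefficient_of_variation` is an illustrative assumption; the standard deviations are taken as the square roots of the stated variances.

```python
import math

def coefficient_of_variation(mean, sd):
    """CV expressed as a percentage of the mean."""
    return sd / mean * 100

# Factory A: mean wage Rs. 120, variance 9; Factory B: mean wage Rs. 85, variance 16
cv_a = coefficient_of_variation(120, math.sqrt(9))
cv_b = coefficient_of_variation(85, math.sqrt(16))
more_uniform = "A" if cv_a < cv_b else "B"
print(round(cv_a, 1), round(cv_b, 1), more_uniform)
```

The number of employees (50 and 100) does not enter the CV; only the mean and the standard deviation do.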

Page 82: 2.Central Tendency and Dispersion (1)

Skewness and Kurtosis

Skewness
– Measure of asymmetry of a frequency distribution
• Skewed to left
• Symmetric or unskewed
• Skewed to right

Kurtosis
– Measure of flatness or peakedness of a frequency distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)

Page 83: 2.Central Tendency and Dispersion (1)

Skewness

Skewed to left

Page 84: 2.Central Tendency and Dispersion (1)

Skewness

Symmetric

Page 85: 2.Central Tendency and Dispersion (1)

Skewness

Skewed to right

Page 86: 2.Central Tendency and Dispersion (1)

Kurtosis

Platykurtic - flat distribution

Page 87: 2.Central Tendency and Dispersion (1)

Kurtosis

Mesokurtic - not too flat and not too peaked

Page 88: 2.Central Tendency and Dispersion (1)

Kurtosis

Leptokurtic - peaked distribution

Page 89: 2.Central Tendency and Dispersion (1)

Skewness

• (i) (Mean − Mode)/S.D.

• (ii) 3(Mean − Median)/S.D.

• (iii) Bowley’s:
– BS = (Q3 + Q1 − 2 Median)/(Q3 − Q1)

• Kelley’s:

• KS = P50 − (P10 + P90)/2

• BASED ON MOMENTS

• BETA1 = (Mu3)^2/(Mu2)^3

Page 90: 2.Central Tendency and Dispersion (1)

Kurtosis

• Kurtosis is measured by Beta2

• Beta2 = (Mu4)/(Mu2)^2

• Where Mu2 = (1/N) Sum(X − Mean)^2
– And Mu4 = (1/N) Sum(X − Mean)^4
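The moment-based measures can be sketched in Python as below. The helper names are illustrative assumptions, and Mu3 is taken to be the third central moment, (1/N) Sum(X − Mean)^3, by analogy with the definitions of Mu2 and Mu4.

```python
def central_moment(values, order):
    """Mu_r = (1/N) * Sum (X - mean)^r."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def beta1(values):
    """Moment measure of skewness: Mu3^2 / Mu2^3."""
    return central_moment(values, 3) ** 2 / central_moment(values, 2) ** 3

def beta2(values):
    """Moment measure of kurtosis: Mu4 / Mu2^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
print(beta1(symmetric), beta2(symmetric))
print(beta1(returns), beta2(returns))
```

For the symmetric sample the third moment is zero, so Beta1 is zero. A Beta2 near 3 corresponds to the mesokurtic (normal) case, with smaller values indicating a flatter and larger values a more peaked distribution.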

Page 91: 2.Central Tendency and Dispersion (1)

Kurtosis

• Platykurtic: Flat

• Mesokurtic: Normal

• Leptokurtic: Very high

• Beta2 = Mu4/(Mu2)^2

• Where Mu4 = 1/n (Sum fd^4)

• and Mu2 = 1/n (Sum fd^2)

Page 92: 2.Central Tendency and Dispersion (1)

Relations between the Mean and Standard Deviation

Chebyshev’s Theorem
– Applies to any distribution, regardless of shape
– Places lower limits on the percentages of observations within a given number of standard deviations from the mean

Empirical Rule
– Applies only to roughly mound-shaped and symmetric distributions
– Specifies approximate percentages of observations within a given number of standard deviations from the mean

Page 93: 2.Central Tendency and Dispersion (1)

Chebyshev’s Theorem

At least $1 - \dfrac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean.

k = 2: at least 1 − 1/2^2 = 1 − 1/4 = 3/4 = 75% lie within 2 standard deviations of the mean
k = 3: at least 1 − 1/3^2 = 1 − 1/9 = 8/9 = 89% lie within 3 standard deviations of the mean
k = 4: at least 1 − 1/4^2 = 1 − 1/16 = 15/16 = 94% lie within 4 standard deviations of the mean
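The bound is easy to tabulate and to check against a data set. The following is a minimal Python sketch; the function names are hypothetical, and the data are the ten mutual fund returns used earlier, with the standard deviation computed with divisor n so that the theorem applies to the data set itself.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def share_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(1 for x in values if abs(x - mean) <= k * sd) / n

returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
for k in (2, 3, 4):
    print(k, round(chebyshev_lower_bound(k), 4), share_within(returns, k))
```

The observed share is never below the bound; for most data sets it is well above it, which is why the empirical rule gives sharper figures for mound-shaped distributions.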

Page 94: 2.Central Tendency and Dispersion (1)

Empirical Rule

For roughly mound-shaped and symmetric distributions, approximately:

68% lie within 1 standard deviation of the mean
95% lie within 2 standard deviations of the mean
Almost all (about 99.7%) lie within 3 standard deviations of the mean

Page 95: 2.Central Tendency and Dispersion (1)

1-8 Methods of Displaying Data

Pie Charts
– Categories represented as percentages of total

Bar Graphs
– Heights of rectangles represent group frequencies

Frequency Polygons
– Height of line represents frequency

Ogives
– Height of line represents cumulative frequency

Time Plots
– Represents values over time

Page 96: 2.Central Tendency and Dispersion (1)

Pie Chart

Page 97: 2.Central Tendency and Dispersion (1)

Bar Chart

[Fig. 1-11 Airline Operating Expenses and Revenues: bars for Average Revenues and Average Expenses by airline (American, Continental, Delta, Northwest, Southwest, United, USAir)]

Page 98: 2.Central Tendency and Dispersion (1)

Frequency Polygon and Ogive

[Figure, two panels. Relative Frequency Polygon: relative frequency against Sales. Ogive: cumulative relative frequency against Sales.]

Page 99: 2.Central Tendency and Dispersion (1)

Time Plot

[Figure: Monthly Steel Production (Problem 1-46), millions of tons against month]

Page 100: 2.Central Tendency and Dispersion (1)

1-9 Exploratory Data Analysis - EDA

Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets.

Stem-and-Leaf Displays
– Quick-and-dirty listing of all observations
– Conveys some of the same information as a histogram

Box Plots
– Median
– Lower and upper quartiles
– Maximum and minimum

Page 101: 2.Central Tendency and Dispersion (1)

Example 1-8: Stem-and-Leaf Display

1 | 122355567
2 | 0111222346777899
3 | 012457
4 | 11257
5 | 0236
6 | 02

Page 102: 2.Central Tendency and Dispersion (1)

Box Plot

Elements of a Box Plot

– Median, Q1 and Q3: the box spans the Interquartile Range (IQR)
– Inner fences: Q1 − 1.5(IQR) and Q3 + 1.5(IQR)
– Outer fences: Q1 − 3(IQR) and Q3 + 3(IQR)
– Whiskers: smallest data point not below the inner fence, and largest data point not exceeding the inner fence
– Suspected outlier and outlier: plotted as individual points (* and o in the figure)

Page 103: 2.Central Tendency and Dispersion (1)

Example: Box Plot

Page 104: 2.Central Tendency and Dispersion (1)

1-10 Using the Computer – The Template Output

Page 105: 2.Central Tendency and Dispersion (1)

Using the Computer – Template Output for the Histogram

Page 106: 2.Central Tendency and Dispersion (1)

Using the Computer – Template Output for Histograms for Grouped Data

Page 107: 2.Central Tendency and Dispersion (1)

Using the Computer – Template Output for Frequency Polygons & the Ogive for Grouped Data

Page 108: 2.Central Tendency and Dispersion (1)

Using the Computer – Template Output for Two Frequency Polygons for Grouped Data

Page 109: 2.Central Tendency and Dispersion (1)

Using the Computer – Pie Chart Template Output

Page 110: 2.Central Tendency and Dispersion (1)

Using the Computer – Bar Chart Template Output

Page 111: 2.Central Tendency and Dispersion (1)

Using the Computer – Box Plot Template Output

Page 112: 2.Central Tendency and Dispersion (1)

Using the Computer – Box Plot Template to Compare Two Data Sets

Page 113: 2.Central Tendency and Dispersion (1)

Using the Computer – Time Plot Template

Page 114: 2.Central Tendency and Dispersion (1)

Using the Computer – Time Plot Comparison Template

