COMPUTER ORIENTED STATISTICAL TECHNIQUES
(As per the New Syllabus of Mumbai University for B.Sc. (Information Technology), Semester IV, 2017-18)
Dr. Dinesh Gabhane
Ph.D. (Mgmt.), M.Phil. (Commerce), MBA (Mktg. & HR), UGC-NET (Mgmt.), B.E. (Production)
Associate Professor, Rajeev Gandhi College of Management Studies, Navi Mumbai.
Ms. Madhuri S. Bankar
M.Sc. (C/S), MCA, PGDCS&A
Head, Department of Information Technology, K.B.P College, Vashi, Navi Mumbai.
ISO 9001:2008 CERTIFIED
© Authors
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the authors and the publisher.
First Edition : 2018
Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd., “Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004. Phone: 022-23860170, 23863863; Fax: 022-23877178. E-mail: [email protected]; Website: www.himpub.com
Branch Offices :
New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002. Phone: 011-23270392, 23278631; Fax: 011-23256286
Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018. Phone: 0712-2738731, 3296733; Telefax: 0712-2721216
Bengaluru : Plot No. 91-33, 2nd Main Road, Seshadripuram, Behind Nataraja Theatre, Bengaluru - 560 020. Phone: 080-41138821; Mobile: 09379847017, 09379847005
Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda, Hyderabad - 500 027. Phone: 040-27560041, 27550139
Chennai : New No. 48/2, Old No. 28/2, Ground Floor, Sarangapani Street, T. Nagar, Chennai - 600 012. Mobile: 09380460419
Pune : “Laksha” Apartment, First Floor, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre), Pune - 411 030. Phone: 020-24496323, 24496333; Mobile: 09370579333
Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj, Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549
Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura, Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847
Ernakulam : 39/176 (New No. 60/251), 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682 011. Phone: 0484-2378012, 2378016; Mobile: 09387122121
Bhubaneswar : Plot No. 214/1342, Budheswari Colony, Behind Durga Mandap, Bhubaneswar - 751 006. Phone: 0674-2575129; Mobile: 09338746007
Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010. Phone: 033-32449649; Mobile: 07439040301
DTP by : Bhakti S. Gaonkar
Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.
Dedication
I would like to dedicate this book to my mother and father.
I would like to thank my wife and son for continuous support.
I would like to extend my gratitude to all my friends and colleagues for encouraging me in writing this book.
Dr. Dinesh Gabhane
I would like to dedicate this book to my family for allowing me the time to write it.
I would like to thank my husband Dr. Dinesh Gabhane for standing beside me throughout my work. He is a source of inspiration and motivation for continuing to improve my knowledge and move my career forward.
Special thanks go to my dear son Vedant who gave me energy.
Last but not least, I thank my college principal Dr. V.S. Shivankar and all my friends and colleagues, who encouraged me all the time.
Ms. Madhuri S. Bankar
Preface
It gives us immense pleasure to present the first edition of the book “Computer Oriented Statistical Techniques” to the teachers and students of Semester IV of S.Y.B.Sc. (Information Technology). This book has been written as per the syllabus prescribed by the University of Mumbai with effect from academic year 2017-18.
The whole syllabus is divided into V units and XIII chapters. In each chapter, the concepts and theory are followed by a sufficient number of solved examples. We have tried our level best to present the subject matter in simple language for better understanding of the readers. We hope that this edition will meet the requirements of the students of S.Y.B.Sc. (IT) in their examination preparation.
Constructive suggestions and comments from the readers will be sincerely appreciated. We would be glad to hear from you if you would like to suggest improvements or to contribute in any way. Kindly send your correspondence to [email protected] or
Finally, we would like to express our sincere respect and gratitude to Kiran Gurbani (Head of Computer Science and IT Department, R.K. Talreja College, Ulhasnagar) for reviewing this book thoroughly, for providing an environment which stimulates new thinking and innovations, and for her support, which helped us bring out this book in time.
We are thankful to Mr. S.K. Srivastava for giving us the opportunity and encouragement to write this book. We also extend our thanks to the staff of Himalaya Publishing House Pvt. Ltd. for assisting us in proofreading and compilation of this book.
Dr. Dinesh Gabhane
Ms. Madhuri Bankar
Syllabus
Computer Oriented Statistical Techniques

Unit I (12 Lectures)
The Mean, Median, Mode and Other Measures of Central Tendency: Index, or Subscript, Notation, Summation Notation, Averages, or Measures of Central Tendency, The Arithmetic Mean, The Weighted Arithmetic Mean, Properties of the Arithmetic Mean, The Arithmetic Mean Computed from Grouped Data, The Median, The Mode, The Empirical Relation between the Mean, Median, and Mode, The Geometric Mean G, The Harmonic Mean H, The Relation between the Arithmetic, Geometric and Harmonic Means, The Root Mean Square, Quartiles, Deciles, and Percentiles, Software and Measures of Central Tendency.
The Standard Deviation and Other Measures of Dispersion: Dispersion or Variation, The Range, The Mean Deviation, The Semi-Interquartile Range, The 10-90 Percentile Range, The Standard Deviation, The Variance, Short Methods for Computing the Standard Deviation, Properties of the Standard Deviation, Charlier’s Check, Sheppard’s Correction for Variance, Empirical Relations between Measures of Dispersion, Absolute and Relative Dispersion; Coefficient of Variation, Standardized Variable; Standard Scores, Software and Measures of Dispersion.
Introduction to R: Basic Syntax, Data Types, Variables, Operators, Control Statements, R-functions, R-vectors, R-lists, R-arrays.

Unit II (12 Lectures)
Moments, Skewness and Kurtosis: Moments, Moments for Grouped Data, Relations Between Moments, Computation of Moments for Grouped Data, Charlier’s Check and Sheppard’s Corrections, Moments in Dimensionless Form, Skewness, Kurtosis, Population Moments, Skewness, and Kurtosis, Software Computation of Skewness and Kurtosis.
Elementary Probability Theory: Definitions of Probability, Conditional Probability; Independent and Dependent Events, Mutually Exclusive Events, Probability Distributions, Mathematical Expectation, Relation between Population, Sample Mean, and Variance, Combinatorial Analysis, Combinations, Stirling’s Approximation to n!, Relation of Probability to Point Set Theory, Euler or Venn Diagrams and Probability.
Elementary Sampling Theory: Sampling Theory, Random Samples and Random Numbers, Sampling With and Without Replacement, Sampling Distributions, Sampling Distribution of Means, Sampling Distribution of Proportions, Sampling Distributions of Differences and Sums, Standard Errors, Software Demonstration of Elementary Sampling Theory.

Unit III (12 Lectures)
Statistical Estimation Theory: Estimation of Parameters, Unbiased Estimates, Efficient Estimates, Point Estimates and Interval Estimates; Their Reliability, Confidence-Interval Estimates of Population Parameters, Probable Error.
Statistical Decision Theory: Statistical Decisions, Statistical Hypotheses, Tests of Hypotheses and Significance, or Decision Rules, Type I and Type II Errors, Level of Significance, Tests Involving Normal Distributions, Two-tailed and One-tailed Tests, Special Tests, Operating-Characteristic Curves; the Power of a Test, p-Values for Hypotheses Tests, Control Charts, Tests Involving Sample Differences, Tests Involving Binomial Distributions.
Statistics in R: Mean, Median, Mode, Normal Distribution, Binomial Distribution, Frequency Distribution in R.

Unit IV (12 Lectures)
Small Sampling Theory: Small Samples, Student’s t Distribution, Confidence Intervals, Tests of Hypotheses and Significance, The Chi-Square Distribution, Confidence Intervals for Sigma, Degrees of Freedom, The F Distribution.
The Chi-Square Test: Observed and Theoretical Frequencies, Definition of Chi-Square, Significance Tests, The Chi-Square Test for Goodness of Fit, Contingency Tables, Yates’ Correction for Continuity, Simple Formulas for Computing Chi-Square, Coefficient of Contingency, Correlation of Attributes, Additive Property of Chi-Square.

Unit V (12 Lectures)
Curve Fitting and the Method of Least Squares: Relationship between Variables, Curve Fitting, Equations of Approximating Curves, Freehand Method of Curve Fitting, The Straight Line, The Method of Least Squares, The Least Squares Line, Non-linear Relationships, The Least Squares Parabola, Regression, Applications to Time Series, Problems Involving More than Two Variables.
Correlation Theory: Correlation and Regression, Linear Correlation, Measures of Correlation, The Least Squares Regression Lines, Standard Error of Estimate, Explained and Unexplained Variation, Coefficient of Correlation, Remarks Concerning the Correlation Coefficient, Product Moment Formula for the Linear Correlation Coefficient, Short Computational Formulas, Regression Lines and the Linear Correlation Coefficient, Correlation of Time Series, Correlation of Attributes, Sampling Theory of Correlation, Sampling Theory of Regression.
List of Practicals
1. Using R, execute the basic commands, array, list and frames.
2. Create a matrix using R and perform the operations: addition, inverse, transpose and multiplication.
3. Using R, execute the statistical functions: mean, median, mode, quartiles, range, inter-quartile range and histogram.
4. Using R, import the data from Excel / .CSV file and perform the above functions.
5. Using R, import the data from Excel / .CSV file and calculate the standard deviation, variance and co-variance.
6. Using R, import the data from Excel / .CSV file and draw the skewness.
7. Import the data from Excel / .CSV and perform hypothesis testing.
8. Import the data from Excel / .CSV and perform the Chi-Square Test.
9. Using R, perform the binomial and normal distribution on the data.
10. Perform the Linear Regression using R.
11. Compute the Least Squares means using R.
12. Compute the Linear Least Square Regression.
Paper Pattern
Evaluation Scheme
1. Internal Evaluation: 25 Marks
(i) Test: 1 class test of 20 marks. Attempt any four of the following: (20) (a) (b) (c) (d) (e) (f)
(ii) 5 marks: Active participation in the class, overall conduct, attendance.
2. External Examination: 75 Marks
All questions are compulsory.
(i) (Based on Unit 1) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)
(ii) (Based on Unit 2) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)
(iii) (Based on Unit 3) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)
(iv) (Based on Unit 4) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)
(v) (Based on Unit 5) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)
3. Practical Exam: 50 Marks
Certified copy journal is essential to appear for the practical examination.
1. Practical Question 1 (20)
2. Practical Question 2 (20)
3. Journal (5)
4. Viva Voce (5)
Contents
UNIT I
1. The Mean, Median, Mode and Other Measures of Central Tendency
2. The Standard Deviation and Other Measures of Dispersion
3. Introduction to R
UNIT II
4. Moments, Skewness and Kurtosis
5. Elementary Probability Theory
6. Elementary Sampling Theory
UNIT III
7. Statistical Estimation Theory
8. Statistical Decision Theory
9. Statistics in R
UNIT IV
10. Small Sampling Theory
11. The Chi-Square Test
UNIT V
12. Curve Fitting and the Method of Least Squares
13. Correlation Theory
Unit I
CHAPTER 1 The Mean, Median, Mode and Other Measures of Central Tendency
Structure
1.1 Index, or Subscript, Notation
1.2 Summation Notation
1.3 Averages or Measures of Central Tendency
1.4 Arithmetic Mean
1.5 The Weighted Arithmetic Mean
1.6 Properties of the Arithmetic Mean
1.7 The Arithmetic Mean Computed from Grouped Data
1.8 The Median
1.9 The Mode
1.10 The Empirical Relation between the Mean, Median and Mode
1.11 The Geometric Mean (G.M.)
1.12 The Harmonic Mean (H.M.)
1.13 The Relation Between Arithmetic, Geometric and Harmonic Means
1.14 The Root Mean Square
1.15 Quartiles, Deciles and Percentiles
1.16 Software and Measures of Central Tendency
Solved Examples
Practice Examples
1.1 INDEX, OR SUBSCRIPT, NOTATION
Let the symbol $X_j$ (read “X sub j”) denote any of the N values $X_1, X_2, X_3, \ldots, X_N$ assumed by a variable X. The letter j in $X_j$, which can stand for any of the numbers 1, 2, 3, ..., N, is called a subscript, or index. Clearly any letter other than j, such as i, k, p, q, or s, could have been used as well.
1.2 SUMMATION NOTATION
The symbol $\sum_{j=1}^{N} X_j$ is used to denote the sum of all the $X_j$’s from j = 1 to j = N; by definition,
$\sum_{j=1}^{N} X_j = X_1 + X_2 + X_3 + \cdots + X_N$
The symbol ∑ is the Greek capital letter sigma, denoting sum.
1.3 AVERAGES OR MEASURES OF CENTRAL TENDENCY
According to Simpson and Kafka, a measure of central tendency is a typical value around which other figures congregate or which divides their number in half. Thus an average can be used to describe or represent a whole series of figures involving magnitudes of the same variable. That average is an overall single value which represents the series.
Measures of central tendency permit us to compare individual items in the group with it and also permit us to compare different series of figures with regard to their central tendencies.
Averages are derived figures and not the original data.
Mean (Average)
According to Clark, “An average is a figure that represents the whole group.”
A. E. Waugh defines, “An average is a single value selected from a group of values to represent them in some way.”
According to Croxton and Cowden, “An average is a single value within the range of the data that is used to represent all the values in the series. Since an average is somewhere within the range of the data it is sometimes called a measure of central value.”
Crum and Smith say, “An average is sometimes called a ‘measure of central tendency’ because individual values of the variable usually cluster around it.”
[Figure: Measures of Central Tendency. Mathematical Averages: Arithmetic Mean (A.M.), Geometric Mean (G.M.), Harmonic Mean (H.M.). Positional Averages: Median, Mode, Quartiles, Deciles and Percentiles.]
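The subscript and summation notation of Sections 1.1 and 1.2 can be mirrored in a few lines of Python. This is only a sketch: the data values are hypothetical and not taken from the text, and the names `X`, `N` and `total` are chosen for illustration.

```python
# Hypothetical sample values X_1 .. X_N (not from the text).
X = [2, 5, 7, 10]
N = len(X)

# Sum of X_j for j = 1 .. N. Python lists are 0-based,
# so the j-th value X_j is stored at X[j - 1].
total = sum(X[j - 1] for j in range(1, N + 1))

print(total)
```

Any other index letter (i, k, p, ...) could replace `j` in the loop without changing the result, which is the point made about subscripts above.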
Characteristics of a Good Average
An average should be:
1. Rigorously defined,
2. Easy to compute,
3. Capable of simple interpretation,
4. Dependent on all the observed values,
5. Not unduly influenced by one or two extremely large or small values,
6. Relatively stable from one random sample to another,
7. Capable of mathematical manipulation.
1.4 THE ARITHMETIC MEAN
An arithmetic mean is a measure of central tendency and is popularly known as the mean. The arithmetic mean is obtained by dividing the sum of the values of all items of a series by the number of items in that series. It is normally denoted by $\bar{X}$, which is read as ‘X bar’. It can be computed for unclassified (ungrouped) data or individual series, as well as for classified (grouped) data in discrete or continuous series.
Practical Steps Involved in the Computation of Arithmetic Mean for Unclassified Data
Step 1 Treat the given values of variables as X.
Step 2 Enter the given values in a column headed as X.
Step 3 Add together all the values of variable X and obtain the total i.e., ∑X.
Step 4 Apply the following formula: $\bar{X} = \frac{\sum X}{N}$
where, $\bar{X}$ = Arithmetic Mean
∑X = Sum of all values of variable X
N = Number of individual observations
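The four steps above amount to dividing ∑X by N. A minimal Python sketch follows; the function name `arithmetic_mean` and the sample marks are hypothetical, introduced only for illustration.

```python
def arithmetic_mean(values):
    """Return (sum of X) / N for ungrouped data."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Hypothetical observations (not from the text).
marks = [8, 3, 5, 12, 10]
mean_marks = arithmetic_mean(marks)  # sum = 38, N = 5
print(mean_marks)
```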
1.5 THE WEIGHTED ARITHMETIC MEAN
While calculating the arithmetic mean, as discussed earlier, equal importance (or weight) is given to each observation in the data set. However, there are situations in which the individual observations are not of equal importance. If values occur with different frequencies, the simple A.M. of the distinct values (as opposed to the A.M. of all observations) may not be truly representative of the data set and may be misleading. Under these circumstances, we attach to the observations $X_1, X_2, \ldots, X_n$ the weights $w_1, w_2, \ldots, w_n$ as indicators of their importance within the data set and compute a weighted mean, denoted by $\bar{X}_w$, as follows:
$\bar{X}_w = \frac{\sum wX}{\sum w}$
Note: The weighted arithmetic mean should be used
1. when the importance of all the numerical values in the given data set is not equal;
2. when the frequencies of various classes are widely varying;
3. where there is a change either in the proportion of numerical values or in the proportion of their frequencies;
4. when ratios, percentages or rates are being averaged.
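A weighted arithmetic mean can be sketched as the sum of weight times value divided by the sum of the weights. The function name `weighted_mean` and the scores and weights below are hypothetical examples, not data from the text.

```python
def weighted_mean(values, weights):
    """Return sum(w * X) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical scores with unequal importance.
scores = [70, 80, 90]
weights = [1, 2, 3]
wm = weighted_mean(scores, weights)  # (70 + 160 + 270) / 6
print(round(wm, 2))
```

With equal weights the result reduces to the ordinary arithmetic mean, which is a quick sanity check on any implementation.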
1.6 PROPERTIES OF THE ARITHMETIC MEAN
1. The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero.
2. The sum of squares of deviations of observations is minimum when taken from their arithmetic mean.
3. The arithmetic mean is capable of being treated algebraically.
4. If $\bar{X}_1$ and $N_1$ are the mean and number of observations of one series and $\bar{X}_2$ and $N_2$ are the corresponding magnitudes of another series, then the mean of the combined series of $N_1 + N_2$ observations is given by $\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$
5. If a constant B is added to (subtracted from) every observation, the mean of these observations is also increased (decreased) by B.
6. If every observation is multiplied (divided) by a constant b, the mean of these observations also gets multiplied (divided) by it.
7. If some observations of a series are replaced by some other observations, then the mean of original observations will change by the average change in magnitude of the changed observations.
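Properties 1, 4 and 5 can be checked numerically. The sketch below uses two small hypothetical series `a` and `b` (not from the text) and assumes the combined-mean formula takes the usual form (N1·mean1 + N2·mean2) / (N1 + N2).

```python
from math import isclose

# Two hypothetical series.
a = [4, 8, 12]
b = [10, 20]

mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean_a for x in a)

# Property 4: combined mean from the two means and sizes.
combined = (len(a) * mean_a + len(b) * mean_b) / (len(a) + len(b))
direct = sum(a + b) / len(a + b)

# Property 5: adding a constant B shifts the mean by B.
B = 5
shifted_mean = sum(x + B for x in a) / len(a)

print(deviation_sum, combined, direct, shifted_mean)
```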
1.7 THE ARITHMETIC MEAN COMPUTED FROM GROUPED DATA
Practical Steps Involved in the Computation of Arithmetic Mean for Discrete Series
Step 1 Treat the given values of variables as X and frequencies as f.
Step 2 Enter the given values of variable X in a column headed as X.
Step 3 Enter the given frequencies f in a column headed as f and obtain the sum of these frequencies i.e. N or ∑f.
Step 4 Multiply the variable of each row with the respective frequency and denote these products by fX and enter the same in a column headed as fX.
Step 5 Obtain the sum of these products i.e. ∑fX.
Step 6 Apply the following formula: $\bar{X} = \frac{\sum fX}{N}$
where, $\bar{X}$ = Arithmetic Mean
∑fX = Sum of products of frequency and value of variable X
N = ∑f = Sum of frequencies
Practical Steps Involved in the Computation of Arithmetic Mean for Continuous Series
Step 1 Enter the class intervals in the first column.
Step 2 Calculate the mid-point of each class, denote these mid-points as m and enter the same in a column headed as m.
Note: Mid-point (m) = $\frac{\text{Lower limit} + \text{Upper limit}}{2}$
Step 3 Enter the given frequencies f in a column headed as f and obtain the sum of these frequencies i.e. N or ∑f.
Step 4 Multiply the mid-point of each row with the respective frequency and denote these products by fm and enter the same in a column headed as fm.
Step 5 Obtain the sum of these products i.e. ∑fm.
Step 6 Apply the following formula: $\bar{X} = \frac{\sum fm}{N}$
where, $\bar{X}$ = Arithmetic Mean
∑fm = Sum of products of mid-points and frequencies
N = ∑f = Sum of frequencies
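Both grouped-data procedures reduce to the same computation: the sum of frequency times value divided by the sum of frequencies, where for a continuous series the value is the class mid-point. A sketch under the assumption that the mid-point is (lower limit + upper limit) / 2; the function name and the frequency tables are hypothetical.

```python
def grouped_mean(values, freqs):
    """Return sum(f * X) / sum(f) for a frequency table."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / n


# Discrete series (hypothetical): values of X with frequencies f.
discrete_mean = grouped_mean([1, 2, 3], [2, 5, 3])

# Continuous series (hypothetical): class limits with frequencies f.
classes = [(0, 10), (10, 20), (20, 30)]
freqs = [3, 5, 2]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
continuous_mean = grouped_mean(midpoints, freqs)

print(discrete_mean, continuous_mean)
```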
1.8 THE MEDIAN
Median is the central value of the variable that divides the series into two equal parts in such a way that half of the items lie above this value and the remaining half lie below this value. Median is called a positional average because it is based on the position of a given observation in a series arranged in ascending or descending order, and the position of the median is such that an equal number of items lie on either side of it. Median is usually denoted by ‘Med’ or ‘Md’. Median can be computed for both ungrouped data (or individual series) and grouped data (or discrete/continuous series).
Computation of Median for Individual Series
Step 1 Arrange the observations in ascending or descending order of size.
Step 2 Ascertain the $\left(\frac{N+1}{2}\right)$th observation.
Step 3 Calculate Median as follows:
(a) In case $\frac{N+1}{2}$ works out to be a whole number,
Median = size or value of the $\left(\frac{N+1}{2}\right)$th observation in the data array
(b) In case $\frac{N+1}{2}$ works out to be a fraction,
Median = size or value of the full item + 50% of the difference between the size of the immediately next item and the size of the full item.
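The two cases for an individual series can be sketched as follows, assuming the median position is (N + 1) / 2 counted from 1 in the sorted data. The function name `median_individual` and the sample lists are hypothetical.

```python
def median_individual(values):
    """Median of an individual series via the (N + 1) / 2 position rule."""
    if not values:
        raise ValueError("need at least one observation")
    data = sorted(values)
    position = (len(data) + 1) / 2      # 1-based position of the median
    full = int(position)                # whole part of the position
    if position == full:
        # Case (a): the position is a whole number.
        return data[full - 1]
    # Case (b): full item plus 50% of the gap to the next item.
    return data[full - 1] + 0.5 * (data[full] - data[full - 1])


odd_case = median_individual([7, 3, 9, 5, 1])    # position 3
even_case = median_individual([4, 8, 6, 10])     # position 2.5
print(odd_case, even_case)
```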
Computation of Median for Discrete Series
Step 1 Arrange the observations in ascending or descending order of size.
Step 2 Calculate Cumulative Frequencies (c.f.)
Step 3 Ascertain the $\left(\frac{N+1}{2}\right)$th observation, where N = ∑f.
Step 4 Ascertain the Cumulative Frequency which includes the $\left(\frac{N+1}{2}\right)$th observation.
Step 5 Calculate Median as follows:
Median = size or value of the observation corresponding to the cumulative frequency which includes the $\left(\frac{N+1}{2}\right)$th observation
Computation of Median for Grouped Data or Continuous Series
Step 1 Calculate Cumulative Frequencies (c.f.)
Step 2 Ascertain the $\left(\frac{N}{2}\right)$th observation.
Step 3 Ascertain the Cumulative Frequency which includes the $\left(\frac{N}{2}\right)$th observation, the corresponding class frequency (f) and lower limit (L) of that class, the interval (i) between the upper and lower limits of the class, and the cumulative frequency of the preceding class (c.f.).
Step 4 Calculate Median as follows:
Median = $L + \frac{\frac{N}{2} - c.f.}{f} \times i$
Where, L = Lower limit of the class
c.f. = Cumulative frequency of the preceding class
f = Frequency of the class
i = Interval between upper and lower limit of class
Note: To find the median value by interpolation, it is assumed that the numerical values of observations are evenly spaced over the entire class interval.
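The interpolation for a continuous series can be sketched as below. It assumes the median class is the first class whose cumulative frequency reaches N/2 and that the formula is Median = L + ((N/2 − c.f.) / f) × i; the function name and the frequency table are hypothetical.

```python
def median_grouped(classes, freqs):
    """Median of a continuous series by linear interpolation.

    classes: list of (lower, upper) limits in ascending order
    freqs:   class frequencies in the same order
    """
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0                      # c.f. of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:      # this class contains the N/2-th item
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical frequency table: N = 40, so N/2 = 20 falls in class 20-30.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]
md = median_grouped(classes, freqs)     # 20 + (20 - 15) / 20 * 10
print(md)
```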
Merits of Median
1. The median is useful in case of frequency distribution with open-end classes.
2. The median is recommended if distribution has unequal classes.
3. Extreme values do not affect the median as strongly as they affect the mean.
4. It is the most appropriate average in dealing with qualitative data.
5. The value of the median can be determined graphically, whereas the value of the mean cannot be determined graphically.
6. It is easy to calculate and understand.
Demerits of Median
1. For calculating the median it is necessary to arrange the data, whereas other averages do not need arrangement.
2. Since it is a positional average, its value is not determined by all the observations in the series.
3. Median is not capable of further algebraic calculations.
4. The sampling stability of the median is less as compared to the mean.
1.9 THE MODE
Mode is often said to be that value in a series which occurs most frequently or which has the greatest frequency. But it is not exactly true for every frequency distribution. Rather it is that value around which the observations tend to concentrate most heavily. It is also called the most typical or fashionable value of a distribution because it is the value which has the greatest frequency density in its immediate neighbourhood. It is usually denoted by Mo. It may be noted that a distribution may have one mode or two modes or several modes.
1. Unimodal: A distribution is said to be unimodal if it has only one mode.
2. Bimodal: A distribution is said to be bimodal if it has two modes.
3. Multimodal: A distribution is said to be multimodal if it has more than two modes.
Computation of Mode for Individual Series
Step 1 Count the number of times the various values of the series repeat themselves.
Step 2 Ascertain the value occurring the maximum number of times.
Step 3 Mode = Value occurring the maximum number of times.
Computation of Mode for Discrete Series
Step 1 Ascertain the maximum frequency.
Step 2 Ascertain the value of the observation corresponding to the maximum frequency.
Step 3 Mode = Value of the observation corresponding to the maximum frequency.
Note: In case of a discrete series (i.e. where values of observations along with frequencies are given), the mode can be determined just by the inspection method.
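Counting repetitions and picking the value (or values) with the maximum count can be sketched with the standard-library `collections.Counter`. The function names `modes` and `mode_discrete` and the sample data are hypothetical; returning a list is a design choice that lets the same function report bimodal and multimodal cases.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)            # Step 1: count repetitions
    top = max(counts.values())          # Step 2: maximum count
    return sorted(v for v, c in counts.items() if c == top)


def mode_discrete(values, freqs):
    """Mode of a discrete series: the value with the maximum frequency."""
    best = max(range(len(freqs)), key=lambda k: freqs[k])
    return values[best]


unimodal = modes([2, 3, 3, 5, 7, 3, 5])
bimodal = modes([1, 1, 2, 2, 3])
by_inspection = mode_discrete([10, 20, 30], [4, 9, 6])
print(unimodal, bimodal, by_inspection)
```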
Computation of Mode for Grouped Data or Continuous Series
Step 1 Ensure that the given series is a continuous exclusive series having equal class intervals. If the given series is not a continuous exclusive series, follow the procedure suggested below:
Given Series | Procedure to be followed
Less than series | Convert into continuous exclusive series
More than series | Convert into continuous exclusive series
Inclusive series | Convert into continuous exclusive series
Having unequal class intervals | Make the class intervals equal and adjust the frequencies assuming that they are equally distributed throughout the class.
Step 2 Ascertain the modal class as follows:
(a) By preparing the Grouping Table and Analysis in case there is a small difference between the maximum frequency and the frequency preceding it or succeeding it.
(b) By inspection in other cases. In this case the class with the maximum frequency is the Modal Class.
Step 3 Calculate the Mode as follows:
1. By inspection formula in case of a unimodal distribution (i.e. where there is a single mode)
(a) Where the modal class is the one having the maximum frequency
Mode = $M_o = L + \frac{|f_1 - f_0|}{|f_1 - f_0| + |f_1 - f_2|} \times i$
Where, L = Lower limit of the Modal Class
f1 = Frequency of the Modal Class
f0 = Frequency of the pre-modal class i.e. preceding the modal class
f2 = Frequency of the post-modal class i.e. succeeding the modal class
i = Class interval of the Modal Class
Notes:
1. If the Modal Class is the first class, f0 is taken as zero.
2. If the Modal Class is the last class, f2 is taken as zero.
3. Where the Modal Class is other than the one having the maximum frequency
2. By Empirical relationship formula in case of bimodal or multimodal distribution (i.e. where there are two or more values having the same maximum frequency)
Mode = 3 Median – 2 Mean
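The two routes to the mode of a continuous series can be sketched as below. The interpolation is assumed to take the usual form Mode = L + d1 / (d1 + d2) × i with d1 = |f1 − f0| and d2 = |f1 − f2|, and the modal class is taken by inspection as the class with the maximum frequency; the function names and all numbers are hypothetical.

```python
def mode_grouped(classes, freqs):
    """Mode of a continuous series with equal class intervals."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class index
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # first class: f0 = 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0   # last class: f2 = 0
    d1, d2 = abs(f1 - f0), abs(f1 - f2)
    if d1 + d2 == 0:
        raise ValueError("formula undefined when f0 = f1 = f2")
    return lower + d1 / (d1 + d2) * (upper - lower)


def mode_empirical(mean, median):
    """Empirical relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


# Hypothetical table: the modal class is 20-30 (f1 = 12, f0 = 8, f2 = 6).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 6]
mo = mode_grouped(classes, freqs)                 # 20 + 4 / (4 + 6) * 10
mo_emp = mode_empirical(mean=21.0, median=22.0)   # hypothetical mean, median
print(mo, mo_emp)
```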
Merits of Mode
1. It is easy to calculate and simple to understand.
2. It is not affected by the extreme values.
3. The value of mode can be determined graphically.
4. Its value can be determined in case of open-end class interval.
5. The mode is the most representative of the distribution.
Demerits of Mode
1. It is not suitable for further mathematical treatments.
2. The value of mode cannot always be determined.
3. The value of mode is not based on each and every item of the series.
4. The mode is not strictly defined.
5. It is difficult to calculate when one of the observations is zero or the sum of the observations is zero.
1.10 THE EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN AND MODE
If the values of mean, median and mode are equal, then the distribution of numerical values in the data set is symmetrical, as shown in figure (a). But if these values are not equal, then the distribution of numerical values in the data set is not symmetrical, as shown in figure (b) and figure (c).
If most of the values fall either to the right or to the left of the mode, then such a distribution is said to be skewed. In such cases, a relationship between these three measures of central tendency, as suggested by Karl Pearson, is as follows:
Mean - Mode = 3 (Mean - Median)
OR Mode = 3 Median - 2 Mean
If most of the values of observations in a distribution fall to the right of the mode as shown in figure (b), then it is said to be skewed to the right or positively skewed (i.e. values of higher magnitude are concentrated more to the right of the mode). In this case, the mode remains under the peak (i.e. representing the highest frequency) but the median (a value that depends on the number of observations) and the mean (a value that is affected by extreme values) move to the right. The order of magnitude of these measures will be
Mean > Median > Mode
But if the distribution is skewed to the left or negatively skewed (i.e. values of lower magnitude are concentrated more to the left of the mode), then the mode is again under the peak whereas the median and mean move to the left of the mode. The order of magnitude of these measures will be
Mean < Median < Mode
[Figure: (a) Symmetrical: Median = Mean = Mode. (b) Skewed to the Right: Mode, Median, Mean in increasing order. (c) Skewed to the Left: Mean, Median, Mode in increasing order.]
In both the cases, the difference between mean and mode is three times the difference between mean and median.
In general, for a single mode skewed distribution (non-symmetrical), the median is preferred to the mean for measuring location because it is neither influenced by the frequency of occurrence of a single observation value as mode nor it is affected by extreme values.
1.11 THE GEOMETRIC MEAN (G.M.)
In many business and economics problems, such as calculation of compound interest and inflation, quantities (variables) change over a period of time. In such cases, a decision maker may like to know an average percentage change rather than a simple average value to represent the average rate of growth or decline in the variable value over a period of time. Thus, another measure of central tendency called the geometric mean (G.M.) is calculated.
For example, consider the annual growth rate of output of a company in the last five years.
Year Growth Rate (Percent) Output at the end of the Year
2006 5.0 105.00
2007 7.5 112.87
2008 2.5 115.69
2009 5.0 121.47
2010 10.0 133.61
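The simple arithmetic mean of the growth rate is
$\bar{X} = \frac{1}{5}(5 + 7.5 + 2.5 + 5 + 10) = 6$
This value of the mean implies that if 6 percent were the growth rate in every year, then the output at the end of year 2010 would be $100 \times (1.06)^5 \approx 133.82$, which is slightly more than the actual value, 133.61. Thus the correct average growth rate should be less than 6 percent.
To find the correct growth rate, we apply the formula of geometric mean:
G.M. = $\sqrt[n]{\text{Product of all the } n \text{ values}}$
= $\sqrt[n]{X_1 \cdot X_2 \cdots X_n} = (X_1 \cdot X_2 \cdot X_3 \cdots X_n)^{1/n}$
In other words, G.M. of a set of n observations is the nth root of their product.
Because growth compounds, the values to be multiplied are the yearly growth factors (1 + rate/100) rather than the raw percentages. Substituting them in the given formula, we have
G.M. = $\sqrt[5]{1.05 \times 1.075 \times 1.025 \times 1.05 \times 1.10} = \sqrt[5]{1.3363} \approx 1.0597$, i.e. about 5.97 percent average growth. (The fifth root of the raw percentages, $\sqrt[5]{5 \times 7.5 \times 2.5 \times 5 \times 10} = \sqrt[5]{4687.5} \approx 5.42$, would understate it.)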
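The growth-rate example can be recomputed in a few lines. This sketch assumes a base output of 100 before 2006 (implied by the table) and treats the average growth rate as the geometric mean of the yearly growth factors 1 + r/100, which is an interpretation rather than something the text states outright. For comparison it also computes the nth root of the raw percentages. Variable names are hypothetical; `math.prod` requires Python 3.8 or later.

```python
from math import prod

# Yearly growth rates in percent, 2006-2010, from the table above.
rates = [5.0, 7.5, 2.5, 5.0, 10.0]
n = len(rates)

# Growth factors: an r percent rise multiplies output by (1 + r / 100).
factors = [1 + r / 100 for r in rates]
final_output = 100 * prod(factors)     # unrounded output at end of 2010

am_rate = sum(rates) / n                           # simple arithmetic mean
gm_of_rates = prod(rates) ** (1 / n)               # nth root of raw percentages
avg_growth = (prod(factors) ** (1 / n) - 1) * 100  # G.M. of growth factors

print(round(am_rate, 2), round(gm_of_rates, 2), round(avg_growth, 2))
```

Compounding `avg_growth` for five years reproduces `final_output`, which is the sense in which it is the "correct" average rate; the table's 133.61 differs slightly from the unrounded product because each year's output was rounded before the next step.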
Computation of Geometric Mean for Individual Series
If the number of observations is more than three, then G.M. can be calculated by taking logarithms on both sides of the equation. The formula for G.M. for ungrouped data can be expressed in terms of logarithms as shown below:
log (G.M.) = $\frac{1}{n}\log(X_1 \cdot X_2 \cdots X_n)$
= $\frac{1}{n}(\log X_1 + \log X_2 + \cdots + \log X_n) = \frac{1}{n}\sum \log X_i$
and therefore G.M. = Antilog $\left\{\frac{1}{n}\sum \log X_i\right\}$
or G.M. = Antilog $\left[\frac{\sum \log X}{N}\right]$ where, N = Total no. of items
Computation of Geometric Mean for Discrete Series
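If the observations $X_1, X_2, \ldots, X_n$ occur with frequencies $f_1, f_2, \ldots, f_n$, respectively, and the total frequency is $N = \sum f_i$, then the G.M. for such data is given by
log (G.M.) = $\frac{1}{N}\{f_1 \log X_1 + f_2 \log X_2 + \cdots + f_n \log X_n\}$
= $\frac{1}{N}\sum f_i \log X_i$
G.M. = Antilog $\left\{\frac{1}{N}\sum f_i \log X_i\right\}$
OR G.M. = Antilog $\left[\frac{\sum f \log X}{N}\right]$ where, N = Total no. of items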
Computation of Geometric Mean for Grouped Data or Continuous Series
Step 1 Calculate the mid-points of each class and enter these mid-points in the column headed as ‘m’
Step 2 Take the logarithms of each mid-point and enter in the column headed as log m.
Step 3 Multiply these logarithms (log m) with the respective frequencies and enter these products (f log m) in the column headed as f log m and then obtain their total i.e. ∑f log m.
Step 4 Calculate Geometric Mean as follows:
$\text{G.M.} = \text{Antilog}\left[\frac{\sum f \log m}{N}\right]$
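The four steps for grouped data can be sketched as follows. The class intervals and frequencies are hypothetical, and N is taken to be the total frequency, which is an assumption consistent with the discrete-series formula.

```python
import math

# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 20, 3), (20, 30, 5), (30, 40, 2)]

total_f = 0
sum_f_log_m = 0.0
for lower, upper, f in classes:
    m = (lower + upper) / 2       # Step 1: mid-point of the class
    log_m = math.log10(m)         # Step 2: logarithm of the mid-point
    sum_f_log_m += f * log_m      # Step 3: accumulate f * log m
    total_f += f

# Step 4: G.M. = antilog(sum(f log m) / N), with N the total frequency.
gm_grouped = 10 ** (sum_f_log_m / total_f)
print(round(gm_grouped, 2))
```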
Weighted Geometric Mean
Like the weighted Arithmetic Mean, Weighted Geometric Mean may be calculated. Symbolically,
$\text{G.M.}_W = \left(X_1^{W_1} \times X_2^{W_2} \times \cdots \times X_n^{W_n}\right)^{1/\sum W}$
Computation of Weighted Geometric Mean
Step 1 Take the logarithms of each item of variable X and enter in the column headed as log X.
Step 2 Multiply these logarithms (log X) with the respective weights (W) and enter these products (W log X) in the column headed as W log X and then obtain their total i.e. ∑W log X.
Step 3 Calculate Geometric Mean as follows:
$\text{G.M.} = \text{Antilog}\left[\frac{\sum W \log X}{\sum W}\right]$
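A short sketch of the weighted computation is given below. The function name, the values and the weights are all hypothetical, chosen so that the result is easy to verify by hand.

```python
import math


def weighted_geometric_mean(values, weights):
    """Return antilog(sum(W log X) / sum(W)) for strictly positive values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    # Steps 1 and 2: take log X, multiply by W, and total the products.
    sum_w_log_x = sum(w * math.log10(x) for x, w in zip(values, weights))
    # Step 3: divide by the total weight and take the antilog.
    return 10 ** (sum_w_log_x / sum(weights))


values = [2, 8]    # hypothetical items
weights = [3, 1]   # hypothetical weights
wgm = weighted_geometric_mean(values, weights)
print(round(wgm, 4))
```

Here the weighted product is 2 × 2 × 2 × 8 = 64, so the result is the fourth root of 64.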
Uses of Geometric Mean
(i) Geometric Mean is used to find the average percentage in sales, production etc.
(ii) Geometric Mean is used to find the index numbers since it shows the relative change.
(iii) When large weights are given to small items and small weights are given to large items, the best measure of central tendency is Geometric Mean. That is, when there are extreme values, the best measure of central tendency to be used is Geometric Mean.
Merits of Geometric Mean
(i) Geometric Mean is calculated based on all observations in the series.
(ii) Geometric Mean is clearly defined.
(iii) Geometric Mean is less affected by extreme values in the series than the arithmetic mean.
(iv) Geometric Mean is amenable to further algebraic treatment.
(v) Geometric Mean is useful in averaging ratios and percentages.
Demerits of Geometric Mean
(i) Geometric Mean is difficult to understand.
(ii) We cannot compute the geometric mean if both positive and negative values occur in the series.
(iii) We cannot compute geometric mean if one or more of the values in the series is zero.
1.12 THE HARMONIC MEAN (H.M.)
The harmonic mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.
$\text{H.M.} = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}}$
where X1, X2, …, Xn refer to the values of the various items of the series and
N = Total number of items of the series
Computation of Harmonic Mean for Individual Series
Step 1 Calculate the reciprocals of each item of variable X and enter them in the column headed as $\frac{1}{X}$ and obtain their total, i.e. $\sum \frac{1}{X}$.
Step 2 Calculate H.M. as follows: $\text{H.M.} = \frac{N}{\sum (1/X)}$
Computation of Harmonic Mean for Discrete Series
Step 1 Calculate the reciprocals of each item of variable X and enter them in the column headed as $\frac{1}{X}$.
Step 2 Multiply these reciprocals ($\frac{1}{X}$) with the respective frequencies and enter these products ($\frac{f}{X}$) in the column headed as $\frac{f}{X}$ and then obtain their total, i.e. $\sum \left(\frac{f}{X}\right)$.
Step 3 Calculate Harmonic Mean as follows: $\text{H.M.} = \frac{N}{\sum (f/X)}$, where $N = \sum f$.
Computation of Harmonic Mean for Grouped Data or Continuous Series
Step 1 Calculate the mid-point of each class of variable X and enter these mid-points in the column headed as m.
Step 2 Calculate the reciprocals of the mid-points and enter them in the column headed as $\frac{1}{m}$.
Step 3 Multiply these reciprocals ($\frac{1}{m}$) with the respective frequencies and enter these products ($\frac{f}{m}$) in the column headed as $\frac{f}{m}$ and then obtain their total, i.e. $\sum \left(\frac{f}{m}\right)$.
Step 4 Calculate Harmonic Mean as follows: $\text{H.M.} = \frac{N}{\sum (f/m)}$, where $N = \sum f$.
Weighted Harmonic Mean
Like the weighted Arithmetic Mean, Weighted Harmonic Mean may be calculated. Symbolically,
$\text{H.M.}_W = \frac{\sum W}{\sum (W/X)}$
Uses of Harmonic Mean
(i) The H.M. is used for computing the average rate of increase in profits of a concern.
(ii) The H.M. is used to calculate the average speed at which a journey has been performed.
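The average-speed use can be illustrated with a small sketch. The speeds and the distance are hypothetical: a journey is made at 60 km/h in one direction and 40 km/h on the return over the same distance.

```python
import statistics

speeds = [60, 40]   # km/h on the outward and return legs (hypothetical)
distance = 120      # km for each leg (hypothetical)

# Average speed from first principles: total distance / total time.
total_time = sum(distance / s for s in speeds)
average_speed = distance * len(speeds) / total_time

# Harmonic and arithmetic means of the two speeds.
hm = statistics.harmonic_mean(speeds)
am = statistics.mean(speeds)

print(f"true average speed: {average_speed:.2f} km/h")
print(f"harmonic mean     : {hm:.2f} km/h")
print(f"arithmetic mean   : {am:.2f} km/h")
```

When equal distances are covered at different speeds, the harmonic mean agrees with total distance divided by total time, whereas the arithmetic mean overstates the average speed.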
Merits of Harmonic Mean
(i) Its value is based on all the observations of the data.
(ii) It is less affected by the extreme values.
(iii) It is suitable for further mathematical treatment.
(iv) It is strictly defined.
Demerits of Harmonic Mean
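(i) It is neither simple to calculate nor easy to understand.
(ii) It cannot be calculated if one of the observations is zero.
(iii) For positive observations, the H.M. is never greater than the A.M. or the G.M.; it equals them only when all the observations are equal.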
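1.13 THE RELATION BETWEEN ARITHMETIC, GEOMETRIC AND HARMONIC MEANS
(a) For any finite number of positive values of a variable, A.M. ≥ G.M. ≥ H.M.
Proof: We shall prove it in the case of two positive numbers. Let x1 and x2 be the two positive numbers.
Now, the A.M. of x1 and x2 = $\frac{x_1 + x_2}{2}$, their G.M. = $\sqrt{x_1 x_2}$ and their H.M. = $\frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$.
$(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$ (since the square of a real number is non-negative)
⇒ $(\sqrt{x_1})^2 + (\sqrt{x_2})^2 - 2\sqrt{x_1}\sqrt{x_2} \ge 0$
⇒ $x_1 + x_2 \ge 2\sqrt{x_1 x_2}$
⇒ $\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}$
⇒ A.M. ≥ G.M. … (I)
Again, $\left(\frac{1}{\sqrt{x_1}} - \frac{1}{\sqrt{x_2}}\right)^2 \ge 0$
⇒ $\frac{1}{x_1} + \frac{1}{x_2} - \frac{2}{\sqrt{x_1 x_2}} \ge 0$
⇒ $\sqrt{x_1 x_2} \ge \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$
⇒ G.M. ≥ H.M. … (II)
From (I) and (II), we get
A.M. ≥ G.M. ≥ H.M.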
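(b) For any two positive numbers, A.M. × H.M. = (G.M.)².
Proof: Let a and b be the two positive numbers. We have
$\text{A.M.} = \frac{a + b}{2}$, $\text{G.M.} = \sqrt{ab}$, $\text{H.M.} = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}$
$(\text{A.M.}) \times (\text{H.M.}) = \frac{a + b}{2} \times \frac{2ab}{a + b} = ab = (\text{G.M.})^2$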
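Both relations, (a) and (b), can be checked numerically for a pair of positive numbers. The values of `a` and `b` below are hypothetical; any two positive numbers could be substituted.

```python
import math

a, b = 4.0, 9.0  # two hypothetical positive numbers

am = (a + b) / 2            # arithmetic mean
gm = math.sqrt(a * b)       # geometric mean
hm = 2 * a * b / (a + b)    # harmonic mean

print(f"A.M. = {am:.4f}, G.M. = {gm:.4f}, H.M. = {hm:.4f}")
print("A.M. >= G.M. >= H.M.:", am >= gm >= hm)
print("A.M. x H.M. equals (G.M.)^2:", math.isclose(am * hm, gm ** 2))
```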
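1.14 THE ROOT MEAN SQUARE
The root mean square (RMS), or quadratic mean, of a set of numbers X1, X2, ..., XN is sometimes denoted by $\sqrt{\overline{X^2}}$ and is defined by
$\text{RMS} = \sqrt{\overline{X^2}} = \sqrt{\frac{\sum_{j=1}^{N} X_j^2}{N}} = \sqrt{\frac{\sum X^2}{N}}$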
This type of average is frequently used in physical applications.
Example: The RMS of the set 1, 3, 4, 5, and 7 is $\sqrt{\frac{1^2 + 3^2 + 4^2 + 5^2 + 7^2}{5}} = \sqrt{20} = 4.47$
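The same example can be reproduced with a short function. The name `root_mean_square` is chosen here for illustration only.

```python
import math


def root_mean_square(values):
    """Return the square root of the mean of the squared values."""
    return math.sqrt(sum(x * x for x in values) / len(values))


rms = root_mean_square([1, 3, 4, 5, 7])
print(round(rms, 2))
```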
1.15 QUARTILES, DECILES AND PERCENTILES
If a set of data is arranged in order of magnitude, the middle value (or arithmetic mean of the two middle values) that divides the set into two equal parts is the median. By extending this idea, we can think of those values which divide the set into four equal parts. These values, denoted by Q1, Q2, and Q3, are called the first, second, and third quartiles, respectively, the value Q2 being equal to the median. Similarly, the values that divide the data into 10 equal parts are called deciles and are denoted by D1, D2, ..., D9, while the values dividing the data into 100 equal parts are called percentiles and are denoted by P1, P2, ..., P99. The fifth decile and the 50th percentile correspond to the median. The 25th and 75th percentiles correspond to the first and third quartiles, respectively. Collectively, quartiles, deciles, percentiles, and other values obtained by equal subdivisions of the data are called quantiles.
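The correspondence between quartiles, deciles and percentiles can be illustrated with Python's standard `statistics.quantiles` function. The data set is hypothetical, and the function's default "exclusive" interpolation method is only one of several conventions; other packages may give slightly different cut points.

```python
import statistics

# Hypothetical data, already arranged in order of magnitude.
data = [12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 40]

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1, ..., D9
percentiles = statistics.quantiles(data, n=100)  # P1, ..., P99

median = statistics.median(data)

print("Q1, Q2, Q3 :", quartiles)
print("D5         :", deciles[4])
print("P25, P50, P75:", percentiles[24], percentiles[49], percentiles[74])
print("median     :", median)
```

In this sketch Q2, D5 and P50 all coincide with the median, and P25 and P75 coincide with Q1 and Q3.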
1.16 SOFTWARE AND MEASURES OF CENTRAL TENDENCY
The output for all five packages is given for the following test scores:
Test Scores
25 28 28 28 29 30 32 33 33 33 34 34 35 36 37
38 41 42 42 45 46 47 51 51 53 53 53 55 56 57
57 60 61 62 62 62 67 68 69 71 72 73 73 75 75
79 82 85 86 86 86 88 88 89 91 93 94 96 96 99
EXCEL
If the pull-down ‘‘Tools => Data Analysis => Descriptive Statistics’’ is given, the measures of central tendency median, mean, and mode as well as several measures of dispersion are obtained:
Mean 59.16667
Standard Error 2.867425
Median 57
Mode 28
Standard Deviation 22.21098
Sample Variance 493.3277
Kurtosis 1.24413
Skewness 0.167175
Range 74
Minimum 25
Maximum 99
Sum 3550
Count 60
MINITAB
If the pull-down ‘‘Stat=> Basic Statistics => Display Descriptive Statistics’’ is given, the following output is obtained:
Descriptive Statistics: testscore
Variable N N* Mean SE Mean St Dev Minimum Q1 Median Q3
Testscore 60 0 59.17 2.87 22.21 25.00 37.25 57.00 78.00
Variable Maximum
testscore 99.00
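The EXCEL and MINITAB figures above can be cross-checked with Python's standard library. This is a sketch for verification, not part of any of the packages discussed; the quartiles rely on the default "exclusive" method of `statistics.quantiles`, which is assumed here to match the convention behind the MINITAB output.

```python
import math
import statistics

# The 60 test scores listed at the start of this section.
scores = [
    25, 28, 28, 28, 29, 30, 32, 33, 33, 33, 34, 34, 35, 36, 37,
    38, 41, 42, 42, 45, 46, 47, 51, 51, 53, 53, 53, 55, 56, 57,
    57, 60, 61, 62, 62, 62, 67, 68, 69, 71, 72, 73, 73, 75, 75,
    79, 82, 85, 86, 86, 86, 88, 88, 89, 91, 93, 94, 96, 96, 99,
]

n = len(scores)
total = sum(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)      # every value with the top frequency
stdev = statistics.stdev(scores)          # sample standard deviation
variance = statistics.variance(scores)    # sample variance
std_error = stdev / math.sqrt(n)          # standard error of the mean
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"Count {n}, Sum {total}, Range {max(scores) - min(scores)}")
print(f"Mean {mean:.5f}, Median {median}, Modes {modes}")
print(f"St Dev {stdev:.5f}, Variance {variance:.4f}, SE Mean {std_error:.6f}")
print(f"Q1 {q1}, Q3 {q3}")
```

Note that five values (28, 33, 53, 62 and 86) each occur three times, so the data are multimodal; the EXCEL output reports only the first of them, 28.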
SPSS
If the pull-down ‘‘Analyze => Descriptive Statistics => Descriptives’’ is given, the following output is obtained:
Descriptive Statistics
                     N    Minimum   Maximum   Mean      Std. Deviation
testscore            60   25.00     99.00     59.1667   22.21098
Valid N (listwise)   60
SAS
If the pull-down ‘‘Solutions =>Analysis => Analyst’’ is given and the data are read in as a file, the pull-down ‘‘Statistics => Descriptive => Summary Statistics’’ gives the following output:
STATISTIX
If the pull-down ‘‘Statistics =>Summary Statistics => Descriptive Statistics’’ is given in the software package STATISTIX, the following output is obtained:
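SOLVED EXAMPLES
Example 1: Write out the terms in each of the following indicated sums:
(a) $\sum_{j=1}^{6} X_j$ (b) $\sum_{j=1}^{4} (Y_j - 3)^2$ (c) $\sum_{j=1}^{N} a$ (d) $\sum_{k=1}^{5} f_k X_k$ (e) $\sum_{j=1}^{3} (X_j - a)$
Solution: (a) $X_1 + X_2 + X_3 + X_4 + X_5 + X_6$
(b) $(Y_1 - 3)^2 + (Y_2 - 3)^2 + (Y_3 - 3)^2 + (Y_4 - 3)^2$
(c) $a + a + a + \cdots + a = Na$
(d) $f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4 + f_5 X_5$
(e) $(X_1 - a) + (X_2 - a) + (X_3 - a) = X_1 + X_2 + X_3 - 3a$
Example 2: Express each of the following by using the summation notation: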
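(a) $X_1^2 + X_2^2 + X_3^2 + \cdots + X_{10}^2$
(b) $(X_1 + Y_1) + (X_2 + Y_2) + \cdots + (X_8 + Y_8)$
(c) $f_1 X_1^3 + f_2 X_2^3 + \cdots + f_{20} X_{20}^3$
(d) $a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_N b_N$
(e) $f_1 X_1 Y_1 + f_2 X_2 Y_2 + f_3 X_3 Y_3 + f_4 X_4 Y_4$
Solution: (a) $\sum_{j=1}^{10} X_j^2$
(b) $\sum_{j=1}^{8} (X_j + Y_j)$
(c) $\sum_{j=1}^{20} f_j X_j^3$
(d) $\sum_{j=1}^{N} a_j b_j$
(e) $\sum_{j=1}^{4} f_j X_j Y_j$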
Example 3: Calculate the arithmetic mean of the following observations.
32, 35, 36, 37, 39, 41, 43, 47, 48
Solution: $\text{A.M.} = \frac{\sum X}{N} = \frac{32 + 35 + 36 + 37 + 39 + 41 + 43 + 47 + 48}{9} = \frac{358}{9} = 39.78$
Example 4: In a survey of 5 cement companies, the profit (in ₹ crore) earned during a year was 15, 20, 10, 35 and 32. Find the arithmetic mean of the profit earned.
Solution: $\text{A.M.} = \frac{15 + 20 + 10 + 35 + 32}{5} = \frac{112}{5} = 22.4$
Thus, the arithmetic mean of the profit earned by these companies during a year was ₹ 22.4 crore.
Example 5: An examination was held to decide the award of a scholarship. The weights of the various subjects were different. The marks obtained by 3 candidates (out of 100 in each subject) are given below:
Subject Weight Students
A B C
Mathematics 4 60 57 62
Physics 3 62 61 67
Chemistry 2 55 53 60
English 1 67 77 49
Calculate the weighted A.M. to award the scholarship.
Solution: The calculation of the weighted arithmetic mean is shown below:
Subject       Weight (wi)   Student A               Student B               Student C
                            Marks (Xi)   Xiwi       Marks (Xi)   Xiwi       Marks (Xi)   Xiwi
Mathematics   4             60           240        57           228        62           248
Physics       3             62           186        61           183        67           201
Chemistry     2             55           110        53           106        60           120
English       1             67           67         77           77         49           49
Total         10            244          603        248          594        238          618
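Applying the formula for the weighted mean, we get:
$\bar{X}_{wA} = \frac{603}{10} = 60.3$; $\bar{X}_A = \frac{244}{4} = 61$
$\bar{X}_{wB} = \frac{594}{10} = 59.4$; $\bar{X}_B = \frac{248}{4} = 62$
$\bar{X}_{wC} = \frac{618}{10} = 61.8$; $\bar{X}_C = \frac{238}{4} = 59.5$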
From the above calculations, it may be noted that student B should get the scholarship as per the simple A.M. values, but according to the weighted A.M., student C should get the scholarship, because the subjects of the examination are not all of equal importance.
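The comparison in Example 5 can be reproduced with a short script. The data are taken from the table in the example; the helper names `weighted_mean` and `simple_mean` are illustrative.

```python
# Subject weights and marks from Example 5.
weights = {"Mathematics": 4, "Physics": 3, "Chemistry": 2, "English": 1}
marks = {
    "A": {"Mathematics": 60, "Physics": 62, "Chemistry": 55, "English": 67},
    "B": {"Mathematics": 57, "Physics": 61, "Chemistry": 53, "English": 77},
    "C": {"Mathematics": 62, "Physics": 67, "Chemistry": 60, "English": 49},
}


def weighted_mean(values, weights):
    """Return sum(X * w) / sum(w) over the subjects in `values`."""
    total_weight = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total_weight


def simple_mean(values):
    """Return the unweighted arithmetic mean of the marks."""
    return sum(values.values()) / len(values)


weighted = {s: weighted_mean(m, weights) for s, m in marks.items()}
simple = {s: simple_mean(m) for s, m in marks.items()}

best_weighted = max(weighted, key=weighted.get)
best_simple = max(simple, key=simple.get)

for student in marks:
    print(student, "weighted:", weighted[student], "simple:", simple[student])
print("Top by weighted A.M.:", best_weighted)
print("Top by simple A.M.  :", best_simple)
```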
Example 6: The owner of a general store was interested in knowing the mean contribution (sales price minus variable cost) of his stock of 5 items. The data is given below:
Product Contribution per Unit Quantity Sold
1 6 160
2 11 60
3 8 260
4 4 460
5 14 110
Solution: If the owner ignores the values of the individual products and gives equal importance to each product, then the mean contribution per unit sold will be
$\bar{X} = \frac{1}{5}(6 + 11 + 8 + 4 + 14) = 8.6$, i.e. ₹ 8.60
However, ₹ 8.60 may not necessarily be the mean contribution per unit when different quantities of the products are sold. In this case, the owner has to take into consideration the number of units of each product sold as different weights. The weighted A.M. is computed by multiplying the units sold (w) of each product by its contribution (X). That is,
$\bar{X}_w = \frac{6(160) + 11(60) + 8(260) + 4(460) + 14(110)}{160 + 60 + 260 + 460 + 110}$
$= \frac{7{,}080}{1{,}050} = 6.74$, i.e. ₹ 6.74
This value, ₹ 6.74, is different from the earlier value, ₹ 8.60. The owner must use the value ₹ 6.74 for decision-making purposes.
Example 7: Find the mean from the following data:
X 5 10 15 20 25 30 35 40
f 5 9 13 21 2 15 8 3
Solution: Total Frequency = ∑f = 5+9+13+21+2+15+8+3
= 76 = Number of values
X f fX
5 5 25
10 9 90
15 13 195
20 21 420
25 2 50
30 15 450
35 8 280
40 3 120
∑f = 76 ∑fX = 1630
∑fX = Sum of the products of X values with their respective frequencies
= Sum of the values = 1630
Arithmetic Mean = $\frac{\sum fX}{\sum f} = \frac{1630}{76} = 21.45$
Example 8: If A, B, C and D are four chemicals costing ₹ 15, ₹ 12, ₹ 8 and ₹ 5 per 100 g, respectively, and are contained in a given compound in the ratio of 1, 2, 3 and 4 parts, respectively, then what should be the price of the resultant compound?
Solution: $\text{A.M.} = \frac{\sum WX}{\sum W} = \frac{15 \times 1 + 12 \times 2 + 8 \times 3 + 5 \times 4}{1 + 2 + 3 + 4} = \frac{83}{10} = 8.30$, i.e. ₹ 8.30 per 100 g
Example 9: The daily earning (in rupees) of 175 employees working on a daily basis in a firm are:
Daily Earnings (₹) 100 120 140 160 180 200 220
Number of Employees 3 6 10 15 24 42 75
Calculate the average daily earning for all employees by assumed mean method.
Solution: Let us take assumed mean, A = 160.
The calculation of average daily earning for employees is shown below:
Daily Earnings (in ₹) (Xi)   Number of Employees (fi)   di = Xi − A = Xi − 160   fi di
100 3 -60 -180
120 6 -40 -240
140 10 -20 -200
160 15 0 0
180 24 20 480
200 42 40 1680
220 75 60 4500
∑f = 175 ∑fd = 6040
The required A.M. ($\bar{X}$) using the formula is given by: $\bar{X} = A + \frac{\sum f_i d_i}{N} = 160 + \frac{6040}{175} = 194.51$
Thus, the average daily earning for all employees is ₹ 194.51.
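The assumed mean method of Example 9 can be verified against the direct formula. The sketch below uses the data of the example; the choice A = 160 follows the solution, and any other assumed mean would give the same result.

```python
# Daily earnings (X) and number of employees (f) from Example 9.
earnings = [100, 120, 140, 160, 180, 200, 220]
employees = [3, 6, 10, 15, 24, 42, 75]

A = 160  # assumed mean, as in the solution

N = sum(employees)
deviations = [x - A for x in earnings]                       # d = X - A
sum_fd = sum(f * d for f, d in zip(employees, deviations))   # sum of f * d

mean_assumed = A + sum_fd / N                                 # A + sum(fd) / N
mean_direct = sum(f * x for f, x in zip(employees, earnings)) / N

print("N =", N, " sum(fd) =", sum_fd)
print("assumed mean method:", round(mean_assumed, 2))
print("direct method      :", round(mean_direct, 2))
```

Both methods agree; the assumed mean simply keeps the intermediate products smaller when the computation is done by hand.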
Example 10: A company is planning to improve plant safety. For this, accident data for the last 50 weeks were compiled. These data are grouped into the frequency distribution shown below. Calculate the A.M. of the number of accidents per week.
Number of accidents 0-10 10-20 20-30 30-40 40-50
Number of weeks 6 20 10 8 2
Solution: The calculation of Arithmetic Mean is shown below: