
COMPUTER ORIENTED STATISTICAL TECHNIQUES
(As per the New Syllabus of Mumbai University for B.Sc. (Information Technology), Semester IV, 2017-18)

Dr. Dinesh Gabhane
Ph.D. (Mgmt.), M.Phil. (Commerce), MBA (Mktg. & HR), UGC-NET (Mgmt.), B.E. (Production)

Associate Professor, Rajeev Gandhi College of Management Studies, Navi Mumbai.

Ms. Madhuri S. Bankar
M.Sc. (C/S), MCA, PGDCS&A

Head, Department of Information Technology, K.B.P. College, Vashi, Navi Mumbai.

ISO 9001:2008 CERTIFIED

© Authors
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the authors and the publisher.

First Edition : 2018

Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd., “Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004. Phone: 022-23860170, 23863863; Fax: 022-23877178; E-mail: [email protected]; Website: www.himpub.com

Branch Offices :

New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002. Phone: 011-23270392, 23278631; Fax: 011-23256286

Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018. Phone: 0712-2738731, 3296733; Telefax: 0712-2721216

Bengaluru : Plot No. 91-33, 2nd Main Road, Seshadripuram, Behind Nataraja Theatre, Bengaluru - 560 020. Phone: 080-41138821; Mobile: 09379847017, 09379847005

Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda, Hyderabad - 500 027. Phone: 040-27560041, 27550139

Chennai : New No. 48/2, Old No. 28/2, Ground Floor, Sarangapani Street, T. Nagar, Chennai - 600 012. Mobile: 09380460419

Pune : “Laksha” Apartment, First Floor, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre), Pune - 411 030. Phone: 020-24496323, 24496333; Mobile: 09370579333

Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj, Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549

Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura, Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847

Ernakulam : 39/176 (New No. 60/251), 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682 011. Phone: 0484-2378012, 2378016; Mobile: 09387122121

Bhubaneswar : Plot No. 214/1342, Budheswari Colony, Behind Durga Mandap, Bhubaneswar - 751 006. Phone: 0674-2575129; Mobile: 09338746007

Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010. Phone: 033-32449649; Mobile: 07439040301

DTP by : Bhakti S. Gaonkar

Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.

Dedication

I would like to dedicate this book to my mother and father.

I would like to thank my wife and son for continuous support.

I would like to extend my gratitude to all my friends and colleagues for encouraging me in writing this book.

Dr. Dinesh Gabhane

I would like to dedicate this book to my family for allowing me the time to write it.

I would like to thank my husband Dr. Dinesh Gabhane for standing beside me throughout my work. He is a source of inspiration and motivation for continuing to improve my knowledge and move my career forward.

Special thanks go to my dear son Vedant who gave me energy.

Last but not least, I thank my college principal Dr. V.S. Shivankar and all my friends and colleagues who encouraged me all the time.

Ms. Madhuri S. Bankar

Preface

It gives us immense pleasure to present the first edition of the book “Computer Oriented Statistical Techniques” to the teachers and students of Semester IV of S.Y.B.Sc. (Information Technology). This book has been written as per the syllabus prescribed by the University of Mumbai with effect from the academic year 2017-18.

The whole syllabus is divided into V units and XIII chapters. In each chapter, the concept and theory are followed by a sufficient number of solved examples. We have tried our level best to present the subject matter in simple language for the better understanding of the readers. We hope that this edition will meet the requirements of the students of S.Y.B.Sc. (IT) in their examination preparation.

Constructive suggestions and comments from the readers will be sincerely appreciated.

We would be glad to hear from you if you would like to suggest improvements or to contribute in any way. Kindly send your correspondence to [email protected] or [email protected].

Finally, we would like to express our sincere respect and gratitude to Kiran Gurbani (Head of Computer Science and IT Department, R.K. Talreja College, Ulhasnagar) for reviewing this book thoroughly, for providing an environment which stimulates new thinking and innovation, and for her support, which helped us bring out this book in time.

We are thankful to Mr. S.K. Srivastava for giving us the opportunity and encouragement to write this book. We also extend our thanks to the staff of Himalaya Publishing House Pvt. Ltd. for assisting us in proofreading and compilation of this book.

Dr. Dinesh Gabhane

Ms. Madhuri Bankar

Syllabus

Computer Oriented Statistical Techniques

Sr. No. Modules/Units Lectures

Unit I

The Mean, Median, Mode and Other Measures of Central Tendency: Index, or Subscript, Notation, Summation Notation, Averages, or Measures of Central Tendency, The Arithmetic Mean, The Weighted Arithmetic Mean, Properties of the Arithmetic Mean, The Arithmetic Mean Computed from Grouped Data, The Median, The Mode, The Empirical Relation between the Mean, Median, and Mode, The Geometric Mean G, The Harmonic Mean H, The Relation between the Arithmetic, Geometric and Harmonic Means, The Root Mean Square, Quartiles, Deciles, and Percentiles, Software and Measures of Central Tendency.

The Standard Deviation and Other Measures of Dispersion: Dispersion or Variation, The Range, The Mean Deviation, The Semi-Interquartile Range, The 10-90 Percentile Range, The Standard Deviation, The Variance, Short Methods for Computing the Standard Deviation, Properties of the Standard Deviation, Charlier’s Check, Sheppard’s Correction for Variance, Empirical Relations between Measures of Dispersion, Absolute and Relative Dispersion; Coefficient of Variation, Standardized Variable; Standard Scores, Software and Measures of Dispersion.

Introduction to R: Basic Syntax, Data Types, Variables, Operators, Control Statements, R-functions, R-vectors, R-lists, R-arrays.

12

Unit II

Moments, Skewness and Kurtosis: Moments, Moments for Grouped Data, Relations Between Moments, Computation of Moments for Grouped Data, Charlier’s Check and Sheppard’s Corrections, Moments in Dimensionless Form, Skewness, Kurtosis, Population Moments, Skewness, and Kurtosis, Software Computation of Skewness and Kurtosis.

Elementary Probability Theory: Definitions of Probability, Conditional Probability; Independent and Dependent Events, Mutually Exclusive Events, Probability Distributions, Mathematical Expectation, Relation between Population, Sample Mean, and Variance, Combinatorial Analysis, Combinations, Stirling’s Approximation to n!, Relation of Probability to Point Set Theory, Euler or Venn Diagrams and Probability.

Elementary Sampling Theory: Sampling Theory, Random Samples and Random Numbers, Sampling With and Without Replacement, Sampling Distributions, Sampling Distribution of Means, Sampling Distribution of Proportions, Sampling Distributions of Differences and Sums, Standard Errors, Software Demonstration of Elementary Sampling Theory.

12

Unit III

Statistical Estimation Theory: Estimation of Parameters, Unbiased Estimates, Efficient Estimates, Point Estimates and Interval Estimates; Their Reliability, Confidence-Interval Estimates of Population Parameters, Probable Error.

Statistical Decision Theory: Statistical Decisions, Statistical Hypotheses, Tests of Hypotheses and Significance, or Decision Rules, Type I and Type II Errors, Level of Significance, Tests Involving Normal Distributions, Two-tailed and One-tailed Tests, Special Tests, Operating-Characteristic Curves; the Power of a Test, p-Values for Hypotheses Tests, Control Charts, Tests Involving Sample Differences, Tests Involving Binomial Distributions.

Statistics in R: Mean, Median, Mode, Normal Distribution, Binomial Distribution, Frequency Distribution in R.

12

Unit IV

Small Sampling Theory: Small Samples, Student’s t Distribution, Confidence Intervals, Tests of Hypotheses and Significance, The Chi-Square Distribution, Confidence Intervals for Sigma, Degrees of Freedom, The F Distribution.

The Chi-Square Test: Observed and Theoretical Frequencies, Definition of Chi-Square, Significance Tests, The Chi-Square Test for Goodness of Fit, Contingency Tables, Yates’ Correction for Continuity, Simple Formulas for Computing Chi-Square, Coefficient of Contingency, Correlation of Attributes, Additive Property of Chi-Square.

12

Unit V

Curve Fitting and the Method of Least Squares: Relationship between Variables, Curve Fitting, Equations of Approximating Curves, Freehand Method of Curve Fitting, The Straight Line, The Method of Least Squares, The Least Squares Line, Non-linear Relationships, The Least Squares Parabola, Regression, Applications to Time Series, Problems Involving More than Two Variables.

Correlation Theory: Correlation and Regression, Linear Correlation, Measures of Correlation, The Least Squares Regression Lines, Standard Error of Estimate, Explained and Unexplained Variation, Coefficient of Correlation, Remarks Concerning the Correlation Coefficient, Product Moment Formula for the Linear Correlation Coefficient, Short Computational Formulas, Regression Lines and the Linear Correlation Coefficient, Correlation of Time Series, Correlation of Attributes, Sampling Theory of Correlation, Sampling Theory of Regression.

12

List of Practicals

1. Using R, execute the basic commands, array, list and frames.

2. Create a matrix using R and perform the operations: addition, inverse, transpose and multiplication.

3. Using R, execute the statistical functions: mean, median, mode, quartiles, range, inter-quartile range and histogram.

4. Using R, import the data from Excel / .CSV file and perform the above functions.

5. Using R, import the data from Excel / .CSV file and calculate the standard deviation, variance and co-variance.

6. Using R, import the data from Excel / .CSV file and draw the skewness.

7. Import the data from Excel / .CSV and perform hypothesis testing.

8. Import the data from Excel / .CSV and perform the Chi-Square Test.

9. Using R, perform the binomial and normal distribution on the data.

10. Perform the Linear Regression using R.

11. Compute the Least Squares means using R.

12. Compute the Linear Least Square Regression.

Paper Pattern

Evaluation Scheme

1. Internal Evaluation: 25 Marks

(i) Test: 1 Class test of 20 marks. Attempt any four of the following: (20) (a) (b) (c) (d) (e) (f)

(ii) 5 marks: Active participation in the class, overall conduct, attendance.

2. External Examination: 75 Marks
All questions are compulsory.

(i) (Based on Unit 1) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)

(ii) (Based on Unit 2) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)

(iii) (Based on Unit 3) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)

(iv) (Based on Unit 4) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)

(v) (Based on Unit 5) Attempt any three of the following: (15) (a) (b) (c) (d) (e) (f)

3. Practical Exam: 50 marks
A certified copy of the journal is essential to appear for the practical examination.
1. Practical Question 1 (20)
2. Practical Question 2 (20)
3. Journal (5)
4. Viva Voce (5)

Contents

UNIT I
1. The Mean, Median, Mode and Other Measures of Central Tendency
2. The Standard Deviation and Other Measures of Dispersion
3. Introduction to R

UNIT II
4. Moments, Skewness and Kurtosis
5. Elementary Probability Theory
6. Elementary Sampling Theory

UNIT III
7. Statistical Estimation Theory
8. Statistical Decision Theory
9. Statistics in R

UNIT IV
10. Small Sampling Theory
11. The Chi-Square Test

UNIT V
12. Curve Fitting and the Method of Least Squares
13. Correlation Theory

Unit I

CHAPTER 1 The Mean, Median, Mode and Other Measures of Central Tendency

Structure

1.1 Index, or Subscript, Notation

1.2 Summation Notation

1.3 Averages or Measures of Central Tendency

1.4 The Arithmetic Mean

1.5 The Weighted Arithmetic Mean

1.6 Properties of the Arithmetic Mean

1.7 The Arithmetic Mean Computed from Grouped Data

1.8 The Median

1.9 The Mode

1.10 The Empirical Relation between the Mean, Median and Mode

1.11 The Geometric Mean (G.M.)

1.12 The Harmonic Mean (H.M.)

1.13 The Relation between the Arithmetic, Geometric and Harmonic Means

1.14 The Root Mean Square

1.15 Quartiles, Deciles and Percentiles

1.16 Software and Measures of Central Tendency

Solved Examples

Practice Examples

1.1 INDEX, OR SUBSCRIPT, NOTATION

Let the symbol $X_j$ (read ‘‘X sub j’’) denote any of the N values $X_1, X_2, X_3, \ldots, X_N$ assumed by a variable X. The letter j in $X_j$, which can stand for any of the numbers 1, 2, 3, …, N, is called a subscript, or index. Clearly any letter other than j, such as i, k, p, q, or s, could have been used as well.

1.2 SUMMATION NOTATION

The symbol $\sum_{j=1}^{N} X_j$ is used to denote the sum of all the $X_j$’s from j = 1 to j = N; by definition,

$$\sum_{j=1}^{N} X_j = X_1 + X_2 + X_3 + \cdots + X_N$$

The symbol ∑ is the Greek capital letter sigma, denoting sum.

1.3 AVERAGES OR MEASURES OF CENTRAL TENDENCY

According to Simpson and Kafka, a measure of central tendency is a typical value around which other figures congregate or which divides their number in half. Thus an average can be used to describe or represent a whole series of figures involving magnitudes of the same variable. That is, an average is an overall single value which represents the series.

Measures of central tendency permit us to compare individual items in the group with the average, and also to compare different series of figures with regard to their central tendencies.

Averages are derived figures and not the original data.

Mean (Average)

According to Clark, “An average is a figure that represents the whole group.”

A. E. Waugh defines, “An average is a single value selected from a group of values to represent them in some way.”

According to Croxton and Cowden, “An average is a single value within the range of the data that is used to represent all the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value.”

Crum and Smith say, “An average is sometimes called a ‘measure of central tendency’ because individual values of the variable usually cluster around it.”

[Figure: Measures of Central Tendency. Mathematical Averages: Arithmetic Mean (A.M.), Geometric Mean (G.M.), Harmonic Mean (H.M.). Positional Averages: Median; Quartiles, Deciles and Percentiles; Mode.]

Characteristics of a Good Average

An average should be:

1. Rigorously defined,

2. Easy to compute,

3. Capable of simple interpretation,

4. Dependent on all the observed values,

5. Not unduly influenced by one or two extremely large or small values,

6. Stable, i.e. it should fluctuate relatively little from one random sample to another,

7. Capable of mathematical manipulation.

1.4 THE ARITHMETIC MEAN

An arithmetic mean is a measure of central tendency and is popularly known as the mean. The arithmetic mean is obtained by dividing the sum of the values of all items of a series by the number of items in that series. It is normally denoted by $\bar{X}$, which is read as ‘X bar’. It can be computed for unclassified (ungrouped) data, i.e. an individual series, as well as for classified (grouped) data, i.e. a discrete or continuous series.

Practical Steps Involved in the Computation of Arithmetic Mean for Unclassified Data

Step 1 Treat the given values of variables as X.

Step 2 Enter the given values in a column headed as X.

Step 3 Add together all the values of variable X and obtain the total i.e., ∑X.

Step 4 Apply the following formula:

$$\bar{X} = \frac{\sum X}{N}$$

where, $\bar{X}$ = Arithmetic Mean

∑X = Sum of all values of variable X

N = Number of individual observations
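The four steps above can be sketched in a few lines of Python. Python is used here only for illustration (the practicals in this syllabus use R); the function name `arithmetic_mean` and the marks are hypothetical.

```python
def arithmetic_mean(values):
    """Arithmetic mean of unclassified data: X-bar = (sum of X) / N."""
    if not values:
        raise ValueError("at least one observation is required")
    total = sum(values)           # Step 3: obtain the total, i.e. sum of X
    return total / len(values)    # Step 4: divide by N

marks = [45, 52, 60, 38, 55]      # hypothetical marks of five students
print(arithmetic_mean(marks))     # 50.0
```

Here ∑X = 250 and N = 5, so the mean is 250 / 5 = 50.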

1.5 THE WEIGHTED ARITHMETIC MEAN

While calculating the arithmetic mean as discussed earlier, equal importance (or weight) is given to each observation in the data set. However, there are situations in which the values of individual observations in the data set are not of equal importance. If such values occur with different frequencies, then the A.M. of the distinct values (as opposed to the A.M. of all observations) may not be truly representative of the data set and may thus be misleading. Under these circumstances, we may attach to each observation value a ‘weight’ $w_1, w_2, \ldots, w_n$ as an indicator of its importance within the data set and compute a weighted mean, denoted by $\bar{X}_w$, as follows:

$$\bar{X}_w = \frac{\sum wX}{\sum w} = \frac{w_1X_1 + w_2X_2 + \cdots + w_nX_n}{w_1 + w_2 + \cdots + w_n}$$


Note: The weighted arithmetic mean should be used:

1. when the importance of all the numerical values in the given data set is not equal;
2. when the frequencies of various classes vary widely;
3. where there is a change either in the proportion of numerical values or in the proportion of their frequencies;
4. when ratios, percentages or rates are being averaged.
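A minimal Python sketch of the weighted mean formula follows. The function name `weighted_mean` and the price and quantity figures are hypothetical; quantities bought are used as weights.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * X) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

prices = [20, 30, 50]      # hypothetical prices (Rs.) of three items
quantities = [5, 3, 2]     # hypothetical quantities bought, used as weights
print(weighted_mean(prices, quantities))  # 29.0
```

The simple mean of the three prices is about 33.33, while the weighted mean is 290 / 10 = 29, because the cheapest item is bought most often.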

1.6 PROPERTIES OF THE ARITHMETIC MEAN

1. The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero, i.e. $\sum (X - \bar{X}) = 0$.

2. The sum of squares of deviations of observations is minimum when taken from their arithmetic mean.

3. The arithmetic mean is capable of being treated algebraically.

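4. If $\bar{X}_1$ and $N_1$ are the mean and number of observations of one series, and $\bar{X}_2$ and $N_2$ are the corresponding magnitudes of another series, then the mean $\bar{X}_{12}$ of the combined series of $N_1 + N_2$ observations is given by

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$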

5. If a constant B is added to (subtracted from) every observation, the mean of these observations also increases (decreases) by B.

6. If every observation is multiplied (divided) by a constant b, the mean of these observations is also multiplied (divided) by b.

7. If some observations of a series are replaced by other observations, then the mean of the original observations changes by the total change in the replaced values divided by the number of observations N.
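Several of these properties can be checked numerically. The sketch below (Python, with hypothetical series and helper names) verifies properties 1, 2, 4, 5 and 6 on small data sets; it illustrates the properties rather than proving them.

```python
from math import isclose

def mean(xs):
    return sum(xs) / len(xs)

def combined_mean(mean1, n1, mean2, n2):
    """Property 4: mean of two series pooled together."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

def ssd(xs, a):
    """Sum of squared deviations of xs taken from the value a."""
    return sum((x - a) ** 2 for x in xs)

series1 = [4, 8, 9, 11]   # hypothetical data, mean 8
series2 = [10, 14, 18]    # hypothetical data, mean 14

# Property 1: deviations from the mean add up to zero
dev_sum = sum(x - mean(series1) for x in series1)
print(dev_sum)  # 0.0

# Property 2: squared deviations are smallest about the mean (compare 7 and 9)
print(ssd(series1, 8), ssd(series1, 7), ssd(series1, 9))  # 26 30 30

# Property 4: combined mean equals the mean of the pooled data (74 / 7)
cm = combined_mean(mean(series1), len(series1), mean(series2), len(series2))
print(isclose(cm, mean(series1 + series2)))  # True

# Properties 5 and 6: adding B shifts the mean by B, multiplying by b scales it
B, b = 5, 3
print(isclose(mean([x + B for x in series1]), mean(series1) + B))  # True
print(isclose(mean([x * b for x in series1]), mean(series1) * b))  # True
```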

1.7 THE ARITHMETIC MEAN COMPUTED FROM GROUPED DATA

Practical Steps Involved in the Computation of Arithmetic Mean for Discrete Series

Step 1 Treat the given values of variables as X and frequencies as f.

Step 2 Enter the given values of variable X in a column headed as X.

Step 3 Enter the given frequencies f in a column headed as f and obtain the sum of these frequencies, i.e. N or ∑f.

Step 4 Multiply the value of the variable in each row by the respective frequency, denote these products by fX and enter them in a column headed as fX.

Step 5 Obtain the sum of these products, i.e. ∑fX.

Step 6 Apply the following formula:

$$\bar{X} = \frac{\sum fX}{N}$$

where, $\bar{X}$ = Arithmetic Mean

∑fX = Sum of products of frequencies and values of variable X

N = ∑f = Sum of frequencies
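For a discrete series the same idea uses the fX column. A minimal Python sketch, with a hypothetical function name and a hypothetical frequency table:

```python
def mean_discrete(values, freqs):
    """Arithmetic mean of a discrete series: sum(f * X) / N, with N = sum(f)."""
    n = sum(freqs)                                   # Step 3: N = sum of f
    sum_fx = sum(f * x for x, f in zip(values, freqs))  # Steps 4-5: sum of fX
    return sum_fx / n                                # Step 6

x = [0, 1, 2, 3, 4]       # hypothetical: number of children per family
f = [5, 10, 20, 10, 5]    # hypothetical: number of families
print(mean_discrete(x, f))  # 2.0
```

Here ∑fX = 100 and N = 50.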

Practical Steps Involved in the Computation of Arithmetic Mean for Continuous Series

Step 1 Enter the class intervals in the first column.

Step 2 Calculate the mid-point of each class, denote these mid-points as m and enter the same in a column headed as m.

Note: Mid-point (m) = (Lower limit + Upper limit) / 2

Step 3 Enter the given frequencies f in a column headed as f and obtain the sum of these frequencies, i.e. N or ∑f.

Step 4 Multiply the mid-point of each row with the respective frequency and denote these products by fm and enter the same in a column headed as fm.

Step 5 Obtain the sum of these products, i.e. ∑fm.

Step 6 Apply the following formula:

$$\bar{X} = \frac{\sum fm}{N}$$

where, $\bar{X}$ = Arithmetic Mean

∑fm = Sum of products of mid-points and frequencies

N = ∑f = Sum of frequencies
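The mid-point method above can be sketched in Python as follows. Classes are passed as hypothetical (lower limit, upper limit) pairs; the function name and the marks distribution are illustrative only.

```python
def mean_continuous(classes, freqs):
    """Arithmetic mean of a continuous series using class mid-points.

    classes: list of (lower_limit, upper_limit) tuples
    freqs:   class frequencies
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]        # Step 2: mid-points m
    n = sum(freqs)                                      # Step 3: N = sum of f
    sum_fm = sum(f * m for m, f in zip(mids, freqs))    # Steps 4-5: sum of fm
    return sum_fm / n                                   # Step 6

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]  # hypothetical marks
freqs = [3, 7, 15, 10, 5]
print(mean_continuous(classes, freqs))  # 26.75
```

With mid-points 5, 15, 25, 35 and 45, ∑fm = 1070 and N = 40, giving 26.75.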

1.8 THE MEDIAN

The median is the central value of the variable that divides the series into two equal parts in such a way that half of the items lie above this value and the remaining half lie below it. The median is called a positional average because it is based on the position of a given observation in a series arranged in ascending or descending order, and the position of the median is such that an equal number of items lie on either side of it. The median is usually denoted by ‘Med’ or ‘Md’. It can be computed for both ungrouped data (or individual series) and grouped data (or discrete/continuous series).

Computation of Median for Individual Series

Step 1 Arrange the observations in ascending or descending order.

Step 2 Ascertain the position of the median, i.e. the $\left(\frac{N+1}{2}\right)$th observation.

Step 3 Calculate the median as follows:

(a) In case $\frac{N+1}{2}$ works out to be a whole number,

Median = size or value of the $\left(\frac{N+1}{2}\right)$th observation in the data array

(b) In case $\frac{N+1}{2}$ works out to be a fraction (i.e. it ends in .5),

Median = size of the item at the whole-number part of that position + 50% of the difference between the size of the immediately next item and the size of that item.
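The three steps for an individual series can be sketched in Python as below; the function name and data are hypothetical, and positions are converted from the textbook's 1-based counting to Python's 0-based indexing.

```python
def median_individual(values):
    """Median of an individual series using the ((N + 1) / 2)th item rule."""
    data = sorted(values)              # Step 1: arrange in ascending order
    if not data:
        raise ValueError("at least one observation is required")
    pos = (len(data) + 1) / 2          # Step 2: position of the median (1-based)
    k = int(pos)                       # whole-number part of the position
    if pos == k:                       # Step 3(a): position is a whole number
        return data[k - 1]
    # Step 3(b): position ends in .5, so take the k-th item plus 50% of the
    # difference between the (k + 1)-th item and the k-th item
    return data[k - 1] + 0.5 * (data[k] - data[k - 1])

print(median_individual([7, 3, 9, 5, 11]))      # 7
print(median_individual([7, 3, 9, 5, 11, 13]))  # 8.0
```

In the second call N = 6, so the position is 3.5: the 3rd item is 7, the 4th is 9, and the median is 7 + 0.5 × 2 = 8.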


Computation of Median for Discrete Series

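Step 1 Arrange the values of the observations in ascending or descending order.

Step 2 Calculate the cumulative frequencies (c.f.).

Step 3 Ascertain the position of the median, i.e. the $\left(\frac{N+1}{2}\right)$th observation, where N = ∑f.

Step 4 Ascertain the cumulative frequency which includes the $\left(\frac{N+1}{2}\right)$th observation.

Step 5 Calculate the median as follows:

Median = size or value of the observation corresponding to the cumulative frequency which includes the $\left(\frac{N+1}{2}\right)$th observation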
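A Python sketch of the cumulative-frequency method for a discrete series follows. It assumes the "c.f. which includes the position" is the first cumulative frequency greater than or equal to (N + 1)/2; the function name and data are hypothetical.

```python
from itertools import accumulate

def median_discrete(values, freqs):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))            # Step 1: ascending order of X
    if not pairs:
        raise ValueError("empty series")
    cum = list(accumulate(f for _, f in pairs))   # Step 2: cumulative frequencies
    target = (cum[-1] + 1) / 2                    # Step 3: (N + 1)/2 th observation
    for (x, _), cf in zip(pairs, cum):            # Step 4: first c.f. including it
        if cf >= target:
            return x                              # Step 5: corresponding value

x = [1, 2, 3, 4, 5]       # hypothetical values
f = [4, 9, 12, 8, 2]      # hypothetical frequencies, N = 35
print(median_discrete(x, f))  # 3
```

Here (N + 1)/2 = 18 and the cumulative frequencies are 4, 13, 25, 33, 35; the first one that includes the 18th observation is 25, so the median is 3.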

Computation of Median for Grouped Data or Continuous Series

Step 1 Calculate the cumulative frequencies (c.f.).

Step 2 Ascertain the position of the median, i.e. the $\left(\frac{N}{2}\right)$th observation, where N = ∑f.

Step 3 Ascertain the cumulative frequency which includes the $\left(\frac{N}{2}\right)$th observation, the corresponding class frequency (f) and lower limit (L) of that class, the class interval (i) and the cumulative frequency of the preceding class (c.f.).

Step 4 Calculate the median as follows:

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i$$

where, L = Lower limit of the median class

c.f. = Cumulative frequency of the class preceding the median class

f = Frequency of the median class

i = Class interval of the median class (upper limit minus lower limit)

Note: To find the median value by interpolation, it is assumed that the observations are evenly spaced over the entire class interval.
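The interpolation formula can be sketched in Python as follows, assuming a continuous exclusive series given as hypothetical (lower limit, upper limit) pairs with their frequencies.

```python
from itertools import accumulate

def median_continuous(classes, freqs):
    """Median of a continuous series: L + ((N/2 - c.f.) / f) * i."""
    cum = list(accumulate(freqs))              # Step 1: cumulative frequencies
    half = cum[-1] / 2                         # Step 2: N/2
    for idx, cf in enumerate(cum):
        if cf >= half:                         # Step 3: median class found
            lower, upper = classes[idx]
            f = freqs[idx]
            cf_prev = cum[idx - 1] if idx > 0 else 0
            i = upper - lower
            return lower + (half - cf_prev) / f * i   # Step 4

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]  # hypothetical
freqs = [5, 8, 12, 10, 5]
print(round(median_continuous(classes, freqs), 2))  # 25.83
```

Here N/2 = 20, the median class is 20-30 (c.f. 25), the preceding c.f. is 13 and f = 12, so Median = 20 + (7 / 12) × 10 ≈ 25.83.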

Merits of Median

1. The median is useful in case of frequency distribution with open-end classes.

2. The median is recommended if distribution has unequal classes.

3. Extreme values do not affect the median as strongly as they affect the mean.

4. It is the most appropriate average in dealing with qualitative data.

5. The value of the median can be determined graphically, whereas the value of the mean cannot be determined graphically.

6. It is easy to calculate and understand.

Demerits of Median

1. For calculating the median, it is necessary to arrange the data, whereas other averages do not need arrangement.

2. Since it is a positional average, its value is not determined by all the observations in the series.

3. The median is not capable of further algebraic calculations.

4. The sampling stability of the median is less as compared to the mean.

1.9 THE MODE

The mode is often said to be that value in a series which occurs most frequently or which has the greatest frequency. But this is not exactly true for every frequency distribution. Rather, it is that value around which the observations tend to concentrate most heavily. It is also called the most typical or fashionable value of a distribution because it is the value which has the greatest frequency density in its immediate neighbourhood. It is usually denoted by Mo. It may be noted that a distribution may have one mode, two modes or several modes.

1. Unimodal: A distribution is said to be unimodal if it has only one mode.

2. Bimodal: A distribution is said to be bimodal if it has two modes.

3. Multimodal: A distribution is said to be multimodal if it has more than two modes.

Computation of Mode for Individual Series

Step 1 Count the number of times the various values of the series repeat themselves.

Step 2 Ascertain the value occurring the maximum number of times.

Step 3 Mode = Value occurring the maximum number of times.

Computation of Mode for Discrete Series

Step 1 Ascertain the maximum frequency.

Step 2 Ascertain the value of the observation corresponding to the maximum frequency.

Step 3 Mode = Value of the observation corresponding to the maximum frequency.

Note: In case of a discrete series (i.e. where values of observations along with frequencies are given), the mode can be determined just by the inspection method.
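The counting steps for individual and discrete series can be sketched in Python. The functions below (hypothetical names) return a list so that bimodal and multimodal series are visible; `collections.Counter` from the standard library does the counting.

```python
from collections import Counter

def modes_individual(values):
    """Mode(s) of an individual series: value(s) occurring the maximum number of times."""
    counts = Counter(values)                   # Step 1: count repetitions
    top = max(counts.values())                 # Step 2: maximum number of times
    return sorted(v for v, c in counts.items() if c == top)  # Step 3

def mode_discrete(values, freqs):
    """Mode of a discrete series by inspection: value(s) with the maximum frequency."""
    top = max(freqs)                           # Step 1: maximum frequency
    return [x for x, f in zip(values, freqs) if f == top]    # Steps 2-3

print(modes_individual([4, 7, 7, 2, 9, 7, 4]))        # [7]
print(modes_individual([1, 2, 2, 3, 3]))              # [2, 3]  (bimodal)
print(mode_discrete([10, 20, 30, 40], [3, 9, 6, 2]))  # [20]
```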


Computation of Mode for Grouped Data or Continuous Series

Step 1 Ensure that the given series is a continuous exclusive series having equal class intervals. If the given series is not a continuous exclusive series, follow the procedure suggested below:

Given Series | Procedure to be followed
Less than series | Convert into continuous exclusive series
More than series | Convert into continuous exclusive series
Inclusive series | Convert into continuous exclusive series
Series having unequal class intervals | Make the class intervals equal and adjust the frequencies, assuming that they are equally distributed throughout the class

Step 2 Ascertain the modal class as follows:

(a) By preparing the Grouping Table and Analysis in case there is a small difference between the maximum frequency and the frequency preceding it or succeeding it.

(b) By inspection in other cases. In this case, the class with the maximum frequency is the modal class.

Step 3 Calculate the Mode as follows:

1. By inspection formula in case of a unimodal distribution (i.e. where there is a single mode)

(a) Where the modal class is the one having the maximum frequency:

$$\text{Mode} = L + \frac{|f_1 - f_0|}{|f_1 - f_0| + |f_1 - f_2|} \times i$$

where, L = Lower limit of the modal class
f1 = Frequency of the modal class
f0 = Frequency of the pre-modal class, i.e. the class preceding the modal class
f2 = Frequency of the post-modal class, i.e. the class succeeding the modal class
i = Class interval of the modal class

Notes:
1. If the modal class is the first class, f0 is taken as zero.
2. If the modal class is the last class, f2 is taken as zero.
3. Where the Modal Class is other than the one having the maximum frequency

2. By Empirical relationship formula in case of bimodal or multimodal distribution (i.e. where there are two or more values having the same maximum frequency)

Mode = 3 Median – 2 Mean
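Both routes to the mode of a continuous series can be sketched in Python. The sketch assumes a continuous exclusive series with equal class intervals and picks the modal class by inspection (maximum frequency, first class if tied); the grouping-table analysis is not implemented. Function names and the frequency table are hypothetical; the mean (25.5) and median (about 25.83) of the same table follow from the formulas of Sections 1.7 and 1.8.

```python
def mode_grouped(classes, freqs):
    """Mode = L + |f1 - f0| / (|f1 - f0| + |f1 - f2|) * i, modal class by inspection."""
    k = freqs.index(max(freqs))                       # modal class (first if tied)
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                 # first class: f0 taken as zero
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0    # last class: f2 taken as zero
    denom = abs(f1 - f0) + abs(f1 - f2)
    if denom == 0:
        raise ValueError("formula undefined: f1 equals both f0 and f2")
    return lower + abs(f1 - f0) / denom * (upper - lower)

def mode_empirical(mean, median):
    """Empirical relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]  # hypothetical
freqs = [5, 8, 12, 10, 5]
print(round(mode_grouped(classes, freqs), 2))        # 26.67

mean_ = 1020 / 40                           # sum(f * m) / N = 25.5
median_ = 20 + (40 / 2 - 13) / 12 * 10      # L + (N/2 - c.f.) / f * i
print(round(mode_empirical(mean_, median_), 2))      # 26.5
```

For this table the two methods give 26.67 and 26.5, which shows that the empirical relation is only an approximation.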

Merits of Mode

1. It is easy to calculate and simple to understand.

2. It is not affected by the extreme values.

3. The value of mode can be determined graphically.

4. Its value can be determined in case of open-end class interval.

5. The mode is the most representative of the distribution.

Demerits of Mode

1. It is not suitable for further mathematical treatment.

2. The value of the mode cannot always be determined.

3. The value of the mode is not based on each and every item of the series.

4. The mode is not strictly defined.

1.10 THE EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN AND MODE

If the values of the mean, median and mode are equal, then the distribution of numerical values in the data set is symmetrical, as shown in figure (a). But if these values are not equal, then the distribution of numerical values in the data set is not symmetrical, as shown in figure (b) and figure (c).

If most of the values fall either to the right or to the left of the mode, then such a distribution is said to be skewed. In such cases, the relationship between these three measures of central tendency, as suggested by Karl Pearson, is as follows (it holds approximately for moderately skewed distributions):

Mean – Mode = 3 (Mean – Median)

OR Mode = 3 Median – 2 Mean

If most of the values of observations in a distribution fall to the right of the mode, as shown in figure (b), then it is said to be skewed to the right or positively skewed (i.e. values of higher magnitude are concentrated more to the right of the mode). In this case, the mode remains under the peak (i.e. representing the highest frequency), but the median (value that depends on the number of observations) and the mean (value that is affected by extreme values) move to the right. The order of magnitude of these measures will be:

Mean > Median > Mode

But if the distribution is skewed to the left or negatively skewed (i.e. values of lower magnitude are concentrated more to the left of the mode), then the mode is again under the peak, whereas the median and mean move to the left of the mode. The order of magnitude of these measures will be:

Mean < Median < Mode

[Figure: (a) Symmetrical: Mean = Median = Mode; (b) Skewed to the Right: Mode, Median, Mean from left to right; (c) Skewed to the Left: Mean, Median, Mode from left to right]


In both the cases, the difference between mean and mode is three times the difference between mean and median.

In general, for a unimodal skewed (non-symmetrical) distribution, the median is preferred to the mean for measuring location, because it is neither influenced by the frequency of occurrence of a single observation value, as the mode is, nor affected by extreme values, as the mean is.

1.11 THE GEOMETRIC MEAN (G.M.)

In many business and economics problems, such as the calculation of compound interest and inflation, quantities (variables) change over a period of time. In such cases, a decision maker may like to know the average percentage change rather than a simple average value, to represent the average rate of growth or decline in the variable over a period of time. For this purpose another measure of central tendency, called the geometric mean (G.M.), is calculated.

For example, consider the annual growth rate of output of a company over the last five years (output at the beginning of 2006 is taken as 100).

Year Growth Rate (Percent) Output at the end of the Year

2006 5.0 105.00

2007 7.5 112.87

2008 2.5 115.69

2009 5.0 121.47

2010 10.0 133.61

The simple arithmetic mean of the growth rates is

$$\bar{X} = \frac{5 + 7.5 + 2.5 + 5 + 10}{5} = 6$$

This value of the mean implies that if 6 percent were the growth rate every year, output at the end of 2010 would be $100 \times (1.06)^5 \approx 133.82$, which is slightly more than the actual value, 133.61. Thus the correct average growth rate should be less than 6 percent.

To find the correct growth rate, we apply the formula of geometric mean:

$$\text{G.M.} = \sqrt[n]{\text{Product of all the } n \text{ values}} = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = (X_1 \cdot X_2 \cdots X_n)^{1/n}$$

In other words, G.M. of a set of n observations is the nth root of their product.

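Since the rates compound from year to year, the formula is applied to the growth factors (1 + rate/100) rather than to the raw percentages. Substituting these values in the given formula, we have

$$\text{G.M.} = \sqrt[5]{1.05 \times 1.075 \times 1.025 \times 1.05 \times 1.10} = \sqrt[5]{1.3363} \approx 1.0597$$

i.e. an average growth rate of about 5.97 percent, which is indeed less than 6 percent.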
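The growth-rate example can be verified with a short Python sketch (Python 3.8 or later, for `math.prod`). The rates are those of the table above, and the starting output of 100 is the base implied by the table.

```python
import math

rates = [5.0, 7.5, 2.5, 5.0, 10.0]        # annual growth rates (%) from the table
factors = [1 + r / 100 for r in rates]     # growth factors, e.g. 5% -> 1.05

gm = math.prod(factors) ** (1 / len(factors))   # nth root of the product
avg_rate = (gm - 1) * 100
print(round(avg_rate, 2))                  # 5.97

# Compounding 100 for five years at each average
print(round(100 * gm ** 5, 2))                                    # 133.63
print(round(100 * (1 + sum(rates) / len(rates) / 100) ** 5, 2))   # 133.82
```

Compounding at the geometric-mean rate reproduces the final output (the table's 133.61 differs only because the table rounds at each year), whereas the arithmetic mean of 6 percent overshoots it.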

Computation of Geometric Mean for Individual Series

If the number of observations is more than three, the G.M. is conveniently calculated by taking logarithms on both sides of the equation. The formula for the G.M. of ungrouped data can be expressed in terms of logarithms as shown below:

$$\log(\text{G.M.}) = \frac{1}{N}\log(X_1 \cdot X_2 \cdots X_N) = \frac{1}{N}(\log X_1 + \log X_2 + \cdots + \log X_N) = \frac{1}{N}\sum_{i=1}^{N}\log X_i$$

and therefore

$$\text{G.M.} = \text{Antilog}\left[\frac{\sum \log X}{N}\right]$$

where, N = Total number of items

Computation of Geometric Mean for Discrete Series

If the observations X1, X2, …, Xn occur with frequencies f1, f2, …, fn, respectively, and the total frequency is N = Σfi, then the G.M. for such data is given by

$$\log(\text{G.M.}) = \frac{1}{N}(f_1 \log X_1 + f_2 \log X_2 + \cdots + f_n \log X_n) = \frac{\sum f_i \log X_i}{N}$$

$$\text{G.M.} = \text{Antilog}\left[\frac{\sum f \log X}{N}\right]$$

where N = Total number of items (Σf).
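For a discrete series, each log X is weighted by its frequency. Here is a minimal sketch with hypothetical values and frequencies. `gm_discrete` is our own illustrative name.

```python
import math

def gm_discrete(values, freqs):
    """G.M. = antilog( sum(f * log10 X) / N ), where N = sum(f)."""
    n = sum(freqs)
    total = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (total / n)

x = [2, 4, 8]  # hypothetical values
f = [1, 2, 1]  # hypothetical frequencies
# 2 * 4 * 4 * 8 = 256, and the 4th root of 256 is 4
print(gm_discrete(x, f))  # approximately 4.0
```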

Computation of Geometric Mean for Grouped Data or Continuous Series

Step 1 Calculate the mid-point of each class and enter these mid-points in the column headed ‘m’.

Step 2 Take the logarithm of each mid-point and enter it in the column headed log m.

Step 3 Multiply these logarithms (log m) by the respective frequencies, enter these products (f log m) in the column headed f log m, and then obtain their total, i.e. Σf log m.

Step 4 Calculate Geometric Mean as follows:

$$\text{G.M.} = \text{Antilog}\left[\frac{\sum f \log m}{N}\right]$$

where N = Σf.
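Steps 1 to 4 can be sketched in Python. The class intervals and frequencies are borrowed from Example 10 purely as illustrative numbers. `gm_grouped` is a hypothetical helper name. The cross-check repeats each mid-point f times and applies `statistics.geometric_mean` (Python 3.8+).

```python
import math
import statistics

def gm_grouped(classes, freqs):
    """Geometric mean of a continuous series using class mid-points (Steps 1-4)."""
    mids = [(lo + hi) / 2 for lo, hi in classes]                # Step 1: mid-points m
    f_log_m = [f * math.log10(m) for m, f in zip(mids, freqs)]  # Steps 2-3: f log m
    return 10 ** (sum(f_log_m) / sum(freqs))                    # Step 4: antilog(sum / N)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
weeks = [6, 20, 10, 8, 2]
gm = gm_grouped(classes, weeks)

# Cross-check: expand each mid-point by its frequency and use the standard library
expanded = [(lo + hi) / 2 for (lo, hi), f in zip(classes, weeks) for _ in range(f)]
print(round(gm, 2), round(statistics.geometric_mean(expanded), 2))
```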

Weighted Geometric Mean

Like the weighted arithmetic mean, a weighted geometric mean can also be calculated. Symbolically,

$$\text{G.M.}_w = \left(X_1^{W_1} \cdot X_2^{W_2} \cdots X_n^{W_n}\right)^{1/\sum W}$$

Computation of Weighted Geometric Mean

Step 1 Take the logarithm of each item of variable X and enter it in the column headed log X.

Step 2 Multiply these logarithms (log X) by the respective weights (W), enter these products (W log X) in the column headed W log X, and then obtain their total, i.e. ΣW log X.

Step 3 Calculate the weighted Geometric Mean as follows:

$$\text{G.M.}_w = \text{Antilog}\left[\frac{\sum W \log X}{\sum W}\right]$$

Uses of Geometric Mean

(i) Geometric Mean is used to find the average percentage change (rate of growth) in sales, production, etc.

(ii) Geometric Mean is used in constructing index numbers, since it shows relative change.

(iii) Geometric Mean gives comparatively large weight to small items and small weight to large items. Hence, when the series contains extremely large values, Geometric Mean is often a more suitable measure of central tendency than the arithmetic mean.

Merits of Geometric Mean

(i) Geometric Mean is calculated based on all observations in the series.

(ii) Geometric Mean is clearly defined.

(iii) Geometric Mean is less affected by extreme values in the series than the arithmetic mean.

(iv) Geometric Mean is amenable to further algebraic treatment.

(v) Geometric Mean is useful in averaging ratios and percentages.

Demerits of Geometric Mean

(i) Geometric Mean is difficult to understand.

(ii) Geometric mean cannot be computed if the series contains both positive and negative values.

(iii) Geometric mean cannot be computed meaningfully if one or more of the values in the series is zero (the product becomes zero, and the logarithm of zero is undefined).

1.12 THE HARMONIC MEAN (H.M.)

The harmonic mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.

$$\text{H.M.} = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}} = \frac{N}{\sum \left(\frac{1}{X}\right)}$$

where X1, X2, …, XN refer to the values of the various items of the series, and N = Total number of items of the series.

Computation of Harmonic Mean for Individual Series

Step 1 Calculate the reciprocal of each item of variable X, enter these in the column headed 1/X, and obtain their total, i.e. Σ(1/X).

Step 2 Calculate H.M. as follows:

$$\text{H.M.} = \frac{N}{\sum \left(\frac{1}{X}\right)}$$

Computation of Harmonic Mean for Discrete Series

Step 1 Calculate the reciprocal of each item of variable X and enter it in the column headed 1/X.

Step 2 Multiply these reciprocals (1/X) by the respective frequencies, enter these products (f/X) in the column headed f/X, and then obtain their total, i.e. Σ(f/X).

Step 3 Calculate Harmonic Mean as follows:

$$\text{H.M.} = \frac{N}{\sum \left(\frac{f}{X}\right)}$$

where N = Σf.
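Here is a minimal Python sketch of the individual-series and discrete-series procedures above. The function names are our own, for illustration. `statistics.harmonic_mean` from the standard library is used only as a cross-check on hypothetical data.

```python
import statistics

def hm_individual(values):
    """H.M. = N / sum(1/X)."""
    return len(values) / sum(1 / x for x in values)

def hm_discrete(values, freqs):
    """H.M. = N / sum(f/X), where N = sum(f)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

data = [1, 2, 4]  # hypothetical observations
print(hm_individual(data))             # 3 / (1 + 0.5 + 0.25), about 1.714
print(statistics.harmonic_mean(data))  # standard-library cross-check
print(hm_discrete([2, 6], [1, 2]))     # 3 / (1/2 + 2/6), about 3.6
```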


Computation of Harmonic Mean for Grouped Data or Continuous Series

Step 1 Calculate the mid-point of each class and enter these mid-points in the column headed m.

Step 2 Calculate the reciprocals of the mid-points and enter them in the column headed 1/m.

Step 3 Multiply these reciprocals (1/m) by the respective frequencies, enter these products (f/m) in the column headed f/m, and then obtain their total, i.e. Σ(f/m).

Step 4 Calculate Harmonic Mean as follows:

$$\text{H.M.} = \frac{N}{\sum \left(\frac{f}{m}\right)}$$

Weighted Harmonic Mean

Like the weighted arithmetic mean, a weighted harmonic mean can also be calculated. Symbolically,

$$\text{H.M.}_w = \frac{\sum W}{\sum \left(\frac{W}{X}\right)}$$

Uses of Harmonic Mean

(i) The H.M is used for computing the average rate of increase in profits of a concern.

(ii) The H.M is used to calculate the average speed at which a journey has been performed.
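The average-speed use can be illustrated with hypothetical trips. The true average speed is total distance divided by total time. This equals the H.M. of the speeds when equal distances are covered, and the weighted H.M. (with distances as weights) when the distances differ. The function name `weighted_hm` is illustrative.

```python
def weighted_hm(values, weights):
    """Weighted H.M. = sum(W) / sum(W / X)."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

# Hypothetical trip 1: 60 km at 30 km/h, then 60 km at 60 km/h (equal distances)
speeds = [30, 60]
simple_hm = len(speeds) / sum(1 / s for s in speeds)
print(simple_hm)                   # approximately 40 km/h (120 km in 3 h)
print(sum(speeds) / len(speeds))   # the A.M., 45.0, overstates the true average speed

# Hypothetical trip 2: 90 km at 45 km/h, then 30 km at 30 km/h (distances as weights)
print(weighted_hm([45, 30], [90, 30]))  # 120 km / 3 h = 40.0 km/h
```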

Merits of Harmonic Mean

(i) Its value is based on all the observations of the data.

(ii) It is less affected by extremely large values (although very small values pull it down strongly).

(iii) It is suitable for further mathematical treatment.

(iv) It is strictly defined.

Demerits of Harmonic Mean

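(i) It is neither simple to calculate nor easy to understand.

(ii) It cannot be calculated if one of the observations is zero.

(iii) The H.M. is always less than or equal to the A.M. and the G.M. (equality holds only when all the observations are equal), so it gives greater importance to small values.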

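1.13 THE RELATION BETWEEN ARITHMETIC, GEOMETRIC AND HARMONIC MEANS

(a) For any finite number of positive values of a variable, A.M. ≥ G.M. ≥ H.M.

Proof: We shall prove it in the case of two positive numbers. Let x1 and x2 be the two positive numbers.

Now, the A.M. of x1 and x2 is (x1 + x2)/2, their G.M. is √(x1x2), and their H.M. is 2x1x2/(x1 + x2).

Since the square of a real number is non-negative,

$$(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0 \;\Rightarrow\; x_1 + x_2 - 2\sqrt{x_1 x_2} \ge 0 \;\Rightarrow\; x_1 + x_2 \ge 2\sqrt{x_1 x_2} \;\Rightarrow\; \frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2} \;\Rightarrow\; \text{A.M.} \ge \text{G.M.} \quad \ldots \text{(I)}$$

Again,

$$\left(\frac{1}{\sqrt{x_1}} - \frac{1}{\sqrt{x_2}}\right)^2 \ge 0 \;\Rightarrow\; \frac{1}{x_1} + \frac{1}{x_2} - \frac{2}{\sqrt{x_1 x_2}} \ge 0 \;\Rightarrow\; \frac{x_1 + x_2}{x_1 x_2} \ge \frac{2}{\sqrt{x_1 x_2}} \;\Rightarrow\; \sqrt{x_1 x_2} \ge \frac{2 x_1 x_2}{x_1 + x_2} \;\Rightarrow\; \text{G.M.} \ge \text{H.M.} \quad \ldots \text{(II)}$$

From (I) and (II), we get

A.M. ≥ G.M. ≥ H.M.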

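(b) For any two positive numbers, A.M. × H.M. = (G.M.)².

Proof: Let a and b be the two positive numbers. We have

$$\text{A.M.} = \frac{a + b}{2}, \quad \text{G.M.} = \sqrt{ab}, \quad \text{H.M.} = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}$$

$$(\text{A.M.}) \times (\text{H.M.}) = \frac{a + b}{2} \times \frac{2ab}{a + b} = ab = (\sqrt{ab})^2 = (\text{G.M.})^2$$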
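Here is a quick numerical check of results (a) and (b) for a hypothetical pair of positive numbers, 4 and 9, plus a four-value hypothetical set for (a). It illustrates the relations but of course does not prove them. `am_gm_hm` is an illustrative helper name.

```python
import math
import statistics

def am_gm_hm(a, b):
    """A.M., G.M. and H.M. of two positive numbers."""
    am = (a + b) / 2
    gm = math.sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm

am, gm, hm = am_gm_hm(4, 9)            # hypothetical pair
print(am, gm, hm)                      # 6.5, 6.0 and about 5.54
print(am >= gm >= hm)                  # True: result (a)
print(math.isclose(am * hm, gm ** 2))  # True: result (b), A.M. x H.M. = (G.M.)^2

# Result (a) also holds for more than two positive values
values = [2, 3, 10, 12]
print(statistics.fmean(values) >= statistics.geometric_mean(values)
      >= statistics.harmonic_mean(values))
```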

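1.14 THE ROOT MEAN SQUARE

The root mean square (RMS), or quadratic mean, of a set of numbers X1, X2, …, XN is sometimes denoted by $\sqrt{\overline{X^2}}$ and is defined by

$$\text{RMS} = \sqrt{\overline{X^2}} = \sqrt{\frac{\sum_{j=1}^{N} X_j^2}{N}} = \sqrt{\frac{\sum X^2}{N}}$$

This type of average is frequently used in physical applications.

Example: The RMS of the set 1, 3, 4, 5 and 7 is

$$\sqrt{\frac{1^2 + 3^2 + 4^2 + 5^2 + 7^2}{5}} = \sqrt{\frac{100}{5}} = \sqrt{20} = 4.47$$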
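The RMS definition translates directly into a few lines of Python. The example below reproduces the set used above, and `rms` is an illustrative name.

```python
import math

def rms(values):
    """Root mean square: square root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in values) / len(values))

print(round(rms([1, 3, 4, 5, 7]), 2))  # sqrt(100 / 5) = sqrt(20) = 4.47
```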

1.15 QUARTILES, DECILES AND PERCENTILES

If a set of data is arranged in order of magnitude, the middle value (or the arithmetic mean of the two middle values) that divides the set into two equal parts is the median. Extending this idea, we can think of the values which divide the set into four equal parts. These values, denoted by Q1, Q2 and Q3, are called the first, second and third quartiles, respectively, the value Q2 being equal to the median. Similarly, the values that divide the data into 10 equal parts are called deciles and are denoted by D1, D2, …, D9, while the values dividing the data into 100 equal parts are called percentiles and are denoted by P1, P2, …, P99. The fifth decile and the 50th percentile correspond to the median. The 25th and 75th percentiles correspond to the first and third quartiles, respectively. Collectively, quartiles, deciles, percentiles and other values obtained by equal subdivisions of the data are called quantiles.
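Quartiles, deciles and percentiles can be computed with Python's `statistics.quantiles` (Python 3.8+). This is a sketch on hypothetical data. The function's default 'exclusive' method is only one of several interpolation conventions, so other software may give slightly different cut points.

```python
import statistics

data = [12, 15, 17, 20, 22, 25, 27, 30, 34, 40]  # hypothetical, already sorted

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

print(quartiles)  # [16.5, 23.5, 31.0]
# Q2, D5 and P50 all coincide with the median
print(quartiles[1], deciles[4], percentiles[49], statistics.median(data))
# P25 and P75 coincide with Q1 and Q3
print(percentiles[24], percentiles[74])
```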

1.16 SOFTWARE AND MEASURES OF CENTRAL TENDENCY

The output of all five packages is given below for the following test scores:

Test Scores

25 28 28 28 29 30 32 33 33 33 34 34 35 36 37

38 41 42 42 45 46 47 51 51 53 53 53 55 56 57

57 60 61 62 62 62 67 68 69 71 72 73 73 75 75

79 82 85 86 86 86 88 88 89 91 93 94 96 96 99

EXCEL

If the pull-down menu sequence “Tools => Data Analysis => Descriptive Statistics” is selected, the measures of central tendency (median, mean and mode), as well as several measures of dispersion, are obtained:

Mean 59.16667

Standard Error 2.867425

Median 57

Mode 28

Standard Deviation 22.21098

Sample Variance 493.3277

Kurtosis -1.24413

Skewness 0.167175

Range 74

Minimum 25

Maximum 99

Sum 3550

Count 60
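As a cross-check (Python is not one of the five packages discussed here), the Excel figures above can be reproduced with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later. That version is needed for `fmean`, `multimode` and `quantiles`, and for `mode` returning the first mode instead of raising an error. Skewness and kurtosis are omitted because the module does not provide them.

```python
import statistics

# Test scores listed above (60 values, already in ascending order)
scores = [
    25, 28, 28, 28, 29, 30, 32, 33, 33, 33, 34, 34, 35, 36, 37,
    38, 41, 42, 42, 45, 46, 47, 51, 51, 53, 53, 53, 55, 56, 57,
    57, 60, 61, 62, 62, 62, 67, 68, 69, 71, 72, 73, 73, 75, 75,
    79, 82, 85, 86, 86, 86, 88, 88, 89, 91, 93, 94, 96, 96, 99,
]

n = len(scores)
sd = statistics.stdev(scores)  # sample standard deviation (divisor n - 1)

summary = {
    "Mean": statistics.fmean(scores),
    "Standard Error": sd / n ** 0.5,
    "Median": statistics.median(scores),
    "Mode": statistics.mode(scores),            # first mode encountered
    "All modes": statistics.multimode(scores),
    "Standard Deviation": sd,
    "Sample Variance": statistics.variance(scores),
    "Range": max(scores) - min(scores),
    "Minimum": min(scores),
    "Maximum": max(scores),
    "Sum": sum(scores),
    "Count": n,
    "Quartiles": statistics.quantiles(scores, n=4),  # default 'exclusive' method
}

for name, value in summary.items():
    print(f"{name:<20} {value}")
```

The data actually have five modes (28, 33, 53, 62 and 86, each occurring three times), and Excel reports only the first, 28. The quartiles 37.25, 57 and 78 agree with the MINITAB output that follows. Other quartile conventions can give slightly different values.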

MINITAB

If the pull-down menu sequence “Stat => Basic Statistics => Display Descriptive Statistics” is selected, the following output is obtained:

Descriptive Statistics: testscore

Variable    N    N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
testscore   60   0    59.17   2.87      22.21   25.00     37.25   57.00    78.00   99.00


SPSS

If the pull-down menu sequence “Analyze => Descriptive Statistics => Descriptives” is selected, the following output is obtained:

Descriptive Statistics

                      N    Minimum   Maximum   Mean      Std. Deviation
Testscore             60   25.00     99.00     59.1667   22.21098
Valid N (listwise)    60

SAS

If the pull-down menu sequence “Solutions => Analysis => Analyst” is selected and the data are read in as a file, the pull-down “Statistics => Descriptive => Summary Statistics” gives the following output:

STATISTIX

If the pull-down menu sequence “Statistics => Summary Statistics => Descriptive Statistics” is selected in the software package STATISTIX, the following output is obtained:

SOLVED EXAMPLES

Example 1: Write out the terms in each of the following indicated sums:

(a) ∑ (b) ∑ − 3 (c) ∑ (d) ∑ (e) ∑ −

Solution: (a) + + + + +

(b) ( − 3) + ( − 3) + ( − 3) + ( − 3)

(c) + + + ⋯ + =

(d) + + + +

(e) (X1 − a) + (X2 − a) + (X3 − a) = X1 + X2 + X3 − 3a

Example 2: Express each of the following by using the summation notation:

(a) X + X + X + ⋯ + X

(b) (X + Y ) + (X + Y ) + ⋯ + (X + Y )

(c) f X + f X + ⋯ + f X

(d) a1b1 + a2b2 + a3b3 + ⋯ + aNbN

(e) f1X1Y1 + f2X2Y2 + f3X3Y3 + f4X4Y4

Solution: (a) ∑ X

(b) ∑ X + Y

(c) ∑ f X


(d) $\sum_{j=1}^{N} a_j b_j$

(e) $\sum_{j=1}^{4} f_j X_j Y_j$

Example 3: Calculate the arithmetic mean of the following observations.

32, 35, 36, 37, 39, 41, 43, 47, 48

Solution: A.M. = (32 + 35 + 36 + 37 + 39 + 41 + 43 + 47 + 48)/9 = 358/9 = 39.78

Example 4: In a survey of 5 cement companies, the profit (in ₹ crore) earned during a year was 15, 20, 10, 35 and 32. Find the arithmetic mean of the profit earned.

Solution: A.M. = (15 + 20 + 10 + 35 + 32)/5 = 112/5 = 22.4

Thus, the arithmetic mean of the profit earned by these companies during the year was ₹ 22.4 crore.
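Examples 3 and 4 can be verified with `statistics.fmean` (Python 3.8+):

```python
from statistics import fmean

ex3 = [32, 35, 36, 37, 39, 41, 43, 47, 48]
ex4 = [15, 20, 10, 35, 32]  # profit in ₹ crore

print(round(fmean(ex3), 2))  # 358 / 9 = 39.78
print(round(fmean(ex4), 2))  # 112 / 5 = 22.4
```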

Example 5: An examination was held to decide the award of a scholarship. The subjects carried different weights. The marks obtained by 3 candidates (out of 100 in each subject) are given below:

Subject        Weight   Student A   Student B   Student C
Mathematics    4        60          57          62
Physics        3        62          61          67
Chemistry      2        55          53          60
English        1        67          77          49

Calculate the weighted A.M. to award the scholarship.

Solution: The calculation of the weighted arithmetic mean is shown below:

Subject       Weight    Student A           Student B           Student C
              (wi)      Marks (Xi)  Xiwi    Marks (Xi)  Xiwi    Marks (Xi)  Xiwi
Mathematics   4         60          240     57          228     62          248
Physics       3         62          186     61          183     67          201
Chemistry     2         55          110     53          106     60          120
English       1         67          67      77          77      49          49
Total         10        244         603     248         594     238         618

Applying the formula for weighted mean, we get:

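Weighted A.M. (A) = 603/10 = 60.3; Simple A.M. (A) = 244/4 = 61

Weighted A.M. (B) = 594/10 = 59.4; Simple A.M. (B) = 248/4 = 62

Weighted A.M. (C) = 618/10 = 61.8; Simple A.M. (C) = 238/4 = 59.5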


From the above calculations, it may be noted that student B should get the scholarship on the basis of simple A.M. values. On the basis of weighted A.M., however, student C should get the scholarship, because the subjects of the examination are not of equal importance.
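Here is a short sketch reproducing the Example 5 comparison. `weighted_mean` is a helper defined here for illustration, not a library function.

```python
def weighted_mean(values, weights):
    """Weighted A.M. = sum(w * X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

weights = [4, 3, 2, 1]  # Mathematics, Physics, Chemistry, English
marks = {
    "A": [60, 62, 55, 67],
    "B": [57, 61, 53, 77],
    "C": [62, 67, 60, 49],
}

results = {}
for student, x in marks.items():
    simple = sum(x) / len(x)
    weighted = weighted_mean(x, weights)
    results[student] = (simple, weighted)
    print(f"Student {student}: simple A.M. = {simple:.2f}, weighted A.M. = {weighted:.2f}")
```

The simple A.M. favours student B (62), while the weighted A.M. favours student C (61.8).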

Example 6: The owner of a general store was interested in knowing the mean contribution (sales price minus variable cost) of his stock of 5 items. The data is given below:

Product   Contribution per Unit (₹)   Quantity Sold
1          6                          160
2         11                           60
3          8                          260
4          4                          460
5         14                          110

Solution: If the owner ignores the quantities sold of the individual products and gives equal importance to each product, then the mean contribution per unit will be

X̄ = (1/5)(6 + 11 + 8 + 4 + 14) = 43/5 = ₹ 8.60

However, ₹ 8.60 is not necessarily the mean contribution per unit over the different quantities of the products sold. In this case, the owner has to take the number of units of each product sold as weights. The weighted A.M. is computed by multiplying the units sold (w) of each product by its contribution (X):

X̄w = [6(160) + 11(60) + 8(260) + 4(460) + 14(110)] / (160 + 60 + 260 + 460 + 110)

= 7,080/1,050 = ₹ 6.74

This value, ₹ 6.74, is different from the earlier value, ₹ 8.60. The owner should use the value ₹ 6.74 for decision-making purposes.

Example 7: Find the mean from the following data:

X 5 10 15 20 25 30 35 40

f 5 9 13 21 2 15 8 3

Solution: Total frequency = Σf = 5 + 9 + 13 + 21 + 2 + 15 + 8 + 3 = 76 = Number of values

X       f          fX
5       5          25
10      9          90
15      13         195
20      21         420
25      2          50
30      15         450
35      8          280
40      3          120
Total   Σf = 76    ΣfX = 1630


ΣfX = Sum of the products of the X values with their respective frequencies = Sum of the values = 1630

Arithmetic Mean = ΣfX/Σf = 1630/76 = 21.45
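The frequency-table calculation of Example 7 can be sketched in Python as follows:

```python
X = [5, 10, 15, 20, 25, 30, 35, 40]
f = [5, 9, 13, 21, 2, 15, 8, 3]

sum_f = sum(f)                                  # total frequency, 76
sum_fx = sum(fi * xi for xi, fi in zip(X, f))   # sum of fX, 1630
print(sum_f, sum_fx, round(sum_fx / sum_f, 2))  # 76 1630 21.45
```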

Example 8: A, B, C and D are four chemicals costing ₹ 15, ₹ 12, ₹ 8 and ₹ 5 per 100 g, respectively. They are contained in a given compound in the ratio of 1, 2, 3 and 4 parts, respectively. What should be the price of the resultant compound?

Solution: A.M. = ΣwX/Σw = (15 × 1 + 12 × 2 + 8 × 3 + 5 × 4)/(1 + 2 + 3 + 4) = 83/10 = ₹ 8.30 per 100 g

Example 9: The daily earnings (in rupees) of 175 employees working on a daily basis in a firm are:

Daily Earnings (₹)     100   120   140   160   180   200   220
Number of Employees    3     6     10    15    24    42    75

Calculate the average daily earning for all employees by assumed mean method.

Solution: Let us take assumed mean, A = 160.

The calculation of average daily earning for employees is shown below:

Daily Earnings   Number of        di = Xi − A
(₹) (Xi)         Employees (fi)   = Xi − 160    fi di
100              3                −60           −180
120              6                −40           −240
140              10               −20           −200
160              15               0             0
180              24               20            480
200              42               40            1680
220              75               60            4500
Total            Σf = 175                       Σfd = 6040

The required A.M. (X̄) using the formula is given by:

X̄ = A + Σfd/Σf = 160 + 6040/175 = 160 + 34.51 = ₹ 194.51

Thus, the average daily earning of all employees is ₹ 194.51.
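The sketch below confirms that the assumed-mean (short-cut) method of Example 9 gives the same result as the direct method ΣfX/Σf. Any value of A would give the same answer, because the deviations are measured from A and A is then added back.

```python
X = [100, 120, 140, 160, 180, 200, 220]  # daily earnings (₹)
f = [3, 6, 10, 15, 24, 42, 75]           # number of employees
A = 160                                  # assumed mean

d = [x - A for x in X]                                   # deviations from A
sum_fd = sum(fi * di for fi, di in zip(f, d))            # 6040
shortcut = A + sum_fd / sum(f)                           # assumed-mean method
direct = sum(fi * xi for fi, xi in zip(f, X)) / sum(f)   # direct method
print(round(shortcut, 2), round(direct, 2))              # 194.51 194.51
```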

Example 10: A company is planning to improve plant safety. For this, accident data for the last 50 weeks were compiled. These data are grouped into the frequency distribution shown below. Calculate the A.M. of the number of accidents per week.

Number of accidents   0-10   10-20   20-30   30-40   40-50
Number of weeks       6      20      10      8       2

Solution: The calculation of Arithmetic Mean is shown below:
