Basics of Stats

8/4/2019 Basics of Stats


Business Statistics


Definition

Statistics is a standard method for collecting, organizing, summarizing, presenting, analyzing, and interpreting data in order to draw conclusions and make decisions based on the analysis of those data.

Statistics is used extensively by engineers, managers, governments, businesspeople, and others throughout the world.


Collection of data

Types of data

Secondary data
  Whether data are suitable?
  Whether data are adequate?
  Whether data are reliable?

Primary data
  Questioning
  Observation


Presentation of data

Classification
  Geographical
  Chronological
  Quantitative
  Qualitative

Frequency distribution
  Classification according to class interval
  Class limits
    Exclusive method
    Inclusive method
  Class intervals
  Class frequency


Tabulation of data
  Parts of a table

Charting of data
  Bar diagrams
  Pie diagrams
  Line graphs
  Histograms
  Frequency polygon


Functions of Statistics

Presents facts in a definite form

Simplifies a mass of figures

Facilitates comparison

Helps in formulating and testing hypotheses

Helps in prediction

Helps in the formulation of suitable policies


Populations and Samples

A population is the complete set of all possible instances of a particular object, for example, the students in this College.

A sample is a subset of the population, for example, any one of the classes.

We use samples to draw conclusions about the parent population.


Measures of Central Tendency

If you have to declare a single value to represent a population or a sample, what do you use?

The most common value is the mean, also called the average or the expected value.

Another common value is the mode, or the most likely (most common) value.

Another value is the median, or the middle of the data set.


Measures of Central Tendency (ungrouped)

Mean
This is the mathematical average of a set of numbers.

Median
This is the middle value of a set of data that has been arranged from lowest to highest.

Mode
The value that occurs the most in a set of data.

Expenditure is a good way of discussing these three measures. Suppose we want to know the average expenditure of NIFT students. Let's take a random sample of the monthly expenditure of NIFT students.


What is the Mean?

The mean is the sum of all of the values in the data set divided by the number of values.

The equation for calculating the mean is the same for both samples and populations.

$$\text{Mean} = \frac{\sum x}{n}$$


Sample Mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Where:

$\bar{x}$ (x-bar) is the mean

$x_i$ are the data points

$n$ is the sample size


Population Mean

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Where:

$\mu$ is the population mean

$x_i$ are the data points

$N$ is the total number of observations in the population


Measures of Central Tendency (ungrouped)

The sample gives these values:

5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

The Mean

This is the average.

Sum of values = 271500

Total N = 15

Mean = 271500 / 15 = 18100
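The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Monthly expenditure sample of 15 NIFT students, as listed on the slide.
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(expenditure)   # sum of all values
n = len(expenditure)       # number of values
mean = total / n           # arithmetic mean = sum / count

print(total, n, mean)
```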


What is the Median?

If the data has been sorted (ascending or descending), the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points).

The median is used to characterize data sets with a few extreme values that distort the relevance of the mean, such as house values or family incomes.

$$\text{Median} = \left(\frac{n + 1}{2}\right)\text{th item in the data array}$$


Measures of Central Tendency (ungrouped)

The sample gives these values:

5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

The Median

This is the middle value. Sorted values:

5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

The median here is 11500.

In cases where there are two middle values, we average the two.
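The median rule (middle item for an odd count, average of the two middle items for an even count) can be sketched as follows. The helper name `median_of` is hypothetical and chosen only for this illustration.

```python
def median_of(values):
    """Return the median: the middle item of the sorted data, or the
    average of the two middle items when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # (n + 1) / 2 is a 1-based position; subtract 1 for a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2


expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(median_of(expenditure))    # 15 values, so the 8th sorted value
print(median_of([4, 1, 3, 2]))   # even count: average of 2 and 3
```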


What is the Mode?

If the data is discrete, or has been grouped into discrete intervals, the mode is the value that occurs most often.

In other words, it is the value most likely to occur.


Measures of Central Tendency (ungrouped)

The sample gives these values:

5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

The Mode

This is the most numerous value. Sorted values:

5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

The mode here is 6000.

Sometimes there is no mode, or even two modes!
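Counting occurrences makes the mode, and the two-mode case, easy to see. The sketch below uses a hypothetical helper `modes` that returns every value tied for the highest count; when all values occur equally often it returns all of them, which one would normally read as "no mode".

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the value(s) with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(modes(expenditure))        # 6000 occurs four times
print(modes([1, 1, 2, 2, 3]))    # two modes: 1 and 2
```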


Measures of Central Tendency (ungrouped)

So given these values:

5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

what is the best measure of central tendency for this random sample of NIFT students?

Mean?...18100

Median?...11500

Mode?...6000


What Is the Range?

Range: the distance between the lowest and the highest values in the set.

For example, the time to drive to Churchgate is 2 hours plus or minus 15 minutes, that is, 105 to 135 minutes. Thus the range is 30 minutes.


Measures of Dispersion or Spread (ungrouped)

Range

The highest value minus the lowest value.

From our last example, the range would be: 110000 - 5000 = 105000
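As a quick check, the range is simply the maximum minus the minimum. A minimal sketch, with illustrative variable names:

```python
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Range = highest value - lowest value
value_range = max(expenditure) - min(expenditure)

# The driving-time example: 105 to 135 minutes
drive_range = 135 - 105

print(value_range, drive_range)
```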


What is the Variance?

The variance of a population is the sum of the squares of the differences between the mean and the individual data points, divided by the number of data points.

The variance of a sample is the sum of the squared differences divided by the number of data points less one.
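The two definitions differ only in the divisor. The sketch below spells both out; the function names and the small data set are made up for illustration and do not come from the slides.

```python
def population_variance(values):
    """Sum of squared differences from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


# Hypothetical data: mean is 5 and the squared differences sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))  # 32 / 8
print(sample_variance(data))      # 32 / 7
```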


What is the Standard Deviation?

Standard Deviation

This is, roughly, the typical distance of your values from the mean.

The standard deviation is the square root of the variance.


Computing Standard Deviation

Population "σ"

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

Sample "s"

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

It is important that you recognize the difference between these two equations!

The expression under the square root sign is the variance.


Measures of Dispersion or Spread (ungrouped)

Standard Deviation

Let's return to our NIFT random sample:

5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

Follow the steps below while we calculate the standard deviation as a class on the board.

1. Calculate the mean, which is 18100
2. Find the distance that each value has from the mean
3. Square each distance
4. Add up these squared distances and divide by the sample size minus 1
5. Take the square root of this number


Standard Deviation

X        Mean (x-bar)   X - x-bar   (X - x-bar)^2
5000     18100          -13100      171,610,000
6000     18100          -12100      146,410,000
6000     18100          -12100      146,410,000
6000     18100          -12100      146,410,000
6000     18100          -12100      146,410,000
8000     18100          -10100      102,010,000
11000    18100          -7100       50,410,000
11500    18100          -6600       43,560,000
12000    18100          -6100       37,210,000
13000    18100          -5100       26,010,000
15000    18100          -3100       9,610,000
15000    18100          -3100       9,610,000
17000    18100          -1100       1,210,000
30000    18100          11900       141,610,000
110000   18100          91900       8,445,610,000


Standard Deviation

We sum $(x - \bar{x})^2$, divide by n - 1, and take the square root of the result. This is the standard deviation.

Sum of squared deviations = 9,624,100,000

What is the square root of 9,624,100,000 / 14?

Approx. 26,219
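The board calculation for the NIFT sample can be reproduced step by step. This is a minimal sketch with illustrative variable names; it uses the sample divisor n - 1.

```python
import math

expenditure = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000,
               13000, 15000, 15000, 17000, 30000, 110000]

n = len(expenditure)
mean = sum(expenditure) / n                      # step 1: the mean
deviations = [x - mean for x in expenditure]     # step 2: distance from the mean
squared = [d ** 2 for d in deviations]           # step 3: square each distance
sample_variance = sum(squared) / (n - 1)         # step 4: sum and divide by n - 1
s = math.sqrt(sample_variance)                   # step 5: square root

print(sum(squared), round(s))
```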


The Subtle Difference Between s and σ

The difference in the divisors (N versus n - 1) results in s being slightly larger than σ.

This is to account for the fact that s (from a sample) is an estimate of σ (of a population), and this adds a degree of error to the value.

Note: for large n the difference is trivial.


A Valuable Tool

The standard deviation is a relatively recent invention. It grew out of the work of Gauss on explaining the error observed in measured star positions.

Today it is used in everything from quality control to measuring risk in financial investments.


Measures of Central Tendency and Dispersion (Grouped Data)

Remember that grouped data is a collection of data that has been placed into categories.

Thus we need to calculate the mean and standard deviation differently, but the idea is the same.


A.M. for Grouped Data

The following is the frequency distribution of 500 workers according to their weekly income (in Rs.). Find the average income.

Income (Rs.)   Persons
0 - 50         90
50 - 100       150
100 - 150      100
150 - 200      80
200 - 250      70
250 - 300      10


A.M. for Grouped Data

Income (Rs.)   Persons (f)   Mid value (x)   Deviation (d)   f x d
0 - 50         90            25              -2              -180
50 - 100       150           75              -1              -150
100 - 150      100           125             0               0
150 - 200      80            175             1               80
200 - 250      70            225             2               140
250 - 300      10            275             3               30
Total          500                                           -80


A.M. for Grouped Data

$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 125 + \frac{-80}{500} \times 50 = \text{Rs. } 117$$
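The step-deviation calculation for the income table can be sketched in Python. The assumed mean A = 125 and class width h = 50 are taken from the worked table; the variable names are illustrative. A direct computation of Σfx / Σf is included as a cross-check.

```python
# (lower limit, upper limit, number of persons) for each income class
classes = [(0, 50, 90), (50, 100, 150), (100, 150, 100),
           (150, 200, 80), (200, 250, 70), (250, 300, 10)]

A = 125   # assumed mean (mid value of the 100 - 150 class)
h = 50    # class width

sum_f = sum(f for _, _, f in classes)
# d = (x - A) / h, where x is the mid value of the class
sum_fd = sum(f * (((lo + hi) / 2 - A) / h) for lo, hi, f in classes)

mean = A + (sum_fd / sum_f) * h

# Cross-check: direct mean from mid values
direct_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / sum_f

print(sum_f, sum_fd, mean, direct_mean)
```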


Advantages / Disadvantages of the Arithmetic Mean

Advantages:

1) Familiar and intuitively clear to most people

2) Every data set has one and only one mean

3) Useful for performing statistical procedures

Disadvantages:

1) May be affected by extreme values

2) Tedious to compute

3) Difficult to compute for a data set with open-ended classes


Computation of Mean, Median, and Mode for Grouped Data

Age (Yrs)   Mid value (x)   d = (x - A)/h   No. of Pts. (f)   f x d   Cumulative frequency
10 - 20     15              -3              5                 -15     5
20 - 30     25              -2              19                -38     24
30 - 40     35              -1              26                -26     50
40 - 50     45              0               35                0       85
50 - 60     55              1               15                15      100
60 - 70     65              2               3                 6       103
Total                                       103               -58

Arithmetic mean = 45 + (-58/103) x 10 = 39.4


Computation of Mean, Median, and Mode for Grouped Data

$$\text{Median} = L + \frac{N/2 - C.F.}{F} \times h$$

where L is the lower limit of the median class, N is the total frequency, C.F. is the cumulative frequency of the class preceding the median class, F is the frequency of the median class, and h is the class width.

N/2 = 103/2 = 51.5. This value lies in the class interval 40 - 50 (as seen from the cumulative frequency column). Hence L = 40.

Median = 40 + ((51.5 - 50) / 35) x 10 = 40.43
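The grouped-median formula can be sketched as a small function that walks the cumulative frequencies to find the median class. The name `grouped_median` and the tuple layout are assumptions made for this illustration. With N/2 = 51.5, C.F. = 50, F = 35 and h = 10, it gives about 40.43 for the age table.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency), in order.
    Returns L + ((N/2 - C.F.) / F) * h for the median class."""
    total = sum(f for _, _, f in classes)
    if total == 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


ages = [(10, 20, 5), (20, 30, 19), (30, 40, 26),
        (40, 50, 35), (50, 60, 15), (60, 70, 3)]

print(round(grouped_median(ages), 2))
```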




Comparing the Mean, Median, and Mode

[Figure: two distribution curves with the positions of the mean, median, and mode marked.]


Summary of Central Tendency Measures

Measure   Equation                           Description
Mean      $\sum x / n$                       Balance point
Median    $\frac{n+1}{2}$ th item in array   Middle value in ordered array
Mode      none                               Most frequent


Standard Deviation (Grouped data)

$$\text{S.D.} = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2} \times h$$

Where f is the frequency and d is the deviation, computed as

$$d_i = \frac{x_i - A}{h}$$


S.D. for Grouped Data

The following data provides the chest measurement (in cm) of 50 MBBS students. Find the mean and S.D.

Chest measurement (cm)   No. of students
61 - 70                  2
71 - 80                  10
81 - 90                  20
91 - 100                 17
101 - 110                1


S.D. for Grouped Data

C.I.        Mid value (x)   Fr. (f)   d = (x - A)/h   f x d   f x d^2
61 - 70     65.5            2         -2              -4      8
71 - 80     75.5            10        -1              -10     10
81 - 90     85.5            20        0               0       0
91 - 100    95.5            17        1               17      17
101 - 110   105.5           1         2               2       4
Total                       50                        5       39


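S.D. for Grouped Data

$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 85.5 + \frac{5}{50} \times 10 = 86.5$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}\right)^2} \times h = \sqrt{\frac{39}{50} - \left(\frac{5}{50}\right)^2} \times 10 \approx 8.77$$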
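The grouped mean and standard deviation for the chest-measurement table can be recomputed from the mid values and frequencies. This is a sketch under the table's own choices (assumed mean A = 85.5, class width h = 10); the variable names are illustrative. With Σfd = 5 and Σfd² = 39 over 50 students, it gives a mean of 86.5 and a standard deviation of roughly 8.77.

```python
import math

# (mid value x, frequency f) for each chest-measurement class
classes = [(65.5, 2), (75.5, 10), (85.5, 20), (95.5, 17), (105.5, 1)]

A = 85.5  # assumed mean (mid value of the 81 - 90 class)
h = 10    # class width

sum_f = sum(f for _, f in classes)
sum_fd = sum(f * ((x - A) / h) for x, f in classes)
sum_fd2 = sum(f * ((x - A) / h) ** 2 for x, f in classes)

mean = A + (sum_fd / sum_f) * h
sd = math.sqrt(sum_fd2 / sum_f - (sum_fd / sum_f) ** 2) * h

print(sum_f, sum_fd, sum_fd2, mean, round(sd, 2))
```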


Uses of Standard Deviation

Aside from being a measure of dispersion...

Determines where values of a frequency distribution are in relation to the mean (standard scores)

Measures percentage of items within specific ranges
  Chebyshev's Theorem
  Normal distribution


Coefficient of Variation

1. Measure of relative dispersion
2. Always a %
3. Shows variation relative to the mean
4. Used to compare 2 or more groups

Sample:

$$CV = \frac{s}{\bar{x}} (100)$$

Population:

$$CV = \frac{\sigma}{\mu} (100)$$


Coefficient of Variation

Example: Which technician shows more variability?

Technician A: $\mu_a = 40$, $\sigma_a = 5$

Technician B: $\mu_b = 160$, $\sigma_b = 15$


Solution

$$CV = \frac{\sigma}{\mu} (100)$$

Technician A: CV = (5 / 40)(100) = 12.5%

Technician B: CV = (15 / 160)(100) = 9.4%

Technician A shows more variability relative to the mean.
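The comparison can be checked with a one-line helper. The function name `coefficient_of_variation` is hypothetical; the inputs are the means (40 and 160) and standard deviations (5 and 15) given for technicians A and B in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(5, 40)     # technician A
cv_b = coefficient_of_variation(15, 160)   # technician B

print(cv_a, cv_b)
print("A" if cv_a > cv_b else "B", "shows more relative variability")
```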


Summary of Variation Measures

Measure                           Equation                                       Description
Range                             $x_{largest} - x_{smallest}$                   Total spread
Interquartile Range               $Q_3 - Q_1$                                    Spread of middle 50%
Standard Deviation (Sample)       $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$    Dispersion about sample mean
Standard Deviation (Population)   $\sqrt{\frac{\sum (x - \mu)^2}{N}}$            Dispersion about population mean
Variance (Sample)                 $\frac{\sum (x - \bar{x})^2}{n - 1}$           Squared dispersion about sample mean
Coeff. of Variation               $\frac{s}{\bar{x}} (100)$                      Relative variation

Date post:	07-Apr-2018
Upload:	abhisheksinha