Date post: | 07-Apr-2018 |
Category: |
Documents |
Upload: | abhisheksinha |
8/4/2019 Basics of Stats
Business Statistics
Definition
Statistics is a standard method for collecting, organizing, summarizing, presenting, analyzing, and interpreting data in order to draw conclusions and make decisions based on the analysis of these data. Statistics is used extensively by engineers, managers, governments, businesspeople, and others throughout the world.
Collection of data
Types of data
  Secondary data
    Whether data are suitable?
    Whether data are adequate?
    Whether data are reliable?
  Primary data
    Questioning
    Observation
Presentation of data
Classification
  Geographical
  Chronological
  Quantitative
  Qualitative
Frequency distribution
  Classification according to class interval
  Class limits
    Exclusive method
    Inclusive method
  Class intervals
  Class frequency
Tabulation of data
  Parts of table
Charting of data
  Bar diagrams
  Pie diagrams
  Line graphs
  Histograms
  Frequency polygon
Functions of Statistics
  Presents facts in a definite form
  Simplifies mass of figures
  Facilitates comparison
  Helps in formulating and testing hypotheses
  Helps in prediction
  Helps in the formulation of suitable policies
Populations and Samples
A population is a complete set of all of the possible instances of a particular object, for example, students in this College.
A sample is a subset of the population, for example, any one of the classes.
We use samples to draw conclusions about the parent population.
Measures of Central Tendency
If you have to declare a single value to represent a population or a sample, what do you use?
The most common value is the mean, also called the average or the expected value.
Another common value is the mode, or the most likely (most common) value.
Another value is the median, or the middle of the data set.
Measures of Central Tendency (ungrouped)
Mean
  This is the mathematical average of a set of numbers.
Median
  This is the middle value of a set of data that has been arranged from lowest to highest.
Mode
  The value that occurs most often in a set of data.
We can use expenditure as a good way of discussing these three measures. Suppose we want to know the average expenditure of NIFT students. Let's take a random sample of the monthly expenditure of NIFT students.
What is the Mean?
The mean is the sum of all of the values in the data set divided by the number of values.
The equation for calculating the mean is the same for both samples and populations.
Mean: $\bar{x} = \frac{\sum x}{n}$
Sample Mean
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Where:
  $\bar{x}$ (x-bar) is the mean
  $x_i$ are the data points
  $n$ is the sample size
Population Mean
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Where:
  $\mu$ is the population mean
  $x_i$ are the data points
  $N$ is the total number of observations in the population
Measures of Central Tendency (ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
The Mean
This is the average.
Sum of values = 271500
Total n = 15
Mean = 271500 / 15 = 18100
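The arithmetic on this slide can be checked with a short Python sketch. The variable names are illustrative only; the data are the fifteen expenditure values listed above.

```python
from statistics import mean

# Monthly expenditure sample from the slide (15 NIFT students).
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(expenditure)       # sum of values
n = len(expenditure)           # sample size
sample_mean = total / n        # sum divided by the number of values

print(total, n, sample_mean)   # 271500 15 18100.0
print(mean(expenditure))       # the standard library gives the same mean
```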
What is the Median?
If the data has been sorted (ascending or descending), the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points).
The median is often used to characterize data sets with a few extreme values that distort the relevance of the mean, such as house values or family incomes.
Median = the $\left(\frac{n+1}{2}\right)$th item in the data array
Measures of Central Tendency (ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
The Median
This is the middle value of the sorted data:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
The median here is 11500 (the 8th of 15 values).
In cases where there are two middle values, we average the two.
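As a rough sketch of the rule above (sort, then take the middle item, or average the two middle items), the following Python computes the median of the same sample by hand and compares it with the standard library. The names are illustrative.

```python
from statistics import median

expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

ordered = sorted(expenditure)
n = len(ordered)

if n % 2 == 1:
    # Odd count: the (n + 1) / 2-th item (1-based), i.e. index (n + 1) // 2 - 1.
    middle = ordered[(n + 1) // 2 - 1]
else:
    # Even count: average the two middle items.
    middle = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(middle)                # 11500
print(median(expenditure))   # same value from the standard library
```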
What is the Mode?
If the data is discrete, or has been grouped into discrete intervals, the mode is the value that occurs most often.
In other words, it is the value most likely to occur.
Measures of Central Tendency (ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
The Mode
This is the most numerous value:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
The mode here is 6000.
Sometimes there is no mode, or even two modes!
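One way to see the "no mode or two modes" cases is to count occurrences directly. The helper `modes_of` below is a hypothetical name, and treating data with no repeated value as having no mode is an assumed convention, not something the slides define.

```python
from collections import Counter

def modes_of(data):
    """Return the most frequent value(s), sorted; [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []   # assumed convention: nothing repeats, so no mode
    return sorted(value for value, count in counts.items() if count == top)

expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(modes_of(expenditure))       # [6000]
print(modes_of([1, 1, 2, 2, 3]))   # [1, 2]  (two modes)
print(modes_of([1, 2, 3]))         # []      (no mode)
```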
Measures of Central Tendency (ungrouped)
So given these values:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
what is the best measure of central tendency for this random sample of NIFT students?
Mean? ... 18100
Median? ... 11500
Mode? ... 6000
What Is the Range?
Range: the distance between the lowest and the highest values in the set.
For example, the time to drive to Churchgate is 2 hours plus or minus 15 minutes, or 105 to 135 minutes. Thus the range is 30 minutes.
Measures of Dispersion or Spread (ungrouped)
Range
The highest value minus the lowest value.
From our last example, the range would be: 110000 - 5000 = 105000
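A minimal sketch of the same subtraction in Python, using the expenditure sample and the Churchgate travel times; the variable names are illustrative.

```python
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Range = highest value minus lowest value.
expenditure_range = max(expenditure) - min(expenditure)
print(expenditure_range)   # 105000

# Churchgate example: 105 to 135 minutes.
travel_range = 135 - 105
print(travel_range)        # 30
```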
What is the Variance?
The variance of a population is the sum of the squares of the differences between the mean and the individual data points, divided by the number of data points.
The variance of a sample is the sum of the squared differences divided by the number of data points less one.
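The two divisors can be compared on the expenditure sample. This is only a sketch: it treats the same fifteen values first as if they were a whole population (divide by N) and then as a sample (divide by n - 1), and the variable names are illustrative.

```python
from math import isclose
from statistics import pvariance, variance

expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

n = len(expenditure)
m = sum(expenditure) / n
squared_diffs = sum((x - m) ** 2 for x in expenditure)   # sum of squared differences

population_variance = squared_diffs / n        # divide by the number of data points
sample_variance = squared_diffs / (n - 1)      # divide by the number of data points less one

# The standard library functions use the same two divisors.
print(isclose(population_variance, pvariance(expenditure)))
print(isclose(sample_variance, variance(expenditure)))
```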
What is the Standard Deviation?
Standard Deviation
This is, roughly speaking, the typical distance of your values from the mean.
The standard deviation is the square root of the variance.
Computing Standard Deviation
Population "σ":
$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$
Sample "s":
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
It is important that you recognize the difference between these two equations!
The expression under the square root sign is the variance.
Measures of Dispersion or Spread (ungrouped)
Standard Deviation
Let's return to our NIFT random sample:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
Follow these steps while we calculate the standard deviation as a class on the board:
1. Calculate the mean, which is 18100
2. Find the distance that each value has from the mean
3. Square each distance
4. Add up these squared distances and divide by the sample size minus 1 (n - 1)
5. Take the square root of this number
Standard Deviation
X        Mean (x-bar)   X - x-bar   (X - x-bar)^2
5000     18100          -13100      17161 × 10^4
6000     18100          -12100      14641 × 10^4
6000     18100          -12100      14641 × 10^4
6000     18100          -12100      14641 × 10^4
6000     18100          -12100      14641 × 10^4
8000     18100          -10100      10201 × 10^4
11000    18100          -7100       5041 × 10^4
11500    18100          -6600       4356 × 10^4
12000    18100          -6100       3721 × 10^4
13000    18100          -5100       2601 × 10^4
15000    18100          -3100       961 × 10^4
15000    18100          -3100       961 × 10^4
17000    18100          -1100       121 × 10^4
30000    18100          11900       14161 × 10^4
110000   18100          91900       844561 × 10^4
Standard Deviation
We sum the (x - x-bar)^2 values, which gives 962410 × 10^4, divide this sum by n - 1 = 14, and take the square root. This is the standard deviation.
What is the result?
Approx. 26,219
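The five steps of the calculation can be followed in a short Python sketch. The names are illustrative; the population version (dividing by N) is included only for comparison.

```python
from math import sqrt

expenditure = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000,
               13000, 15000, 15000, 17000, 30000, 110000]

n = len(expenditure)
x_bar = sum(expenditure) / n                       # step 1: the mean (18100)
distances = [x - x_bar for x in expenditure]       # step 2: distance from the mean
squared = [d ** 2 for d in distances]              # step 3: square each distance
sample_variance = sum(squared) / (n - 1)           # step 4: sum, divide by n - 1
s = sqrt(sample_variance)                          # step 5: square root

sigma = sqrt(sum(squared) / n)                     # population form, divisor N

print(round(s))     # 26219
print(s > sigma)    # True: the n - 1 divisor gives the larger value
```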
The Subtle Difference Between S and σ
The difference in the divisors (N versus n - 1) results in S being slightly larger than σ.
This is to account for the fact that S (from a sample) is an estimate of σ (of a population), and this adds a degree of error to the value.
Note: for large n the difference is trivial.
A Valuable Tool
The standard deviation is a rather recent invention and was originally devised by Gauss to explain the error observed in measured star positions.
Today it is used in everything from quality control to measuring risk in financial investments.
Measures of Central Tendency and Dispersion (Grouped Data)
Remember that grouped data is a collection of data that has been placed into categories.
Thus we need to calculate the mean and standard deviation differently, but the idea is the same.
A.M for Grouped Data
The following is the frequency distribution of 500 workers according to their weekly income (in Rs.). Find the average income.
Income (Rs.)   Persons
0 - 50         90
50 - 100       150
100 - 150      100
150 - 200      80
200 - 250      70
250 - 300      10
A.M for Grouped Data
Income (Rs.)   Persons (f)   Mid value (x)   Deviation d = (x - 125)/50   f × d
0 - 50         90            25              -2                           -180
50 - 100       150           75              -1                           -150
100 - 150      100           125             0                            0
150 - 200      80            175             1                            80
200 - 250      70            225             2                            140
250 - 300      10            275             3                            30
Total          500                                                        -80
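A.M for Grouped Data
$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 125 + \frac{-80}{500} \times 50 = \text{Rs. } 117$
Here A = 125 is the assumed mean (the mid value of the 100 - 150 class) and h = 50 is the class width.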
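The step-deviation calculation for the income table can be sketched in Python as follows. The tuple layout and variable names are illustrative; A = 125 and h = 50 are the assumed mean and class width used in the table.

```python
# (lower limit, upper limit, frequency) for each weekly-income class.
classes = [(0, 50, 90), (50, 100, 150), (100, 150, 100),
           (150, 200, 80), (200, 250, 70), (250, 300, 10)]

A = 125   # assumed mean: mid value of the 100 - 150 class
h = 50    # class width

sum_f = 0
sum_fd = 0
for lower, upper, f in classes:
    mid = (lower + upper) / 2     # mid value x
    d = (mid - A) / h             # deviation in units of the class width
    sum_f += f
    sum_fd += f * d

grouped_mean = A + h * sum_fd / sum_f
print(sum_f, sum_fd, grouped_mean)   # 500 -80.0 117.0

# Cross-check without the shortcut: sum of f * mid value divided by sum of f.
direct_mean = sum(f * (lo + up) / 2 for lo, up, f in classes) / sum_f
print(direct_mean)                   # 117.0
```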
Advantages / Disadvantages of the Arithmetic Mean
Advantages:
1) Familiar and intuitively clear to most people
2) Every data set has one and only one mean
3) Useful for performing statistical procedures
Disadvantages:
1) May be affected by extreme values
2) Tedious to compute
3) Difficult to compute for data sets with open-ended classes
Computation of Mean, Median, and Mode for Grouped Data
Age (yrs)   Mid value (x)   d = (x - A)/h   No. of Pts. (f)   f × d   Cumulative frequency
10 - 20     15              -3              5                 -15     5
20 - 30     25              -2              19                -38     24
30 - 40     35              -1              26                -26     50
40 - 50     45              0               35                0       85
50 - 60     55              1               15                15      100
60 - 70     65              2               3                 6       103
Total                                       103               -58
Arithmetic Mean = 45 + (-58/103) × 10 = 39.4
Computation of Mean, Median, and Mode for Grouped Data
$\text{Median} = L + \frac{N/2 - C.F.}{F} \times h$
where L is the lower limit of the median class, N is the total frequency, C.F. is the cumulative frequency of the class preceding the median class, F is the frequency of the median class, and h is the class width.
N/2 = 103/2 = 51.5. This value lies in the class interval 40 - 50 (seen from the cumulative frequency column). Hence L = 40, C.F. = 50, F = 35 and h = 10.
Median = 40 + (51.5 - 50)/35 × 10 = 40.43
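A hedged sketch of the grouped-median formula applied to the age table follows. The function name `grouped_median` and the tuple layout are assumptions made for illustration; the formula is the one stated on the slide.

```python
# (lower limit, upper limit, frequency) for each age class from the table.
classes = [(10, 20, 5), (20, 30, 19), (30, 40, 26),
           (40, 50, 35), (50, 60, 15), (60, 70, 3)]

def grouped_median(classes):
    """Median = L + (N/2 - C.F.) / F * h for the class containing N/2."""
    total = sum(f for _, _, f in classes)   # N
    half = total / 2                        # N/2
    cumulative = 0                          # C.F. of the classes before this one
    for lower, upper, freq in classes:
        if cumulative + freq >= half:       # this is the median class
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no classes given")

median_age = grouped_median(classes)
print(round(median_age, 2))   # 40.43
```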
Comparing the Mean, Median, and Mode
[Figure: two distribution curves with the positions of the mean, median, and mode marked]
Summary of Central Tendency Measures
Measure   Equation                       Description
Mean      $\sum x / n$                   Balance point
Median    $\frac{n+1}{2}$th item in array   Middle value in ordered array
Mode      none                           Most frequent
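Standard Deviation (Grouped data)
$S.D. = h \times \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$
Where f is the frequency and d is the deviation, computed as
$d_i = \frac{x_i - A}{h}$
with A the assumed mean and h the class width.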
SD for Grouped Data
The following data provides the chest measurement (in cms.) of 50 MBBS students. Find the mean and SD.
Chest measurement (cms)   No. of students
61 - 70                   2
71 - 80                   10
81 - 90                   20
91 - 100                  17
101 - 110                 1
S.D for Grouped Data
C.I.        Mid value (x)   Fr. (f)   d = (x - A)/h   f × d   f × d^2
61 - 70     65.5            2         -2              -4      8
71 - 80     75.5            10        -1              -10     10
81 - 90     85.5            20        0               0       0
91 - 100    95.5            17        1               17      17
101 - 110   105.5           1         2               2       4
Total                       50                        5       39
Here A = 85.5 and h = 10.
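S.D for Grouped Data
$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 85.5 + \frac{5}{50} \times 10 = 86.5$
$\sigma = h \times \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}\right)^2} = 10 \times \sqrt{\frac{39}{50} - \left(\frac{5}{50}\right)^2} = 10 \times \sqrt{0.77} \approx 8.77$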
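The grouped mean and standard deviation for the chest-measurement data can be recomputed with a small Python sketch. It assumes the step-deviation formulas given for grouped data, with A = 85.5 as the assumed mean and h = 10 as the class width; the variable names are illustrative.

```python
from math import sqrt

# (mid value x, frequency f) for the chest-measurement classes.
table = [(65.5, 2), (75.5, 10), (85.5, 20), (95.5, 17), (105.5, 1)]

A = 85.5   # assumed mean: mid value of the 81 - 90 class
h = 10     # class width

sum_f = sum(f for _, f in table)
sum_fd = sum(f * (x - A) / h for x, f in table)          # sum of f * d
sum_fd2 = sum(f * ((x - A) / h) ** 2 for x, f in table)  # sum of f * d^2

grouped_mean = A + h * sum_fd / sum_f
grouped_sd = h * sqrt(sum_fd2 / sum_f - (sum_fd / sum_f) ** 2)

print(sum_f, sum_fd, sum_fd2)    # 50 5.0 39.0
print(grouped_mean)              # 86.5
print(round(grouped_sd, 2))      # 8.77
```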
Uses of Standard Deviation
Aside from being a measure of dispersion, the standard deviation:
  Determines where values of a frequency distribution are in relation to the mean (standard scores)
  Measures the percentage of items within specific ranges
    Chebyshev's Theorem
    Normal distribution
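As an illustration of both uses, the sketch below computes standard scores for the expenditure sample and compares the share of values within 2 standard deviations of the mean against the Chebyshev lower bound of 1 - 1/k^2. It assumes the fifteen values are treated as the whole data set (so the population standard deviation is used); the names and the choice k = 2 are illustrative.

```python
from statistics import mean, pstdev

expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

m = mean(expenditure)
sd = pstdev(expenditure)   # assumption: the data set is treated as a population

# Standard score: how many standard deviations a value lies from the mean.
z_scores = [(x - m) / sd for x in expenditure]

k = 2
share_within = sum(1 for z in z_scores if abs(z) < k) / len(z_scores)
chebyshev_bound = 1 - 1 / k ** 2   # at least 75% must lie within 2 SDs

print(round(max(z_scores), 2))     # standard score of the largest value
print(share_within >= chebyshev_bound)
```

Only the 110000 value falls outside 2 standard deviations here, so 14 of the 15 values are within range, comfortably above the bound.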
Coefficient of Variation
1. Measure of relative dispersion
2. Always a %
3. Shows variation relative to mean
4. Used to compare 2 or more groups
Sample: $CV = \frac{s}{\bar{x}} (100)$
Population: $CV = \frac{\sigma}{\mu} (100)$
Coefficient of Variation
Example: Which technician shows more variability?
Technician A: $\mu_a = 40$, $\sigma_a = 5$
Technician B: $\mu_b = 160$, $\sigma_b = 15$
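Solution
$CV = \frac{\sigma}{\mu} (100)$
Technician A: $\frac{5}{40} (100) = 12.5\%$
Technician B: $\frac{15}{160} (100) = 9.4\%$
Technician A shows more relative variability, even though Technician B has the larger standard deviation.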
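The comparison can be reproduced with a few lines of Python. The helper name `coefficient_of_variation` is hypothetical; it implements the population form of the formula, standard deviation divided by mean, times 100.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Relative dispersion as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean_value * 100

cv_a = coefficient_of_variation(5, 40)     # Technician A
cv_b = coefficient_of_variation(15, 160)   # Technician B

print(cv_a)             # 12.5
print(round(cv_b, 1))   # 9.4
print("A" if cv_a > cv_b else "B", "shows more relative variability")
```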
Summary of Variation Measures
Measure                           Equation                                      Description
Range                             $x_{largest} - x_{smallest}$                  Total spread
Interquartile Range               $Q_3 - Q_1$                                   Spread of middle 50%
Standard Deviation (Sample)       $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$   Dispersion about sample mean
Standard Deviation (Population)   $\sqrt{\frac{\sum (x - \mu)^2}{N}}$           Dispersion about population mean
Variance (Sample)                 $\frac{\sum (x - \bar{x})^2}{n - 1}$          Squared dispersion about sample mean
Coeff. of Variation               $\frac{s}{\bar{x}} (100)$                     Relative variation