Post on 04-Apr-2018
7/29/2019 Module 1 Statistical Inference
Statistical Inference
Dr. Basheer Ahmad Samim
Course Outline
1. Review of Descriptive Statistics and SPSS
2. Random Variable and Mathematical Expectation
3. Discrete Probability Distributions (Binomial, Poisson)
4. Continuous Probability Distribution (Normal)
5. Sampling Theory
6. Confidence Intervals
7. Hypothesis Testing
8. Goodness of Fit
9. Regression and Correlation with ANOVA
10. Multiple Regression
All the topics will be SPSS oriented.
Recommended Readings (Books)
Introduction to Statistics, Walpole, R. E., 3rd Edition (2000)
Statistical Methods for Practice and Research by Ajai S. Gaur and Sanjaya S. Gaur
Attendance Policy
16 weeks of teaching, 16 lectures (32 attendances)
Roll call twice: once before the break and once after the break
At least 80% (24) attendance is compulsory to be eligible for the Final Examination
No roll call after the first ten (5) minutes
Mode of Teaching
Lecture
SPSS Workshop
Discussion Session
Mode of Assessment
Quizzes (15%)
Assignments (15%)
Class Performance (5%)
Mid Term Test (25%)
Final Examination (40%)
Questionnaire
Variable
A characteristic or property that varies from individual to individual.
Constant
A characteristic or property that does not change from individual to individual.
Types of Variables
Variables are either Qualitative or Quantitative.
Quantitative variables are either Discrete or Continuous.
Nominal Scale
Variable categories are mutually exclusive and exhaustive.
Variable categories have no logical order.
Examples: eye color, hair color, gender.
Ordinal Scale
Data categories are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they possess.
Example: level of knowledge about SPSS.
Interval Scale
Data categories are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they possess.
Equal differences in the characteristic are represented by equal differences in the measurements, but there is no true zero point.
Examples: temperature, shoe size and IQ scores.
Ratio Scale
Data categories are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they possess.
Equal differences in the characteristic are represented by equal differences in the measurements.
The zero point represents the absence of the characteristic.
Examples: height, weight, distance.
Measurement Scales
Nominal: data may only be classified. Examples: eye color, hair color, gender.
Ordinal: data are ranked. Example: level of knowledge about SPSS.
Interval: true zero point does not exist. Examples: temperature, shoe size, IQ scores.
Ratio: meaningful zero point and ratio between values. Examples: height, weight, distance.
Data
The information collected for any kind of investigation. Usually numerical, but can be qualitative.
Primary Data
The initial material collected during the research process.
The information collected directly from the respondent.
Sources: personal investigation, through an investigator, through a questionnaire, through local sources, through telephone.
Secondary Data
The information collected and processed by people other than the researcher.
Sources: government organizations, semi-government organizations.
Data Collection
Any of the following methods may be adopted:
(a) Personal interview
(b) Direct observation
(c) Mail interview (internet interview)
(d) Telephone interview
What are the pros and cons of each?
Data management
Office Editing,
Post Coding,
Data entry and Verification.
Data organization and Analysis
Preparing data for analysis,
Extracting descriptive measures from the data,
Using advanced statistical techniques to analyze the data and draw inferences therefrom.
Measures of Central Tendency
Arithmetic Mean
Quantiles (Median, Quartiles, Deciles, Percentiles)
Mode
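Arithmetic Mean
A value obtained by dividing the sum of all the observations by their number.
\[ \text{Arithmetic Mean} = \frac{\text{Sum of all the observations}}{\text{Number of the observations}} \]
If X_1, X_2, ..., X_n are n observations of a variable X, then
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \]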
Arithmetic Mean
The marks obtained by 8 students are:
67 72 68 70 65 68 75 63
\[ \bar{X} = \frac{67 + 72 + \cdots + 63}{8} = \frac{548}{8} = 68.5 \text{ marks} \]
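The marks example can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is chosen here for illustration and does not come from the slides.

```python
# Marks of the 8 students, as listed on the slide.
marks = [67, 72, 68, 70, 65, 68, 75, 63]


def arithmetic_mean(values):
    """Sum of all the observations divided by their number."""
    return sum(values) / len(values)


mean_marks = arithmetic_mean(marks)
print(mean_marks)  # 68.5
```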
Quantiles
For individual observations or a discrete frequency distribution, the ith quartile, jth decile and kth percentile are located in the array or discrete frequency distribution by the following relations:
\[ Q_i = \frac{i(n+1)}{4}\text{th observation in the distribution}, \quad i = 1, 2, 3 \]
\[ D_j = \frac{j(n+1)}{10}\text{th observation in the distribution}, \quad j = 1, 2, \ldots, 9 \]
\[ P_k = \frac{k(n+1)}{100}\text{th observation in the distribution}, \quad k = 1, 2, \ldots, 99 \]
Quartiles
The weekly TV watching times (hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21
The array of the above data is given below:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Quartiles
\[ Q_1 = \frac{1(20+1)}{4}\text{th observation in the distribution} \]
= 5.25th observation in the distribution
= 5th obs. + 0.25 {6th obs. - 5th obs.}
= 21 + 0.25 {25 - 21} = 22.0 hours
Quartiles
\[ Q_2 = \frac{2(20+1)}{4}\text{th observation in the distribution} \]
= 10.50th observation in the distribution
= 10th obs. + 0.50 {11th obs. - 10th obs.}
= 30 + 0.50 {31 - 30} = 30.5 hours
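The location-then-interpolate procedure used for Q1 and Q2 can be sketched in Python as follows. The helper names (`value_at`, `quartile`, `decile`, `percentile`) are illustrative, and clamping positions that fall before the first or after the last observation is an assumption of this sketch, not something the slides specify.

```python
def value_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, position in an ordered array.

    Interpolates linearly between neighbours, e.g.
    5.25th obs. = 5th obs. + 0.25 * (6th obs. - 5th obs.).
    """
    n = len(sorted_values)
    # Assumption: positions outside 1..n are clamped to the end values.
    if position <= 1:
        return sorted_values[0]
    if position >= n:
        return sorted_values[-1]
    whole = int(position)        # integer part, e.g. 5
    frac = position - whole      # fractional part, e.g. 0.25
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + frac * (upper - lower)


def quartile(sorted_values, i):
    return value_at(sorted_values, i * (len(sorted_values) + 1) / 4)


def decile(sorted_values, j):
    return value_at(sorted_values, j * (len(sorted_values) + 1) / 10)


def percentile(sorted_values, k):
    return value_at(sorted_values, k * (len(sorted_values) + 1) / 100)


# Ordered array of weekly TV watching times from the slides.
hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

q1 = quartile(hours, 1)
q2 = quartile(hours, 2)
print(q1, q2)  # 22.0 30.5
```

Statistical packages often use other interpolation rules, so their quartiles may differ slightly from the (n+1) rule used here.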
Quantiles
Mode
The mode is the value which occurs most frequently in a set of data; that is, the value that occurs the maximum number of times in a sequence of observations.
Mode
The total automobile sales (in millions) in the United States for the last 14 years:
9.0 8.2 8.0 9.1 10.3 11.0 11.5
10.3 10.5 9.8 9.3 8.2 8.2 8.5
Mode = 8.2 million
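A frequency count confirms the mode of the sales figures. The sketch below returns a list because a data set may have more than one mode; the function name `modes` is illustrative.

```python
from collections import Counter

# Automobile sales (in millions) from the slide.
sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]


def modes(values):
    """All values that occur the maximum number of times."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(sales))  # [8.2]
```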
Measures of variation measure the variation present among the values of a data set, so measures of variation are measures of the spread of values in the data.
Absolute Measures of Dispersion
Range
Quartile Deviation
Mean (Average) Deviation
Variance and Standard Deviation
Relative Measures of Dispersion
Coefficient of Range
Coefficient of Quartile Deviation
Coefficient of Mean Deviation
Coefficient of Variation (CV)
Range
Difference between the largest and the smallest observations.
\[ \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \]
Disadvantages of the Range
Ignores the way in which data are distributed:
[Figure: two dot plots on the same axis from 7 to 12 with differently distributed points; in both, Range = 12 - 7 = 5]
Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Inter-quartile Range (IQR)
Inter-quartile range = 3rd quartile - 1st quartile = Q3 - Q1
IQR is independent of outliers
Inter-quartile Range
[Figure: box plot with X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70; each of the four segments holds 25% of the data]
Inter-quartile Range (IQR) = 57 - 30 = 27
The Mean (Absolute) Deviation
Mean deviation is the average of the absolute deviations taken from the mean value.
X    X - Mean    |X - Mean|
8    3    3
5    0    0
2    -3    3
Total    0    6
\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{6}{3} = 2 \]
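The same calculation in Python, as a minimal sketch (the function name `mean_deviation` is illustrative):

```python
def mean_deviation(values):
    """Average of the absolute deviations taken from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


md = mean_deviation([8, 5, 2])
print(md)  # 2.0
```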
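Variance
Variance is the average of the squared deviations taken from the mean value.
X (cm)    (X - Mean)^2    X^2
4    36    16
6    16    36
9    1    81
12    4    144
13    9    169
16    36    256
Total: 60    102    702
\[ \text{(i)} \quad S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{102}{6} = 17 \text{ cm}^2 \]
\[ \text{(ii)} \quad S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{702}{6} - (10)^2 = 17 \text{ cm}^2 \]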
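Both variance formulas can be verified numerically. The sketch below uses the divisor n, as the slide does; the variable names are illustrative.

```python
import math

x_cm = [4, 6, 9, 12, 13, 16]   # observations in cm, from the slide
n = len(x_cm)
mean = sum(x_cm) / n           # 60 / 6 = 10.0

# (i) Definition: average squared deviation from the mean.
var_definition = sum((x - mean) ** 2 for x in x_cm) / n

# (ii) Shortcut: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in x_cm) / n - mean ** 2

std_dev = math.sqrt(var_definition)
print(var_definition, var_shortcut)  # 17.0 17.0
```

Python's `statistics.pvariance` uses the same divisor n, while `statistics.variance` divides by n - 1 and would give a larger value.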
Comparing Standard Deviations
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
The smaller the standard deviation, the more tightly clustered the scores are around the mean.
The larger the standard deviation, the more spread out the scores are from the mean.
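Relative Measures of Variation
\[ \text{Coefficient of Range} = \frac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}} \]
\[ \text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
\[ \text{Coefficient of Mean Deviation} = \frac{MD}{\text{Mean}} \]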
Coefficient of Variation (CV)
Can be used to compare two or more sets of data measured in different units, or in the same units but with different average sizes.
\[ CV = \frac{S}{\bar{X}} \times 100\% \]
Use of Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
\[ CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\% \]
Stock B:
Average price last year = $100
Standard deviation = $5
\[ CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\% \]
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
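The stock comparison reduces to one small function. This is a sketch; the name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(cv_a, cv_b)  # 10.0 5.0
```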
Appropriate Choice of Measure of Variability
If data are symmetric, with no serious outliers, use range and standard deviation.
If data are skewed, and/or have serious outliers, use IQR.
If comparing variation across two data sets, use coefficient of variation (C.V.).
Five Number Summary
The five number summary of a data set consists of the minimum value, the first quartile, the second quartile, the third quartile and the maximum value, written in that order: Min, Q1, Q2, Q3, Max.
From the three quartiles we can obtain a measure of central tendency (the median, Q2) and measures of variation of the two middle quarters of the distribution: Q2 - Q1 for the second quarter and Q3 - Q2 for the third quarter.
Five Number Summary
The weekly TV viewing times (in hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21
The array of the above data is given below:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Five Number Summary
Minimum value = 5.0, Maximum value = 66.0
Location of Q1: \( \frac{1(20+1)}{4} \)th obs. in the data = 5.25th obs.
Value of Q1: 5th obs. + 0.25 {6th obs. - 5th obs.} = 21 + 0.25 {25 - 21} = 22.0 hrs
Location of Q2: \( \frac{2(20+1)}{4} \)th obs. in the data = 10.50th obs.
Value of Q2: 10th obs. + 0.50 {11th obs. - 10th obs.} = 30 + 0.50 {31 - 30} = 30.5 hrs
Location of Q3: \( \frac{3(20+1)}{4} \)th obs. in the data = 15.75th obs.
Value of Q3: 15th obs. + 0.75 {16th obs. - 15th obs.} = 35 + 0.75 {37 - 35} = 36.5 hrs
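The five values can be reproduced with a short sketch that applies the i(n+1)/4 location rule to the ordered array. The function names are illustrative, and the code assumes at least three observations so that every quartile location falls inside the array.

```python
def value_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    whole = int(position)
    frac = position - whole
    lower = sorted_values[whole - 1]
    if frac == 0:
        return lower
    return lower + frac * (sorted_values[whole] - lower)


def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max); assumes len(values) >= 3."""
    data = sorted(values)
    n = len(data)
    q1, q2, q3 = (value_at(data, i * (n + 1) / 4) for i in (1, 2, 3))
    return data[0], q1, q2, q3, data[-1]


# Ordered array of weekly TV viewing times from the slides.
hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

summary = five_number_summary(hours)
print(summary)  # (5, 22.0, 30.5, 36.5, 66)
```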
Box and Whisker Diagram
A box and whisker diagram, or box-plot, is a graphical means of displaying the five number summary of a set of data. In a box-plot the first quartile is placed at the lower hinge and the third quartile is placed at the upper hinge. The median is placed in between these two hinges. The two lines emanating from the box are called whiskers. The box and whisker diagram was introduced by Professor John W. Tukey.
Construction of Box-Plot
1. Start the box at Q1 and end it at Q3.
2. Within the box, draw a line to represent Q2.
3. Draw the lower whisker from the Min. value up to Q1.
4. Draw the upper whisker from Q3 up to the Max. value.
[Figure: box plot labeled Min. value, Q1, Q2, Q3 and Max. value]
Construction of Box-Plot
1. Q1 = 22.0, Q3 = 36.5
2. Q2 = 30.5
3. Minimum value = 5.0
4. Maximum value = 66.0
[Figure: box plot of these values on a vertical axis from 0 to 70]
Interpretation of Box-Plot
[Figure: box plot on a vertical axis from 0 to 70]
A box-whisker plot is useful to identify:
Maximum and minimum values in the data
Median of the data
IQR = Q3 - Q1; a long box indicates more variability in the data
Shape of the data, from the position of the line within the box:
Line at the center of the box: symmetrical
Line above the center of the box: negatively skewed
Line below the center of the box: positively skewed
Detection of outliers in the data
Outliers
An outlier is a value that falls well outside the overall pattern of the data. It might be
the result of a measurement or recording error,
a member from a different population,
simply an unusual extreme value.
An extreme value need not be an outlier; it might, instead, be an indication of skewness.
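Inner and Outer Fences
If Q1 = 22.0, Q2 = 30.5 and Q3 = 36.5, then IQR = Q3 - Q1 = 14.5.
Inner fences:
\[ \text{Lower Inner Fence} = Q_1 - 1.5\,IQR = 0.25 \]
\[ \text{Upper Inner Fence} = Q_3 + 1.5\,IQR = 58.25 \]
Outer fences:
\[ \text{Lower Outer Fence} = Q_1 - 3\,IQR = -21.5 \]
\[ \text{Upper Outer Fence} = Q_3 + 3\,IQR = 80.0 \]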
Identification of the Outliers
1. The values that lie within the inner fences are normal values.
2. The values that lie outside the inner fences but inside the outer fences are possible (suspected or mild) outliers.
3. The values that lie outside the outer fences are sure outliers.
Plot each suspected outlier with an asterisk and each sure outlier with a hollow dot.
[Figure: box plot on a vertical axis from 0 to 80, with the value 66 marked by an asterisk]
Only 66 is a mild outlier.
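The fence rule can be sketched in Python as below. The function names and the label strings are illustrative; the quartiles are the ones computed for the TV data (Q1 = 22.0, Q3 = 36.5).

```python
def fences(q1, q3):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer


def classify(value, inner, outer):
    """Label a value as normal, mild outlier or sure outlier."""
    if inner[0] <= value <= inner[1]:
        return "normal"
    if outer[0] <= value <= outer[1]:
        return "mild outlier"
    return "sure outlier"


# Ordered array of weekly TV viewing times from the slides.
hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

inner, outer = fences(22.0, 36.5)
labels = {x: classify(x, inner, outer) for x in hours}
flagged = {x: label for x, label in labels.items() if label != "normal"}
print(inner, outer)  # (0.25, 58.25) (-21.5, 80.0)
print(flagged)       # {66: 'mild outlier'}
```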
Uses of Box and Whisker Diagram
Box plots are especially suitable for comparing two or more data sets. In such a situation the box plots are constructed on the same scale.
[Figure: side-by-side box plots for Male and Female]
Standardized Variable
A variable that has mean 0 and variance 1 is called a standardized variable.
Values of a standardized variable are called standard scores.
Standard scores are unit-less.
Construction:
\[ Z = \frac{\text{Variable} - \text{Mean of Variable}}{\text{Standard Deviation of Variable}} \]
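Standardized Variable
X    (X - Mean)^2    Z    (Z - Mean)^2
3    25    -1.3624    1.8561
6    4    -0.5450    0.2970
11    9    0.8174    0.6682
12    16    1.0899    1.1879
Total: 32    54    0    4.009
\[ \bar{X} = \frac{\sum X}{n} = \frac{32}{4} = 8 \]
\[ S_x^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{54}{4} = 13.5, \qquad S_x = 3.67 \]
\[ Z = \frac{X - \bar{X}}{S_x} \]
\[ \bar{Z} = \frac{\sum Z}{n} = \frac{0}{4} = 0, \qquad S_z^2 = \frac{\sum (Z - \bar{Z})^2}{n} = \frac{4.009}{4} \approx 1 \]
Variable Z has mean 0 and variance 1, so Z is a standard variable.
Standard score at X = 11 is
\[ Z = \frac{X - \bar{X}}{S_x} = \frac{11 - 8}{3.67} = 0.8174 \]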
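The standardization table can be reproduced with the sketch below. It keeps full precision for the standard deviation, so the z values differ from the slide's figures in the third or fourth decimal place (the slide rounds S to 3.67 before dividing). Variable names are illustrative.

```python
import math

x = [3, 6, 11, 12]             # observations from the slide
n = len(x)

mean_x = sum(x) / n                                        # 32 / 4 = 8.0
sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)    # sqrt(13.5)

# Standard scores: subtract the mean, divide by the standard deviation.
z = [(v - mean_x) / sd_x for v in x]

mean_z = sum(z) / n
var_z = sum((v - mean_z) ** 2 for v in z) / n

print([round(v, 2) for v in z])
print(round(mean_z, 6), round(var_z, 6))
```

With the unrounded standard deviation the variance of Z comes out as 1 up to floating-point error, rather than the 4.009/4 obtained from rounded values.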
Performance evaluation by z-scores
The industry in which sales rep Mr. Atif works has mean annual sales = $2,500 and standard deviation = $500.
The industry in which sales rep Mr. Asad works has mean annual sales = $4,800 and standard deviation = $600.
Last year Mr. Atif's sales were $4,000 and Mr. Asad's sales were $6,000.
Which of the representatives would you hire if you have one sales position to fill?
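Performance evaluation by z-scores
Sales rep. Atif: industry mean \( \bar{X}_B \) = $2,500, \( S_B \) = $500, own sales \( X_B \) = $4,000
\[ Z_B = \frac{X_B - \bar{X}_B}{S_B} = \frac{4{,}000 - 2{,}500}{500} = 3 \]
Sales rep. Asad: industry mean \( \bar{X}_P \) = $4,800, \( S_P \) = $600, own sales \( X_P \) = $6,000
\[ Z_P = \frac{X_P - \bar{X}_P}{S_P} = \frac{6{,}000 - 4{,}800}{600} = 2 \]
Mr. Atif is the better choice: his sales are 3 standard deviations above his industry mean, against 2 for Mr. Asad.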
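The comparison can be written as a small sketch; the function name `z_score` is illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z_atif = z_score(4000, 2500, 500)
z_asad = z_score(6000, 4800, 600)
print(z_atif, z_asad)  # 3.0 2.0
```

Although Mr. Asad sold more in absolute terms, the z-scores put each figure on its own industry's scale, which is what makes the two reps comparable.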
The Empirical Rule
For a bell-shaped (approximately normal) distribution:
\( \bar{X} \pm 1S \) contains about 68% of values
\( \bar{X} \pm 2S \) contains about 95% of values
\( \bar{X} \pm 3S \) contains about 99.7% of values
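These percentages come from the normal distribution. As a rough check, the sketch below computes the exact shares for a standard normal curve using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} S of the mean: {share:.4f}")
```

The computed shares are approximately 0.683, 0.954 and 0.997, which the rule rounds to 68%, 95% and 99.7%.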
Measures of Skewness
A distribution in which the values equidistant from the centre have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.
For a symmetrical distribution:
1. Length of Right Tail = Length of Left Tail
2. Mean = Median = Mode
3. Sk = 0, where
a) Sk = (Mean - Mode)/SD
b) Sk = (Q3 - 2Q2 + Q1)/(Q3 - Q1)
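The two skewness coefficients listed as a) and b) can be sketched as plain functions. The function names are illustrative. The quartile-based example reuses the TV viewing quartiles from the earlier slides (Q1 = 22.0, Q2 = 30.5, Q3 = 36.5); the mean, mode and SD passed to the first function are hypothetical numbers chosen only to show the symmetric case.

```python
def skewness_mean_mode(mean, mode, std_dev):
    """a) Sk = (Mean - Mode) / SD."""
    return (mean - mode) / std_dev


def skewness_quartiles(q1, q2, q3):
    """b) Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Hypothetical symmetric case: mean equals mode, so Sk = 0.
sk_symmetric = skewness_mean_mode(mean=10, mode=10, std_dev=2)

# Quartiles of the TV viewing data from the earlier slides.
sk_tv = skewness_quartiles(22.0, 30.5, 36.5)

print(sk_symmetric)      # 0.0
print(round(sk_tv, 4))   # -0.1724
```

The quartile-based coefficient looks only at the middle half of the data, so the extreme value 66 in the TV data does not influence it.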
Measures of Skewness
A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.
1. Length of Right Tail > Length of Left Tail
2. Mean > Median > Mode
3. Sk > 0
Measures of Skewness
A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than the lower end. A negatively skewed frequency curve has a longer tail on the left side.
1. Length of Right Tail < Length of Left Tail
2. Mean < Median < Mode
3. Sk < 0
Measures of Kurtosis
Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution.
When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is leptokurtic.
When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is platykurtic.
A curve that is neither very peaked nor very flat-topped, and is taken as a basis for comparison, is called mesokurtic (normal).
Measures of Kurtosis
Measures of Kurtosis
\[ \text{Coefficient of Kurtosis} = \frac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2} \]
1. If Coefficient of Kurtosis > 3, the curve is leptokurtic.
2. If Coefficient of Kurtosis = 3, the curve is mesokurtic.
3. If Coefficient of Kurtosis < 3, the curve is platykurtic.
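The coefficient and the three-way classification can be sketched as follows. The function names and the tolerance `tol` around 3 are assumptions of this sketch (an exact value of 3 is unlikely with real data). The example reuses the six observations from the variance slide.

```python
def kurtosis_coefficient(values):
    """n * sum((X - mean)^4) / (sum((X - mean)^2))^2."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_fourth = sum((x - mean) ** 4 for x in values)
    return n * sum_fourth / sum_sq ** 2


def classify_kurtosis(coefficient, tol=1e-9):
    """Compare the coefficient with 3; tol is an illustrative tolerance."""
    if coefficient > 3 + tol:
        return "leptokurtic"
    if coefficient < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


x_cm = [4, 6, 9, 12, 13, 16]   # observations from the variance slide
k = kurtosis_coefficient(x_cm)
print(round(k, 3), classify_kurtosis(k))  # 1.699 platykurtic
```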