8/2/2019 Statistical Methods [Jadhav]
-> government (laid off 125000)
Stock market goes up when employment goes down-
->Private (gained 67000)
V54000
Look behind the numbers
They take a sample and use it to exemplify the population
Statistics
-entitlements
-services (highways, education etc)
-[discretionary ]-> defense $600 Bill
y= a+b*other exp
(Receipts + Deficits)= Govt. Exp.
Qualitative-
0 1 2 3 - 100
Discrete-
2.1
Continuous-
Quantitative-
Data
Example: beer bottles- every thousand are tested to make sure all have 12 oz
Take every third or large section-
Systematic sampling
?
Every member of the universe has equal probability to get included into the sample-
Ex.
5'3''
5'4''
5'7''
5'8''
5'9''
5'11''
6'0''
6'2''
Random sampling
Sample:
The average does not have to be a number in the sample
Class intervals: 5'3''-5'7'', 5'8''-5'11'', 6'0'' and up
The class boundaries must be mutually exclusive.
Interval        Frequency   Midpoint
5'3''-5'7''     3           5'5''
5'8''-5'11''    3           5'9''
6'0'' and up    2           6'1''
3 x 5'5'' = 16'3''
3 x 5'9'' = 17'3''
2 x 6'1'' = 12'2''
45'8'' / 8 = 5'8.5'' = average height in the class
Class
interval
Scatter plot
Presentation:
5'3'' 6'2''
Using raw data
Histogram
5'3'' 5'8'' 6' 5'5'' 5'9'' 6'1''
average
Frequency
Ogive
6"
More than
Less than
5'3''-5'7''
5'8''-5'11''
Skewed
Positive skew
0 x
Negative skew
Statistical Methods
Friday, September 03, 2010
11:00 AM
Presentation is best when there is a story that is simple and tells exactly what's going on
Slope:
x
y
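y = a - bx
For positive values the arithmetic mean is never lower than the geometric mean; the two are equal only when all values are equal.
Geometric mean of three values $= \sqrt[3]{x_1 \cdot x_2 \cdot x_3}$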
T f
2009 120
2008 110
i.
1) Time series (time on x axis, data on y axis)
n = 20, $\sum f X_m = 490$
Modal class: 20.55-25.5
Group with highest number of frequencies = mode
Class interval   Frequency (f)   Midpoint (Xm)   f*Xm
5.5-10.5         1               8               8
10.55-15.5       2               13              26
15.55-20.5       3               18              54
20.55-25.5       5               23              115
25.55-30.5       4               28              112
30.55-35.5       3               33              99
35.55-40.5       2               38              76
2)
f=mode, xm= midpoint
Two types of DATA:
States in USA-Alabama
Alaska
Annual equilibrium
Mean $= \sum f X_m / n$
= 490/20
= 24.5
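The grouped mean above can be checked with a short sketch. The midpoints and frequencies are copied from the class-interval table in these notes; the variable names are illustrative only.

```python
# Midpoints (Xm) and frequencies (f) copied from the class-interval table.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

# n is the total number of observations, i.e. the sum of the frequencies.
n = sum(frequencies)

# Sum of f * Xm over all classes.
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))

# Grouped mean = (sum of f * Xm) / n.
grouped_mean = sum_f_xm / n

print(n, sum_f_xm, grouped_mean)
```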
How far is each value from the mean
It's best to choose the distribution with the shorter range,
because the data is more uniform
Range
Dispersion
5.5    $\bar{x} = 24.5$    40.5
$|x - \bar{x}|$
$s^2 = \sum (x - \bar{x})^2 / (n - 1)$
$\sigma^2 = \sum (x - \mu)^2 / N$
Standard deviation $= s = \sqrt{s^2}$
Absolute value (every sign is positive)
-3.5 -2.5 -5 5 25 35
$(x - \bar{x}) / s$
68%
95%
99.7%
The wider the interval around the mean (1s, 2s, 3s), the more confident
we are that it contains the population mean
Variance $= s^2$
$= (\sum f X_m^2 - (\sum f X_m)^2 / n) / (n - 1)$
$= [13310 - (490)^2 / 20] / 19$
= 68.7
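A minimal sketch of the grouped variance calculation, assuming the same midpoints and frequencies as the class-interval table and the n - 1 divisor that the figure of 19 implies. Names are illustrative.

```python
import math

# Midpoints (Xm) and frequencies (f) from the class-interval table.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))
sum_f_xm2 = sum(f * xm ** 2 for f, xm in zip(frequencies, midpoints))

# Computational formula for the sample variance of grouped data.
variance = (sum_f_xm2 - sum_f_xm ** 2 / n) / (n - 1)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(sum_f_xm2, round(variance, 1), round(std_dev, 2))
```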
How many distributions lie beneath
What is the average (the mean)
How far are the values spread from it
Standard deviation
Ch1-4
The chance that something is going to happen
Probability
Experiment: Results in an outcome
Sample Space
Coin H/T
1 2 3
4 5 6
DIE 1 -> 6
All outcomes = Event
P(2) = 1/6
P(even) = P(odd) = 3/6 = 0.5
Venn Diagram: # UD students
100/ 3000 -> in sample
Conditional probability
Red Blue
Probability with
replacement
20 30
When a favorable outcome
P = (# of favorable outcomes) / (total # of possibilities)
= 4/52 (to get an ace)
P(E)= 0.08
P(not E) = 0.92 (the alternative)
P(E) + P(not E) = 1
0_
P(A or B)= P(A) + P(B)
P(A) + P(not A) = 1
n=50
Blood type   f
A            22
B            5
O            2
AB           21
Probability of A or B
P(A) = f/n = 22/50
P(not A) = 28/50
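The blood-type example can be worked as a sketch of the addition rule for mutually exclusive events. This assumes the frequency table reads A 22, B 5, O 2, AB 21 (which sums to n = 50); the dictionary name is illustrative.

```python
from fractions import Fraction

# Frequencies as read from the blood-type table (assumed reading, sums to 50).
blood_type_counts = {"A": 22, "B": 5, "O": 2, "AB": 21}
n = sum(blood_type_counts.values())

# Relative frequency f / n for each type.
p_a = Fraction(blood_type_counts["A"], n)
p_b = Fraction(blood_type_counts["B"], n)

# A person has exactly one blood type, so "type A" and "type B" are
# mutually exclusive and P(A or B) = P(A) + P(B).
p_a_or_b = p_a + p_b

# Complement rule: P(not A) = 1 - P(A).
p_not_a = 1 - p_a

print(p_a_or_b, p_not_a)
```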
Not mutually exclusive
A
B
P(A or B)= P(A) + P(B)- P(A and B)
A and B
Probability of selecting A doesn't affect the probability of selecting B
Independent (draw with replacement)
K      Q
4/52   4/52
$P(A \text{ and } B) = P(A) \cdot P(B)$
$P(K \text{ and } Q) = (4/52) \cdot (4/52)$
Not Independent (draw without replacement)
K      Q
4/52   4/51
$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$
$P(B \mid A) = P(A \text{ and } B) / P(A)$
ABC
ACB
BAC
BCA
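$_nP_r = n! / (n-r)!$
$_3P_3 = 3! / (3-3)! = (3 \cdot 2 \cdot 1) / 0! = 6 / 1 = 6$
Combinations
$_nC_r = n! / ((n-r)! \cdot r!)$
$_{30}C_3 = 30! / (27! \cdot 3!) = (30 \cdot 29 \cdot 28) / (3 \cdot 2 \cdot 1) = 4060$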
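The permutation and combination counts can be checked with Python's standard library (`math.perm` and `math.comb`, available from Python 3.8). This is only a sketch of the two worked examples in the notes.

```python
import math
from itertools import permutations

# All orderings of the three letters A, B, C (3P3).
orderings = ["".join(p) for p in permutations("ABC")]

# nPr = n! / (n - r)!
n_perm = math.perm(3, 3)

# nCr = n! / ((n - r)! * r!)
n_comb = math.comb(30, 3)

print(orderings)
print(n_perm, n_comb)
```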
Is it mutually exclusive
See if they are independent
Probability
Mean
Standard deviation
Standardize the distribution
Raw Data:
0
25
s-5
30
1s
20
-1s
10-20 5 15
20.5-30 10
Xm
12 13 15 17 20
Midpoint is the average of the values
f*Xm
$CV = s / \bar{x}$
2x2x2=8
[Tree diagram: three coin tosses, each toss branching into h and t]
Hhh
Hht
Hth
Htt
Thh
Tht
Tth
ttt
P(each outcome) = 1/8 or .125
Number of heads   Outcomes       P(x)
3                 hhh            1/8 = 0.125
2                 hht hth thh    3/8 = 0.375
1                 htt tht tth    3/8 = 0.375
0                 ttt            1/8 = 0.125
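$\mu = \sum (x \cdot P(x))$
x    P(x)    x*P(x)
3    .125    .375
2    .375    .75
1    .375    .375
0    .125    0
$\mu = 1.5$
$\sigma^2 = \sum (x - \mu)^2 \cdot P(x)$
(x-μ)   (x-μ)^2   (x-μ)^2*P(x)
1.5     2.25      0.28
0.5     0.25      .09
-0.5    0.25      .09
-1.5    2.25      0.28
$\sigma^2 = 0.74$ (0.75 without rounding each term)
$\sigma = 0.86$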
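As a rough check of the table above, the distribution of the number of heads in three tosses can be built by enumerating all eight outcomes. The sketch assumes a fair coin; variable names are illustrative. The unrounded variance comes out as 0.75, slightly above the 0.74 obtained from rounded terms.

```python
import math
from collections import Counter
from itertools import product

# Enumerate the 2 x 2 x 2 = 8 equally likely outcomes of three tosses.
outcomes = list(product("ht", repeat=3))

# Count how many outcomes give each number of heads.
heads_counts = Counter(outcome.count("h") for outcome in outcomes)

# P(x) = (number of outcomes with x heads) / (total outcomes).
distribution = {x: count / len(outcomes) for x, count in heads_counts.items()}

# Mean and variance of a discrete distribution.
mean = sum(x * p for x, p in distribution.items())
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
std_dev = math.sqrt(variance)

print(distribution, mean, variance, round(std_dev, 2))
```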
Binomial
1. Limited # of trials = n
2. Only 2 possibilities
3. P(success), P(failure)
p + q = 1
$P(x) = {}_nC_x \cdot p^x \cdot q^{n-x}$
$= {}_3C_2 \cdot 0.5^2 \cdot 0.5^1$, where ${}_3C_2 = 3! / (2! \cdot 1!) = 3$
= 3 * 0.25 * 0.5 = .375
Three tosses = three trials
$_nC_r = n! / ((n-r)! \cdot r!)$
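The binomial formula can be sketched as a small function. `binomial_pmf` is a hypothetical helper name, not something from the notes; it reproduces the two-heads-in-three-tosses example.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x successes in n trials), each trial succeeding with probability p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)


# Probability of exactly two heads in three tosses of a fair coin.
p_two_heads = binomial_pmf(2, 3, 0.5)

# The full distribution for n = 3, p = 0.5 must sum to 1.
pmf = [binomial_pmf(x, 3, 0.5) for x in range(4)]

print(p_two_heads, pmf)
```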
Binomial distribution
p. 760, Appendix B9
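Once you know n, p, q:
$\mu = n \cdot p$
$\sigma^2 = n \cdot p \cdot q$
$\mu = \sum x \cdot P(x)$
$\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$
$\sigma^2 = \sum x^2 \cdot P(x) - \mu^2$
Simple formula to find the variance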
# on balls (x)   0    1    2    3    4
P(x)             .2   .2   .2   .2   .2
$\mu = 0 + 0.2 + 0.4 + 0.6 + 0.8 = 2$
$\sigma^2 = (0 + 1 + 4 + 9 + 16)(0.2) - \mu^2 = 6 - 4 = 2$, $\sigma = \sqrt{2} = 1.4$
$\mu = \sum x / n$ = mean
2
=1.4
15
=2
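[Normal curve: axis marked $-3\sigma$, $-2\sigma$, $-1\sigma$, $+1\sigma$, $+2\sigma$, $+3\sigma$; areas from left to right .023, .136, .34, .34, .136, .023]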
Standard normal: $z = (x - \mu) / \sigma$
$= (2 - 2) / 1.4 = 0$
$z \cdot \sigma = x - \mu$
Area under the Normal Curve:
The closer we are to the mean the more accurate we are
If the sample size is 30 or more, then repeated samples give about the same result
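$z = (X_i - \mu) / \sigma$
$X_i = \mu + \sigma \cdot z$
[Normal curve from 0 to 100 with $\mu = 50$, $\sigma = 15$; area .3413 on each side of the mean out to z = 1]
= 50 - 15*1
= 35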
p.750
.04 column
Z column= 1
z=1.04
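[Normal curve from 0 to 100 with $\mu = 80$, $\sigma = 14$; $X_i$ marks the value that 80% of the area lies above, so .30 of the area lies between $X_i$ and the mean, giving z = -.84]
= 80 - 14*.84
= 80 - 11.76
= 68.24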
z =   -3   -2   -1    0    1    2    3
X =   38   52   66   80   94  108  122
$X = \mu + \sigma \cdot z$
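A short sketch of the conversion between z and X for the example with a mean of 80 and a standard deviation of 14, using `statistics.NormalDist` from the standard library (Python 3.8+). The last step finds the value that 80% of the distribution lies above; it differs slightly from the table-based 68.24 because the table rounds z to .84.

```python
from statistics import NormalDist

mu, sigma = 80, 14

# X = mu + sigma * z for whole-number z values.
z_values = [-3, -2, -1, 0, 1, 2, 3]
x_values = [mu + sigma * z for z in z_values]

# Value with 20% of the area below it, i.e. 80% above it.
dist = NormalDist(mu, sigma)
x_80_above = dist.inv_cdf(0.20)

print(x_values)
print(round(x_80_above, 2))
```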
Area under the graph
0 1.8
42%
$z = (x - \mu) / \sigma$
Find the area between z=0 and z=1.8
Find the area between z=-2.48 and z=-0.83
[Number line: -2.48, -.83, 0]
.4934 - .2967 = .1967
= 19.67%
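The table lookups can be cross-checked with the standard normal CDF. This sketch uses `statistics.NormalDist` (Python 3.8+) rather than the printed table, so the results agree with the table only to about four decimal places.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

# Area between z = 0 and z = 1.8.
area_0_to_1_8 = std_normal.cdf(1.8) - std_normal.cdf(0)

# Area between z = -2.48 and z = -0.83: the difference of the two CDF values,
# equivalent to subtracting the two table areas measured from the mean.
area_between = std_normal.cdf(-0.83) - std_normal.cdf(-2.48)

print(round(area_0_to_1_8, 4), round(area_between, 4))
```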
Why a sample rather than the entire population?
It's sometimes not possible to measure the entire population, so you take a representative sample and come to a conclusion about the population.
$\mu - \bar{x}$ = bias, sampling error
x3 x1 x2 x4
x
The means of the samples always end up
approximately normally distributed.
The result is better
the higher the number of samples
# samples
(1)pop
(2) Bias V
How do we find the area under this?!
Chebyshev: k > 1
Area under curve within k standard deviations of the mean is at least equal to:
$= 1 - (1/k^2)$
k = 2: 1 - (1/4) = 3/4 = 0.75 = 75%
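Chebyshev's lower bound is simple enough to sketch as a function. `chebyshev_lower_bound` is a hypothetical helper name used only for illustration.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


# k = 2 gives at least 75% of the values within two standard deviations.
bound_k2 = chebyshev_lower_bound(2)
bound_k3 = chebyshev_lower_bound(3)

print(bound_k2, round(bound_k3, 3))
```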
$z = (X_i - \mu) / (\sigma / \sqrt{n})$
Stratified sampling:
100 150 250 600
n=n1 + n2 + n3
$\bar{X}$ is estimator of $\mu$
1) Unbiased
2) Consistent - as n increases, bias decreases
3) Efficient - smallest "s"
X
V
.95
z=2
Confidence Level @95%
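$z = (x - \mu) / \sigma$
$z \cdot \sigma = x - \mu$
$z = (X_i - \mu) / (\sigma / \sqrt{n})$
$z \cdot (\sigma / \sqrt{n}) = X_i - \mu$
-2.6 +2.6
interval
$\alpha$ = 1 - conf. level
= 1 - .95 = 0.05
$\bar{X} - z_{\alpha/2} \cdot (\sigma / \sqrt{n})$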
Population normally distributed
not known
N30
Z
t
Df
11 11
13 14
50 50
x=10
-
As df increases, the t distribution approaches the Z
distribution, because you are
approaching n = 30
UD students drink beer
Copernicus heliocentric model
Could be non numerical-
Hypothesis Testing
Hypothesis: some statement about some population parameter
$\mu = 350$
$\sigma = 15$
Null Hypothesis:
$H_0: \mu = k$ (350)
Alternate Hypothesis:
$H_1: \mu \neq k$
Confidence level-
-
Level of significance
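If the test value falls beyond a critical value, then $H_0$ is rejected
Test Value = $\bar{X}$ -> z
[Figure: two-tailed test centered at 350 (z = 0); critical values z = -1.96 (marked 321) and z = +1.96 (marked 371); rejection zones of $\alpha/2$ = 0.025 in each tail with the do-not-reject region between them; the value 325 is also marked]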
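Left Tailed
$H_0: \mu \geq K$, $H_1: \mu < K$
n=35
$\bar{x}$=25,226
$\sigma$=3,251
$\alpha$=0.01
$H_0: \mu = 24672$, $H_1: \mu \neq 24672$
Is the number significantly different?
CV = 2.58
$TV = (\bar{x} - \mu) / (\sigma / \sqrt{n})$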
CV = -2.58, CV = +2.58
$TV = (25226 - 24672) / (3251 / \sqrt{35})$
= 1.01
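A minimal sketch of this two-tailed z test, using the figures from the notes (sample mean 25,226, hypothesized mean 24,672, standard deviation 3,251, n = 35, significance level 0.01). The critical value is computed with `statistics.NormalDist` instead of being read from the table; variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 25226, 24672, 3251, 35, 0.01

# Test value: distance of the sample mean from the hypothesized mean,
# measured in standard errors.
test_value = (x_bar - mu0) / (sigma / math.sqrt(n))

# Two-tailed critical value: alpha / 2 in each tail.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 only if the test value falls beyond either critical value.
reject_h0 = abs(test_value) > critical_value

print(round(test_value, 2), round(critical_value, 2), reject_h0)
```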
n
1.01
-2.262 +2.262
The average starting salary for a nurse is $24,000
$\mu$ = $24,000
n=10
$\bar{x}$=23,450
s=400
$\alpha$=0.05
$H_0: \mu = 24000$, $H_1: \mu \neq 24000$
CV = t = 2.262
df = (n-1) = 9
$TV = (23450 - 24000) / (400 / \sqrt{10}) = -4.35$
.5 - .025 = 0.475
Rejected!
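The nurse-salary example can be sketched the same way with a t test. The critical value 2.262 is taken from the notes (t table, 9 degrees of freedom, two-tailed at 0.05) rather than computed, since the standard library has no t distribution.

```python
import math

x_bar, mu0, s, n = 23450, 24000, 400, 10

# Critical t value as read from the t table in the notes (df = 9, alpha = 0.05).
critical_t = 2.262

# Degrees of freedom for a one-sample t test.
df = n - 1

# Test value uses the sample standard deviation s.
test_value = (x_bar - mu0) / (s / math.sqrt(n))

# Two-tailed decision rule.
reject_h0 = abs(test_value) > critical_t

print(df, round(test_value, 2), reject_h0)
```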
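n=30
$\bar{x}$=43,260
$\sigma$=5,230
$\alpha$ = 0.05
$H_0: \mu \leq 42{,}000$, $H_1: \mu > 42{,}000$
z = +1.65 = CV
$TV = (43260 - 42000) / (5230 / \sqrt{30}) = 1.32$
The average salary of an assistant professor > $42,000
[Figure: right-tailed test; TV = 1.32 lies between 0 and CV = 1.65, outside the rejection zone]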
Type II error: do not reject $H_0$ when $H_0$ is false
-One sided example
The average price of shoe80 H1: 0.10
n=28 , Ho: m23, =.05, df=27, CVt=1.703, TV=4.5
29 24 24 .05 28 1.701 1.88
27 25 25 .1 26 1.315 1.84
Reject Ho
Quiz answers
TV, 1.55
CV
2.33
P= P(TV> Pi)
When is Ho true?
z = 1.55 -> area = 0.4394
p = .5 - .4394 = .0606
$\alpha$ = 0.05
$\alpha$ > P | Ho is true - Reject
P > $\alpha$ | Ho is true - Do Not Reject
Qd=a-b*Price
Y=a+b*x
Infl=a+B*M1
Housing starts= a-b* Mortgage Rate
Y X
Y1 X1 Stationary(within 1 year timeframe)
Y2 X2
The Project:
Minimum
Pairs
Select any two series where the independent variable impacts the
dependent variable
Bureau of Labor Statistics
Cars sold 1 yr
15k 16k 17k 18k 19k 20k Price of car
DON'T DO A TIME SERIES
HupoData
Mortgage rates affect housing starts
Source:
Simple linear regression in excel
Find two variables: cause and effect- number of classes missed and grade achieved
Interest rate goes up borrowing goes down
For one specific year
One line saying what I'm trying to relate
Give source of data Appendix A
Testing how close our sample mean is to population mean
Compare two sample means: the sample means are independent of each other, populations are normally distributed
Your IQ before stat class and after
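$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$
$H_1: \mu_1 \neq \mu_2$ or $\mu_1 - \mu_2 \neq 0$
$z = (\text{observed value} - \text{expected value}) / \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$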
0
The average price of a hotel room in
Dallas = $88.42, n1=50, s1=$5.62
Denton = $80.61, n2=50, s2=$4.83
$\alpha$=0.05
[Figure: two-tailed test with critical values -1.96 and +1.96; test value 7.45 falls in the right rejection zone]
Test Value $= (88.42 - 80.61) / \sqrt{5.62^2 / 50 + 4.83^2 / 50} = 7.45$
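A sketch of the two-sample z statistic for the hotel-price comparison. It assumes the Denton standard deviation is 4.83, the value used in the test-value line, since that is the figure that reproduces 7.45. `two_sample_z` is a hypothetical helper name.

```python
import math


def two_sample_z(x1: float, s1: float, n1: int,
                 x2: float, s2: float, n2: int) -> float:
    """z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / standard_error


# Dallas versus Denton hotel room prices (s2 = 4.83 is an assumed reading).
hotel_z = two_sample_z(88.42, 5.62, 50, 80.61, 4.83, 50)

# Two-tailed test at alpha = 0.05 uses critical values of -1.96 and +1.96.
reject_h0 = abs(hotel_z) > 1.96

print(round(hotel_z, 2), reject_h0)
```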
# of sports for boys = 8.6, n1=50, s1=3.3
# of sports for girls = 7.9, n2=50, s2=3.3
$H_0: \mu_1 \leq \mu_2$
$\alpha$=0.10
$z = (8.6 - 7.9) / \sqrt{3.3^2 / 50 + 3.3^2 / 50}$
= 1.06 -> area = 0.3554
P = .5 - .3554 = .1446
In the p-value approach you always
contest the alternate
hypothesis
Find the P value
.3554
$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}$
1) $S_p^2 = [(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2] / (n_1 + n_2 - 2)$
2) $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{S_p^2 (1/n_1 + 1/n_2)}$
Df=1 Df=9
Df=49
Df=49
Find Right -> $\alpha/2$ = 0.1/2 = 0.05 -> 36.415
Left -> 0.95 ----> 13.848
$\alpha$ = 0.1, Conf = 0.9
n=25 df=24
STD DEV
$(n-1) s^2$
$z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$
X-Z/2*/n2H1:1
0 71
95% confidence interval
Standard deviation = 1.6 mg
$\alpha$ = 0.05
n=19
$(n-1) \cdot s^2$
$18 \cdot 1.6^2 = 46.08$
46.08 2 1
1 1
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
non-smokers n2=18 s2=10
1 CV
$\alpha$ = 0.1
$\alpha/2$ = 0.05
Test value-
$F = s_1^2 / s_2^2 = 36/10 = 3.6$
>24 (25)
V
17
2.19 3.6
Not Reject
2.19
Project: plug data into excel
then select the scatter
function
Higher divergence in church 1 vs church 2
Ho:1 1
Always look at the right tailed test
Null must be less/more than or equal to
Variation of joggers in US vs Africa
Whether = or
Level of significance/2
If hypothesis is correct use the right table
-Linear relationship
Correlation
Perfectly elastic claim
Completely inelastic
-1 0 1
r-sample -correlation
We want negative correlation or positive correlation:
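$r = \sum (x - \bar{x})(y - \bar{y}) / ((n-1) \cdot S_x \cdot S_y) = [n \sum xy - (\sum x)(\sum y)] / \sqrt{[n \sum x^2 - (\sum x)^2] \cdot [n \sum y^2 - (\sum y)^2]}$
-1 0 1
$H_0: \rho = 0$
$H_1: \rho \neq 0$
$tv = r \cdot \sqrt{n-2} / \sqrt{1 - r^2} = r \cdot \sqrt{(n-2) / (1 - r^2)}$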
Number of absences (X)   Grade (Y)   X*Y    X^2   Y^2
6                        82          492    36    6724
2                        86          172    4     7396
15                       43          645    225   1849
9                        74          666    81    5476
12                       58          696    144   3364
5                        90          450    25    8100
8                        78          624    64    6084
Sum: 57                  511         3745   579   38993
[Scatter plot: number of absences (X axis, 0 to 15) against grade (Y axis, 25 to 100)]
$r = (7 \cdot 3745 - 57 \cdot 511) / \sqrt{[7 \cdot 579 - (57)^2] \cdot [7 \cdot 38993 - (511)^2]}$
= -0.944
$\alpha$ = 0.1
CV = 2.015
$tv = -0.944 \cdot \sqrt{5 / (1 - 0.944^2)}$
= -6.40
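The correlation and its test value can be recomputed from the raw absences and grades. This sketch uses the computational formula for r and the t statistic with n - 2 degrees of freedom; carrying r at full precision gives a test value of about -6.41, so small differences from hand-rounded figures are expected.

```python
import math

# Raw data from the absences/grades table.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]
n = len(absences)

# Column totals needed by the computational formula.
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x ** 2 for x in absences)
sum_y2 = sum(y ** 2 for y in grades)

# Pearson correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Test value for H0: rho = 0, with n - 2 degrees of freedom.
test_value = r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-tailed decision at alpha = 0.1 with CV = 2.015 (df = 5), from the notes.
reject_h0 = abs(test_value) > 2.015

print(round(r, 3), round(test_value, 2), reject_h0)
```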