Statistical Methods [Jadhav]


-> government (laid off 125000)

Stock market goes up when employment goes down-

->Private (gained 67000)

V54000

Look behind the numbers

They take a sample and use it to exemplify the population

Statistics

-entitlements

-services (highways, education etc)

-[discretionary ]-> defense $600 Bill

y= a+b*other exp

(Receipts + Deficits)= Govt. Exp.

Data

-Qualitative

-Quantitative

-Discrete: 0 1 2 3 - 100

-Continuous: 2.1

Example: beer bottles- every thousandth bottle is tested to make sure all have 12 oz

Take every third or large section-

Systematic sampling

?

Every member of the universe has an equal probability of being included in the sample-

Ex.

5'3''

5'4''

5'7''

5'8''

5'9''

5'11''

6'0''

6'2''

Random sampling

Sample:

The average does not have to be a number in the sample

Class interval    Frequency    Midpoint
5'3''-5'7''       3            5'5''
5'8''-5'11''      3            5'9''
6'0'' and up      2            6'1''

The class boundaries must be mutually exclusive.

3 x 5'5'' = 16'3''

3 x 5'9'' = 17'3''

2 x 6'1'' = 12'2''

45'8''/8 = 5'8.5'' = average height in the class

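Presentation:

Scatter plot: using raw data (5'3'' to 6'2'')

Histogram: frequency against the class intervals (5'3'', 5'8'', 6'), with midpoints 5'5'', 5'9'', 6'1'' and the average marked

Ogive: cumulative frequency ("less than" or "more than") for the classes 5'3''-5'7'' and 5'8''-5'11''

Skewed: positive skew, negative skew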

Statistical Methods: Friday, September 03, 2010, 11:00 AM



Presentation is best when there is a story that is simple and tells exactly what's going on

Slope:

x

y

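y = a - bx

The arithmetic mean is never lower than the geometric mean (for positive values); the two are equal only when all the values are the same.

Geometric mean $= \sqrt[3]{x_1 \cdot x_2 \cdot x_3}$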

Two types of DATA:

1) Time series (time on x axis, data on y axis)

T       f
2009    120
2008    110

2) States in USA- Alabama, Alaska

Annual equilibrium

Class interval    Frequency (f)    Xm    f*Xm
5.5-10.5          1                8     8
10.55-15.5        2                13    26
15.55-20.5        3                18    54
20.55-25.5        5                23    115
25.55-30.5        4                28    112
30.55-35.5        3                33    99
35.55-40.5        2                38    76

n=20, $\sum f X_m$ = 490

f = frequency, Xm = midpoint

Group with highest number of frequencies = mode (here 20.55-25.5, with f = 5)

Mean $= \frac{\sum f X_m}{n} = \frac{490}{20} = 24.5$
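The grouped mean above can be checked with a few lines of Python. This is a minimal sketch that only re-does the arithmetic of the frequency table; the variable names are illustrative and not from the notes.

```python
# Frequencies (f) and class midpoints (Xm) copied from the table above.
freqs = [1, 2, 3, 5, 4, 3, 2]
midpoints = [8, 13, 18, 23, 28, 33, 38]

n = sum(freqs)                                           # total number of observations
sum_fx = sum(f * xm for f, xm in zip(freqs, midpoints))  # sum of f * Xm
grouped_mean = sum_fx / n                                # mean of the grouped data

# The modal class is the one with the highest frequency.
modal_midpoint = midpoints[freqs.index(max(freqs))]

print(n, sum_fx, grouped_mean, modal_midpoint)
```

The same pattern works for any grouped table as long as the two lists are in the same class order.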

How far is each value from the mean?

It's best to choose the distribution with the shorter range, because the data is more uniform

Range

Dispersion

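Range: 5.5 to 40.5, with $\bar{x}$ = 24.5

$|x - \bar{x}|$: absolute value (every sign is positive)

STD Deviation = s

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (sample)

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ (population)

$z = \frac{x - \bar{x}}{s}$

-3s -2s -1s +1s +2s +3s

68% of the values lie within 1s of the mean, 95% within 2s, 99.7% within 3s

We are more confident that the interval contains the population mean the further it extends

Variance $= s^2$

$s^2 = \frac{\sum f X_m^2 - \frac{(\sum f X_m)^2}{n}}{n - 1}$

$= \frac{13310 - \frac{490^2}{20}}{19}$

= 68.7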
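As a check on the grouped variance, the sketch below applies the shortcut formula (sum of f*Xm squared terms, minus the squared total over n, divided by n - 1) to the frequency table. The data are from the notes; the variable names are illustrative.

```python
import math

# Frequencies (f) and class midpoints (Xm) from the frequency table.
freqs = [1, 2, 3, 5, 4, 3, 2]
midpoints = [8, 13, 18, 23, 28, 33, 38]

n = sum(freqs)
sum_fx = sum(f * xm for f, xm in zip(freqs, midpoints))        # sum of f * Xm
sum_fx2 = sum(f * xm ** 2 for f, xm in zip(freqs, midpoints))  # sum of f * Xm^2

# Sample variance for grouped data, dividing by n - 1.
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(sum_fx2, round(variance, 1), round(std_dev, 2))
```

Dividing by n - 1 rather than n follows the worked line that ends in 19.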

How many distributions lie beneath

What is the average (the mean)

How far are the values spread from it

Standard deviation

Ch1-4

The chance that something is going to happen

Probability

Experiment: Results in an outcome




Sample Space

Coin H/T

1 2 3

4 5 6

DIE 1 -> 6

All outcomes = Event

2 1/6

Even= Odd = 3/6 = 0.5

Venn Diagram: # UD students

100/ 3000 -> in sample

Conditional probability

Probability with replacement

Red: 20, Blue: 30

$P = \frac{\text{\# of favorable outcomes}}{\text{total \# of possibilities}}$

= 4/52 (to get an ace)

P(E) = 0.08

P(not E) = 0.92 (the alternative); P(E) + P(not E) = 1

$0 \leq P(E) \leq 1$



Mutually exclusive: P(A or B) = P(A) + P(B)

P(A) + P(not A) = 1

n=50

Blood type    f
A             22
B             5
O             2
AB            21

P(A) = f/n = 22/50

P(not A) = 28/50

Probability of A or B = 22/50 + 5/50 = 27/50

Not mutually exclusive

A

B

P(A or B)= P(A) + P(B)- P(A and B)

A and B

Independent: the probability of selecting A doesn't affect the probability of selecting B

K       Q
4/52    4/52

P(A and B) = P(A) * P(B)

P(K and Q) = (4/52) * (4/52)

Not independent:

K       Q
4/52    4/51

P(A and B) = P(A) * P(B | A)

$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$
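The king-then-queen example can be worked with exact fractions. This is a small sketch, assuming the 4/52 and 4/52 case means the first card is put back and the 4/52 and 4/51 case means it is not; the variable names are illustrative.

```python
from fractions import Fraction

p_king = Fraction(4, 52)

# Independent draws (first card replaced): P(Q) is unchanged.
p_queen_replaced = Fraction(4, 52)
p_independent = p_king * p_queen_replaced        # P(A) * P(B)

# Dependent draws (first card kept out): P(Q | K) uses the 51 remaining cards.
p_queen_given_king = Fraction(4, 51)
p_dependent = p_king * p_queen_given_king        # P(A) * P(B | A)

# Rearranging the multiplication rule recovers the conditional probability.
recovered = p_dependent / p_king

print(p_independent, p_dependent, recovered)
```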

ABC

ACB

BAC

BCA

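Permutations

$_nP_r = \frac{n!}{(n-r)!}$

$_3P_3 = \frac{3!}{(3-3)!} = \frac{3 \times 2 \times 1}{0!} = \frac{6}{1} = 6$

Combinations

$_nC_r = \frac{n!}{(n-r)! \cdot r!}$

$_{30}C_3 = \frac{30!}{27! \cdot 3!} = \frac{30 \times 29 \times 28}{3 \times 2 \times 1} = 4060$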
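Both counting formulas are available in Python's standard library, which gives a quick check of the two worked examples (3 letters taken 3 at a time, and 30 items taken 3 at a time). A minimal sketch, assuming Python 3.8 or later for `math.perm` and `math.comb`:

```python
import math
from itertools import permutations

# Orderings of A, B, C: nPr with n = 3, r = 3.
orderings = ["".join(p) for p in permutations("ABC")]
n_perms = math.perm(3, 3)

# Choosing 3 out of 30 when order does not matter: nCr with n = 30, r = 3.
n_combs = math.comb(30, 3)

print(orderings, n_perms, n_combs)
```

Listing the orderings explicitly shows all six arrangements, including the two (CAB, CBA) that the handwritten list stops short of.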

Is it mutually exclusive

See if they are independent

Probability

Mean

Standard deviation

Standardize the distribution: mean = 25 (z = 0), s = 5, so +1s = 30 and -1s = 20

Raw Data: 12 13 15 17 20

Class interval    f     Xm
10-20             5     15
20.5-30           10

Midpoint is the average of the values

f*Xm

$CV = \frac{s}{\bar{x}}$




Three tosses: 2x2x2=8 outcomes

HHH HHT HTH HTT THH THT TTH TTT

P(each outcome) = 1/8 or .125

Number of heads    Outcomes       P(x)
3                  HHH            1/8 = 0.125
2                  HHT HTH THH    3/8 = 0.375
1                  HTT THT TTH    3/8 = 0.375
0                  TTT            1/8 = 0.125

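$\mu = \sum x \cdot P(x)$

x    P(x)    x*P(x)
3    .125    .375
2    .375    .75
1    .375    .375
0    .125    0

$\mu$ = 1.5

$\sigma^2 = \sum (x - \mu)^2 \cdot P(x)$

$(x-\mu)$    $(x-\mu)^2$    $(x-\mu)^2 \cdot P(x)$
1.5      2.25     0.28
0.5      0.25     0.09
-0.5     0.25     0.09
-1.5     2.25     0.28

$\sigma^2$ = 0.75 (0.74 if the rounded products are added)

$\sigma = \sqrt{0.75}$ = 0.87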
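The three-toss distribution, its mean and its variance can be rebuilt by enumerating the eight outcomes. This is a sketch with exact fractions, so no rounding enters the variance; the names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 equally likely outcomes of three tosses.
outcomes = list(product("HT", repeat=3))

# P(x) for x = number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)
dist = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

mean = sum(x * p for x, p in dist.items())                     # sum of x * P(x)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())   # sum of (x - mean)^2 * P(x)
std_dev = math.sqrt(variance)

print(dist, mean, variance, round(std_dev, 2))
```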

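Binomial

1. Limited # of trials = n

2. Only 2 possibilities

3. P(success) = p, P(failure) = q, p + q = 1

$P(x) = {}_nC_x \cdot p^x \cdot q^{n-x}$

Three tosses = three trials

$P(2) = {}_3C_2 \cdot 0.5^2 \cdot 0.5^1 = \frac{3!}{2! \cdot 1!} \cdot 0.25 \cdot 0.5 = 3 \cdot 0.25 \cdot 0.5 = .375$

$_nC_r = \frac{n!}{(n-r)! \cdot r!}$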
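The binomial formula translates directly into a short function. A minimal sketch, where `binomial_pmf` is an illustrative name and the check is the two-heads-in-three-tosses case from the notes:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p                                  # probability of failure
    return comb(n, x) * p ** x * q ** (n - x)

# Two heads in three tosses of a fair coin.
p_two_heads = binomial_pmf(2, 3, 0.5)

# The probabilities over x = 0..n should add up to 1.
total = sum(binomial_pmf(x, 3, 0.5) for x in range(4))

print(p_two_heads, total)
```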

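Binomial distribution

p.760 appendix b9

Once you know n, p, q:

$\mu = n \cdot p$

$\sigma^2 = n \cdot p \cdot q$

In general:

$\mu = \sum x \cdot P(x)$

$\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$

$\sigma^2 = \sum x^2 \cdot P(x) - \mu^2$ (simple formula to find the variance)

# on Balls    0     1     2     3     4
P(x)          .2    .2    .2    .2    .2

$\mu$ = 0+0.2+0.4+0.6+0.8 = 2

$\sigma^2$ = (0+1+4+9+16)*0.2 - 2^2 = 6-4 = 2

$\sigma = \sqrt{2}$ = 1.4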

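$\mu = \frac{\sum x}{n}$ = mean = 2, $\sigma^2$ = 2, $\sigma$ = 1.4

Areas under the normal curve (marks at $-3\sigma$, $-2\sigma$, $-1\sigma$, $+1\sigma$, $+2\sigma$, $+3\sigma$): .34 between the mean and $1\sigma$ on each side, .136 between $1\sigma$ and $2\sigma$ on each side, .023 beyond $2\sigma$ on each side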




STD Normal: $z = \frac{x - \mu}{\sigma}$

$z = \frac{2 - 2}{1.4} = 0$

$z \cdot \sigma = x - \mu$

Area under the Normal Curve:

The closer we are to the mean the more accurate we are

If the sample size is 30 or more, repeated samples give about the same result

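$z = \frac{X_i - \mu}{\sigma}$

$X_i = \mu + \sigma \cdot z$

Example: $\mu$ = 50, $\sigma$ = 15; the area between the mean and 1 standard deviation is .3413 on each side

$X_i = 50 - 15 \cdot 1 = 35$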

p.750

.04 column

Z column= 1

z=1.04

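Example: $\mu$ = 80, $\sigma$ = 14; find the $X_i$ that 80% of the values lie above (area .30 between $X_i$ and the mean, so z = .84 below the mean)

$X_i = 80 - 14 \cdot .84 = 80 - 11.76 = 68.24$

z:    -3    -2    -1    0     1     2      3
X:    38    52    66    80    94    108    122

$X = \mu + \sigma \cdot z$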
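The conversion from z back to X can be sketched in Python for the example with mean 80 and standard deviation 14. The table-based value uses z = .84 as in the notes; the last line uses `statistics.NormalDist.inv_cdf` (standard library, Python 3.8+) as an assumed alternative to the printed table, so it differs slightly in the second decimal.

```python
from statistics import NormalDist

mu, sigma = 80, 14

# X = mu + sigma * z for z = -3 .. 3.
xs = [mu + sigma * z for z in range(-3, 4)]

# Value with area .30 between it and the mean, on the low side (z = -.84 from the table).
x_from_table = mu - sigma * 0.84

# Same cut-off without the table: 20% of the values lie below it.
x_exact = NormalDist(mu, sigma).inv_cdf(0.20)

print(xs, round(x_from_table, 2), round(x_exact, 2))
```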

Area under the graph




0 1.8

42%

$z = \frac{x - \mu}{\sigma}$

Find the area between z=0 and z=1.8

Find the area between z=-2.48 and z=-0.83:

Area between z=-2.48 and 0 = .4934

Area between z=-0.83 and 0 = .2967

.4934 - .2967 = .1967

= 19.67%
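Areas under the standard normal curve can also be computed directly instead of read from the table. A minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8+); `area_between` is an illustrative helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

area_left = area_between(-2.48, -0.83)   # both z values below the mean
area_right = area_between(0, 1.8)        # from the mean up to z = 1.8

print(round(area_left, 4), round(area_right, 4))
```

When both z values are on the same side of the mean, the table areas are subtracted, which is what the difference of the two cumulative values does here.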

It's not always possible to measure the entire population, so you take a representative sample and come to a conclusion about the population.

$\mu - \bar{x}$ = Bias, sampling error

x

Why a sample other than entire population?

x3 x1 x2 x4

x

The means of the samples always end up normally distributed around the population mean. The result is better the higher the number of samples

# samples

(1)pop

(2) Bias V

How do we find the area under this?!

Chebyshev: k>1

Area under curve is at least equal to:

$1 - \frac{1}{k^2}$

k=2: $1 - \frac{1}{4} = \frac{3}{4}$ = 0.75 = 75%
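Chebyshev's bound is simple enough to wrap in a function. This is a minimal sketch; `chebyshev_lower_bound` is an illustrative name, and the k > 1 restriction from the notes is enforced explicitly.

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

bound_k2 = chebyshev_lower_bound(2)   # at least 75% within 2 standard deviations
bound_k3 = chebyshev_lower_bound(3)   # at least about 88.9% within 3

print(bound_k2, round(bound_k3, 3))
```

Unlike the 68/95/99.7 figures, this bound holds for any distribution shape, which is why it is lower.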

$z = \frac{\bar{X}_i - \mu}{\sigma / \sqrt{n}}$




Stratified sampling:

100 150 250 600

n=n1 + n2 + n3

$\bar{X}$ is estimator of $\mu$

1) Unbiased

2) Consistent - as n increases, bias decreases

3) Efficient - smallest "s"

X

V

.95

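Confidence Level @95%: z=2

$z = \frac{x - \mu}{\sigma}$, so $z \cdot \sigma = x - \mu$

$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, so $z \cdot \frac{\sigma}{\sqrt{n}} = \bar{X} - \mu$

-2.6 +2.6

$\alpha$ = 1 - conf. level

= 1-.95 = 0.05

Confidence interval: $\bar{X} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$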



Population normally distributed, $\sigma$ not known, n < 30: use t instead of Z

Df

11 11

13 14

50 50

x=10

-

As df increases, the t distribution approaches the Z distribution, because you are approaching n = 30

Hypothesis Testing

Hypothesis: some statement about some population parameter

Could be non numerical-

UD students drink beer

Copernicus heliocentric model

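$\mu$ = 350

$\sigma$ = 15

Null Hypothesis: $H_0: \mu = k$ (350)

Alternate Hypothesis: $H_1: \mu \neq k$

Confidence level: $1 - \alpha$

Level of significance: $\alpha$

Test Value = $\bar{X}$ -> z

z = critical value: -1.96 (321) and +1.96 (371), with $\alpha/2$ = 0.025 in each tail

Rejection zone: beyond the critical values. If the test value falls there, $H_0$ is rejected.

Do not reject: between the critical values (z = 0 at 350)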

325

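Left Tailed: $H_0: \mu \geq k$, $H_1: \mu < k$

Is the number significantly different?

n=35

$\bar{x}$ = 25,226

$\sigma$ = 3,251

$\alpha$ = 0.01

$H_0: \mu = 24672$, $H_1: \mu \neq 24672$

CV = -2.58 and +2.58

$TV = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{25226 - 24672}{3251 / \sqrt{35}} = 1.01$

1.01 lies between -2.58 and +2.58, so do not reject $H_0$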
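The test value for this example can be reproduced in a few lines. This is a sketch only: `z_test_value` is an illustrative name, and the critical value 2.58 is taken from the notes rather than computed.

```python
import math

def z_test_value(sample_mean, hypothesized_mean, sigma, n):
    """Test value (z) for a sample mean when sigma is known."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

tv = z_test_value(25226, 24672, 3251, 35)
cv = 2.58                    # two-tailed critical value at alpha = 0.01, from the notes
reject_h0 = abs(tv) > cv     # two-tailed decision rule

print(round(tv, 2), reject_h0)
```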

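The average starting salary for a nurse is $24,000

$\mu$ = $24,000

n=10

$\bar{x}$ = 23,450

s=400

$\alpha$ = 0.05

$H_0: \mu = 24000$, $H_1: \mu \neq 24000$

Df = (n-1) = 9

CV = t = -2.262 and +2.262

$TV = \frac{23450 - 24000}{400 / \sqrt{10}} = -4.35$

.5 - .025 = 0.475

-4.35 is below -2.262: Rejected!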

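The average salary of an assistant professor > $42,000

n=30

$\bar{x}$ = 43,260

$\sigma$ = 5,230

$\alpha$ = 0.05

$H_0: \mu \leq 42{,}000$, $H_1: \mu > 42{,}000$

CV: z = +1.65

$TV = \frac{43260 - 42000}{5230 / \sqrt{30}} = 1.32$

1.32 is below the CV of 1.65, so it is outside the rejection zone: do not reject $H_0$

Type II error: do not reject $H_0$ when $H_0$ is false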

-One sided example

The average price of shoe80 H1: 0.10

n=28 , Ho: m23, =.05, df=27, CVt=1.703, TV=4.5

29 24 24 .05 28 1.701 1.88

27 25 25 .1 26 1.315 1.84

Reject Ho

Quiz answers




TV, 1.55

CV

2.33

P = P(z > TV), given that $H_0$ is true

z = 1.55 -> area = .4394

P = .5 - .4394 = .0606

$\alpha$ = 0.05

$\alpha$ > P: Reject $H_0$

P > $\alpha$: Do Not Reject $H_0$

Qd=a-b*Price

Y=a+b*x

Infl=a+B*M1

Housing starts= a-b* Mortgage Rate

Y X

Y1 X1 Stationary(within 1 year timeframe)

Y2 X2

The Project:

Minimum

Pairs

Select any two series where the independent variable impacts the

dependent variable

Bureau of Labor Statistics

Cars sold 1 yr

15k 16k 17k 18k 19k 20k Price of car

DONT DO A TIME SERIES

HupoData

Mortgage rates affect housing starts

Source:

[email protected]

Simple linear regression in excel

Find two variables: cause and effect- number of classes missed and grade achieved

Interest rate goes up borrowing goes down

For one specific year

One line saying what I'm trying to relate

Give source of data Appendix A

Testing how close our sample mean is to population mean

Compare two sample means: the sample means are independent of each other, populations are normally distributed

Your IQ before stat class and after

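$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$

$H_1: \mu_1 \neq \mu_2$ or $\mu_1 - \mu_2 \neq 0$

$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

$z = \frac{\text{observed value} - \text{expected value}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$, where the expected value under $H_0$ is 0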

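The average price of a hotel room in

Dallas = $88.42, n1=50, s1=$5.62

Denton = $80.61, n2=50, s2=$4.83

$\alpha$ = 0.05

CV = -1.96 and +1.96

Test Value $= \frac{88.42 - 80.61}{\sqrt{\frac{5.62^2}{50} + \frac{4.83^2}{50}}} = 7.45$

7.45 is above +1.96, so the test value is in the rejection zone: reject $H_0$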
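The hotel-room comparison can be checked with a short two-sample sketch. It uses s2 = 4.83, the value that appears in the test-value line and that reproduces 7.45; the function name and the use of the sample standard deviations in place of the population ones (both samples have n = 50) are assumptions of this sketch.

```python
import math

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z test value for the difference between two independent sample means."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error   # expected difference under H0 is 0

tv = two_sample_z(88.42, 5.62, 50, 80.61, 4.83, 50)
cv = 1.96                    # two-tailed critical value at alpha = 0.05, from the notes
reject_h0 = abs(tv) > cv

print(round(tv, 2), reject_h0)
```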

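# of sports for boys = 8.6, n1=50, S1=3.3

# of sports for girls = 7.9, n2=50, S2=3.3

$H_0: \mu_1 \leq \mu_2$, $H_1: \mu_1 > \mu_2$

$\alpha$ = 0.10

$z = \frac{8.6 - 7.9}{\sqrt{\frac{3.3^2}{50} + \frac{3.3^2}{50}}} = 1.06$

Find the P value: area for z = 1.06 is .3554

P = .5 - .3554 = .1446

P = .1446 > $\alpha$ = 0.10, so do not reject $H_0$

In p value you always contest the alternate hypothesis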
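The P value for the sports example can be computed without the table. A minimal sketch, assuming a right-tailed test (the P value is taken as the area above the test value, as in the .5 - .3554 step) and using `statistics.NormalDist` from the standard library:

```python
import math
from statistics import NormalDist

# Boys: mean 8.6, s = 3.3, n = 50; girls: mean 7.9, s = 3.3, n = 50.
standard_error = math.sqrt(3.3 ** 2 / 50 + 3.3 ** 2 / 50)
z = (8.6 - 7.9) / standard_error

# Right-tail area beyond the test value.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.10
reject_h0 = p_value < alpha   # reject only when the P value is below alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```

The computed P value differs from the table-based .1446 only in the fourth decimal, because the table rounds z to 1.06 first.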

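Confidence interval for the difference: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

1) $S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

2) $t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$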

Df=1 Df=9

Df=49

Df=49

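$\alpha$ = 0.1, Conf = 0.9

n=25, df=24

Find Right -> $\alpha/2$ = 0.1/2 = 0.05 -> 36.415

Left -> 0.95 -> 13.848

STD DEV: $\frac{(n-1)s^2}{\chi^2_{right}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{left}}$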



$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

X-Z/2*/n2H1:1



0 71

95% confidence interval

Standard deviation= 1.6 mgs

=0.05

n=19

$(n-1) \cdot s^2 = 18 \cdot 1.6^2 = 46.08$

46.08 2 1



1 1

$H_0: \sigma_1^2 = \sigma_2^2$

$H_1: \sigma_1^2 \neq \sigma_2^2$

non-smokers: n2=18, $s_2^2$ = 10

1 CV

$\alpha$ = 0.1

$\alpha/2$ = 0.05

Test value:

$F = \frac{s_1^2}{s_2^2} = \frac{36}{10} = 3.6$

>24 (25)

V

17

2.19 3.6

Not Reject

2.19

Project: plug data into excel

then select the scatter

function

Higher divergence in church 1 vs church 2

Ho:1 1

Always look at the right tailed test

Null must be less than or equal to, or more than or equal to

Variation of joggers in US vs africa

Whether = or

Level of significance/2

If hypothesis is correct use the right table

-Linear relationship

Correlation

Perfectly elastic claim

Completely inelastic

-1 0 1

r-sample -correlation

We want negative correlation or positive correlation:

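$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1) \cdot S_x \cdot S_y} = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2] \cdot [n \sum y^2 - (\sum y)^2]}}$

-1 0 1

$H_0: \rho = 0$

$H_1: \rho \neq 0$

$tv = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}} = r \sqrt{\frac{n-2}{1 - r^2}}$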

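Number of absences (X)    Grade (Y)    X*Y     X^2    Y^2
6                         82           492     36     6724
2                         86           172     4      7396
15                        43           645     225    1849
9                         74           666     81     5476
12                        58           696     144    3364
5                         90           450     25     8100
8                         78           624     64     6084

$\sum X$ = 57, $\sum Y$ = 511, $\sum XY$ = 3745, $\sum X^2$ = 579, $\sum Y^2$ = 38993

[Scatter plot: number of absences (0 to 15) on the x axis, grade (25 to 100) on the y axis]

$r = \frac{7 \cdot 3745 - 57 \cdot 511}{\sqrt{[7 \cdot 579 - 57^2] \cdot [7 \cdot 38993 - 511^2]}}$

= -0.944

$\alpha$ = 0.1

Cv = -2.015 and +2.015

$tv = -.944 \cdot \sqrt{\frac{5}{1 - .944^2}}$

= -6.40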
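The correlation and its test value can be recomputed from the raw absences and grades. This is a sketch using the computational formula for r from the notes; the variable names are illustrative, and the critical value 2.015 is copied from the notes rather than computed. Carrying r unrounded gives a test value of about -6.4.

```python
import math

absences = [6, 2, 15, 9, 12, 5, 8]       # X
grades = [82, 86, 43, 74, 58, 90, 78]    # Y

n = len(absences)
sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

# Sample correlation coefficient (computational formula).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Test value for H0: rho = 0, with n - 2 degrees of freedom.
tv = r * math.sqrt((n - 2) / (1 - r ** 2))

cv = 2.015                    # critical t for df = 5, from the notes
reject_h0 = abs(tv) > cv

print(round(r, 3), round(tv, 2), reject_h0)
```

A strongly negative r with a test value far beyond the critical value supports a negative linear relationship between absences and grade.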

Date post:	06-Apr-2018
Category:	Documents
Upload:	udecon