
Statistical Methods [Jadhav]

Date post: 06-Apr-2018
Upload: udecon

Transcript
  • 8/2/2019 Statistical Methods [Jadhav]

    1/18

    -> government (laid off 125000)

    Stock market goes up when employment goes down-

    ->Private (gained 67000)

    V54000

    Look behind the numbers

    They take a sample and use it to exemplify the population

    Statistics

    -entitlements

    -services (highways, education etc)

    -[discretionary ]-> defense $600 Bill

    y= a+b*other exp

    (Receipts + Deficits)= Govt. Exp.

    Qualitative-

    0 1 2 3 - 100

    Discrete-

    2.1

    Continuous-

    Quantitative-

    Data

    Example: beer bottles- every thousandth bottle is tested to make sure it holds 12 oz

    Take every third or large section-

    Systematic sampling

    ?

    Every member of the universe has equal probability to get included into the sample-

    Ex.

    5'3''

    5'4''

    5'7''

    5'8''5'9''

    5'11''

    6'0''

    6'2''

    Random sampling

    Sample:

    The average does not have to be a number in the sample

    Class intervals: 5'3''-5'7'', 5'8''-5'11'', 6'0'' and up

    The class boundaries must be mutually exclusive.

    Interval        Frequency   Midpoint
    5'3''-5'7''     3           5'5''
    5'8''-5'11''    3           5'9''
    6'0'' and up    2           6'1''

    3 x 5'5'' = 16'3''

    3 x 5'9'' = 17'3''

    2 x 6'1'' = 12'2''

    45'8'' / 8 = 5'8.5'' = average height in the class

    Class

    interval

    Scatter plot

    Presentation:

    5'3'' 6'2''

    Using raw data

    Histogram

    5'3'' 5'8'' 6' 5'5'' 5'9'' 6'1''

    average

    Frequency

    Ogive

    6"

    Morethan

    Less than

    5'3''-5'7''

    5'8''-5'11''

    Skewed

    Positive skew

    0 x

    Negative skew

    Statistical Methods

    Friday, September 03, 2010

    11:00 AM

    Statistical Methods Page 1


    Presentation is best when there is a story that is simple and tells exactly what's going on

    Slope:

    x

    y

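    y = a - bx

    For positive values the arithmetic mean is never lower than the geometric mean; the two are equal only when all values are the same.

    Geometric mean of three values $= \sqrt[3]{x_1 \cdot x_2 \cdot x_3}$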

    T f

    2009 120

    2008 110

    i.

    Time series (time on x axis data on y axis)1)

    $n = 20$, $\sum f \cdot X_m = 490$

    20.55-20.5

    Group with highest number of frequencies= mode

    Class interval   Frequency (f)   Midpoint (Xm)   f*Xm
    5.5-10.5         1               8               8
    10.55-15.5       2               13              26
    15.55-20.5       3               18              54
    20.55-25.5       5               23              115
    25.55-30.5       4               28              112
    30.55-35.5       3               33              99
    35.55-40.5       2               38              76

    2)

    f = class frequency (the class with the highest f is the modal class), Xm = class midpoint

    Two types of DATA:

    States in USA-Alabama

    Alaska

    Annual equilibrium

    Mean: $\bar{x} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5$
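
    The grouped-mean arithmetic can be checked with a short Python sketch. The midpoints and frequencies are the ones in the notes; the variable names are illustrative and not from the lecture.

```python
# Class midpoints (Xm) and frequencies (f) copied from the frequency table.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

# n is the total number of observations.
n = sum(frequencies)

# Sum of f * Xm over all classes.
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))

# Grouped mean = (sum of f * Xm) / n.
grouped_mean = sum_f_xm / n
print(n, sum_f_xm, grouped_mean)
```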

    How far is each value from the mean

    It's best to choose the distribution with the shorter range, because the data is more uniform

    Range

    Dispersion

    5.5 24.5/x 40.4

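    Absolute deviation: $|x - \bar{x}|$

    Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

    Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

    Standard deviation $= s = \sqrt{s^2}$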

    Absolute value (every sign is positive)

    -3.5 -2.5 -5 5 25 35

    $z = \frac{x - \bar{x}}{s}$

    68% of values lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3

    Population mean ismore confident the

    further it is

    Variance $= s^2 = \frac{\sum f \cdot X_m^2 - (\sum f \cdot X_m)^2 / n}{n - 1} = \frac{13310 - (490)^2 / 20}{19} = 68.7$
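
    A minimal Python sketch of the shortcut variance formula for grouped data, using the frequency table from the notes. The variable names are illustrative assumptions.

```python
import math

# Class midpoints (Xm) and frequencies (f) from the frequency table.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))
sum_f_xm2 = sum(f * xm ** 2 for f, xm in zip(frequencies, midpoints))

# Shortcut formula: s^2 = [sum(f*Xm^2) - (sum(f*Xm))^2 / n] / (n - 1)
variance = (sum_f_xm2 - sum_f_xm ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
print(sum_f_xm2, round(variance, 1), round(std_dev, 2))
```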

    How many distributions lie beneath

    What is the average (the mean)

    How far are the values spread from it

    Standard deviation

    Ch1-4

    The chance that something is going to happen

    Probability

    Experiment: Results in an outcome


    Sample Space

    Coin H/T

    1 2 3

    4 5 6

    DIE 1 -> 6

    All outcomes = Event

    P(2) = 1/6

    Even= Odd = 3/6 = 0.5

    Venn Diagram: # UD students

    100/ 3000 -> in sample

    Conditional probability

    Red Blue

    Probability with

    replacement

    20 30

    When a favorable outcome

    $P(E) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}} = \frac{4}{52}$ (to get an ace)

    $P(E) \approx 0.08$

    $P(\bar{E}) \approx 0.92$ (the alternative), so $P(E) + P(\bar{E}) = 1$

    0_


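    Mutually exclusive: $P(A \text{ or } B) = P(A) + P(B)$

    $P(A) + P(\bar{A}) = 1$

    n=50

    Blood type   f
    A            22
    B            5
    O            2
    AB           21

    Probability of A or B:

    $P(A) = \frac{f}{n} = \frac{22}{50}$, $P(B) = \frac{5}{50}$, so $P(A \text{ or } B) = \frac{27}{50}$

    $P(\bar{A}) = \frac{28}{50}$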

    Not mutually exclusive

    A

    B

    P(A or B)= P(A) + P(B)- P(A and B)

    A and B

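    Independent: the probability of selecting A does not affect the probability of selecting B

    $P(A \text{ and } B) = P(A) \cdot P(B)$

    King then queen, with replacement: $P(K \text{ and } Q) = \frac{4}{52} \cdot \frac{4}{52}$

    Not independent:

    $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$

    King then queen, without replacement: $P(K \text{ and } Q) = \frac{4}{52} \cdot \frac{4}{51}$

    $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$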
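
    The king-then-queen card example can be worked exactly with Python's `fractions` module. This is only a sketch of the multiplication rules in the notes; the variable names are illustrative.

```python
from fractions import Fraction

# Probability of drawing a king from a full 52-card deck.
p_king = Fraction(4, 52)

# With replacement the draws are independent: P(K and Q) = P(K) * P(Q).
with_replacement = p_king * Fraction(4, 52)

# Without replacement only 51 cards remain: P(K and Q) = P(K) * P(Q | K).
without_replacement = p_king * Fraction(4, 51)

# Conditional probability recovered from the joint: P(Q | K) = P(K and Q) / P(K).
p_queen_given_king = without_replacement / p_king
print(with_replacement, without_replacement, p_queen_given_king)
```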

    ABC

    ACB

    BAC

    BCA

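    Permutations: ${}_nP_r = \frac{n!}{(n-r)!}$

    ${}_3P_3 = \frac{3!}{(3-3)!} = \frac{3 \cdot 2 \cdot 1}{0!} = \frac{6}{1} = 6$

    Combinations

    ${}_nC_r = \frac{n!}{(n-r)! \cdot r!}$

    ${}_{30}C_3 = \frac{30!}{27! \cdot 3!} = \frac{30 \cdot 29 \cdot 28}{3 \cdot 2 \cdot 1} = 4060$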
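
    Both counts can be checked with the standard-library functions `math.perm` and `math.comb` (available in Python 3.8 and later). The sketch also lists the orderings of A, B, C explicitly; the variable names are illustrative.

```python
import math
from itertools import permutations

# Number of ordered arrangements of 3 items taken 3 at a time: 3! / 0!.
n_permutations = math.perm(3, 3)

# The arrangements themselves, e.g. ABC, ACB, BAC, ...
arrangements = ["".join(p) for p in permutations("ABC")]

# Number of unordered selections of 3 items out of 30: 30! / (27! * 3!).
n_combinations = math.comb(30, 3)
print(n_permutations, arrangements, n_combinations)
```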

    Is it mutually exclusive

    See if they are independent

    Probability

    Mean

    Standard deviation

    Standardize the distribution

    Raw Data:

    0

    25

    s-5

    30

    1s

    20

    -1s

    10-20 5 15

    20.5-30 10

    Xm

    12 13 15 17 20

    Midpoint is the average of the values

    f*Xm

    Coefficient of variation: $CV = \frac{s}{\bar{x}}$


    2x2x2=8

    [Tree diagram: three coin tosses, each branching into h or t]

    Hhh

    Hht

    Hth

    Htt

    Thh

    Tht

    Tth

    ttt

    P(each outcome) = 1/8 or .125

    Number of heads   Outcomes        P
    3                 hhh             1/8 = 0.125
    2                 hht hth thh     3/8 = 0.375
    1                 htt tht tth     3/8 = 0.375
    0                 ttt             1/8 = 0.125

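    $\mu = \sum x \cdot P(x)$

    x    P(x)    x*P(x)
    3    .125    .375
    2    .375    .75
    1    .375    .375
    0    .125    0

    $\mu = 1.5$

    $\sigma^2 = \sum (x - \mu)^2 \cdot P(x)$

    x - mu   (x - mu)^2   (x - mu)^2 * P(x)
    1.5      2.25         0.281
    0.5      0.25         0.094
    -0.5     0.25         0.094
    -1.5     2.25         0.281

    $\sigma^2 = 0.75$

    $\sigma = \sqrt{0.75} \approx 0.87$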
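
    The mean and variance of the number-of-heads distribution can be recomputed without rounding the intermediate column. A minimal Python sketch, with illustrative variable names:

```python
import math

# Number of heads in three tosses and the probability of each count.
distribution = {3: 0.125, 2: 0.375, 1: 0.375, 0: 0.125}

# Mean: sum of x * P(x).
mean = sum(x * p for x, p in distribution.items())

# Variance: sum of (x - mean)^2 * P(x).
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
std_dev = math.sqrt(variance)
print(mean, variance, round(std_dev, 3))
```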

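    Binomial

    1. Limited number of trials = n
    2. Only 2 possible outcomes per trial
    3. P(success) = p, P(failure) = q

    $p + q = 1$

    $P(x) = {}_nC_x \cdot p^x \cdot q^{n-x}$

    Three tosses = three trials, so the probability of 2 heads is

    $P(2) = {}_3C_2 \cdot 0.5^2 \cdot 0.5^1 = \frac{3!}{1! \cdot 2!} \cdot 0.25 \cdot 0.5 = 3 \cdot 0.25 \cdot 0.5 = 0.375$

    ${}_nC_r = \frac{n!}{(n-r)! \cdot r!}$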
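
    A small Python sketch of the binomial formula, applied to the three-toss example. The function name `binomial_pmf` is an illustrative choice, not something from the notes.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure
    return math.comb(n, x) * p ** x * q ** (n - x)

# Probability of exactly 2 heads in 3 tosses of a fair coin.
p_two_heads = binomial_pmf(2, 3, 0.5)

# The probabilities over all possible counts must sum to 1.
total = sum(binomial_pmf(x, 3, 0.5) for x in range(4))
print(p_two_heads, total)
```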

    Binomial distribution

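    p. 760, Appendix B9

    Once you know n, p, q:

    $\mu = n \cdot p$

    $\sigma^2 = n \cdot p \cdot q$

    For any discrete distribution:

    $\mu = \sum x \cdot P(x)$

    $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$

    $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$ (simple formula to find the variance)

    Number on ball   0    1    2    3    4
    P(x)             .2   .2   .2   .2   .2

    $\mu = 0 + 0.2 + 0.4 + 0.6 + 0.8 = 2$

    $\sigma^2 = (0 + 1 + 4 + 9 + 16)(0.2) - 2^2 = 6 - 4 = 2$, so $\sigma = \sqrt{2} \approx 1.4$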

    =x

    n

    =mean

    2

    =1.4

    15

    =2

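    [Normal curve marked at -3σ, -2σ, -1σ, +1σ, +2σ, +3σ; areas on each side of the mean: .34 between the mean and 1σ, .136 between 1σ and 2σ, .023 beyond 2σ]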


    Standard normal: $z = \frac{x - \mu}{\sigma}$

    Example: $z = \frac{2 - 2}{1.4} = 0$

    Multiplying through: $z \cdot \sigma = x - \mu$

    Area under the Normal Curve:

    The closer we are to the mean the more accurate we are

    If the sample size is 30 or more, the sample means are approximately normally distributed, so repeated samples give nearly the same result

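    $z = \frac{X_i - \mu}{\sigma}$

    $X_i = \mu + \sigma \cdot z$

    Example: $\mu = 50$, $\sigma = 15$; the area between the mean and 1 standard deviation on either side is .3413

    At $z = -1$: $X_i = 50 - 15 \cdot 1 = 35$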

    p.750

    .04 column

    Z column= 1

    z=1.04

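    Example: $\mu = 80$, $\sigma = 14$ on a 0 to 100 scale. Find $X_i$ such that 80% of the values lie above it, i.e. an area of .30 between $X_i$ and the mean, which gives $z = -0.84$.

    $X_i = 80 - 14 \cdot 0.84 = 80 - 11.76 = 68.24$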

    $X = \mu + \sigma \cdot z$ with $\mu = 80$, $\sigma = 14$:

    z    -3   -2   -1   0    1    2     3
    X    38   52   66   80   94   108   122
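
    Converting between z-scores and raw values takes one line in each direction. A short Python sketch using the mean of 80 and standard deviation of 14 from the notes; the function names are illustrative.

```python
MEAN = 80
STD_DEV = 14

def z_to_x(z, mean=MEAN, std_dev=STD_DEV):
    """Raw value for a given z-score: X = mean + std_dev * z."""
    return mean + std_dev * z

def x_to_z(x, mean=MEAN, std_dev=STD_DEV):
    """z-score for a given raw value: z = (X - mean) / std_dev."""
    return (x - mean) / std_dev

# Raw values at z = -3, -2, ..., 3.
raw_values = [z_to_x(z) for z in range(-3, 4)]

# Value that sits 0.84 standard deviations below the mean.
x_low = z_to_x(-0.84)
print(raw_values, round(x_low, 2))
```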

    Area under the graph


    0 1.8

    42%

    $z = \frac{x - \mu}{\sigma}$

    Find the area between z=0 and z=1.8

    Find the area between z = -2.48 and z = -0.83:

    Area from 0 to 2.48 = .4934, area from 0 to 0.83 = .2967

    .4934 - .2967 = .1967 = 19.67%
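
    Instead of the printed table, areas under the standard normal curve can be computed from the cumulative distribution function. A minimal Python sketch using `statistics.NormalDist` (Python 3.8 and later); the helper name `area_between` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Area between z = -2.48 and z = -0.83 (the worked example).
area_example = area_between(-2.48, -0.83)

# Area between the mean and one standard deviation.
area_one_sd = area_between(0, 1)
print(round(area_example, 4), round(area_one_sd, 4))
```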

    Why a sample rather than the entire population?

    Measuring the whole population is sometimes not possible, so you take a representative sample and draw a conclusion about the population.

    $\mu - \bar{x}$ = bias, sampling error

    x3 x1 x2 x4

    x

    The distribution of the sample means is approximately normal, whatever the shape of the population. The result is better the higher the number of samples

    # samples

    (1)pop

    (2) Bias V

    How do we find the area under this?!

    Chebyshev (k > 1): the area under the curve within k standard deviations of the mean is at least

    $1 - \frac{1}{k^2}$

    For $k = 2$: $1 - \frac{1}{4} = \frac{3}{4} = 0.75 = 75\%$
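
    Chebyshev's bound is easy to tabulate for several values of k. A small Python sketch; the function name is illustrative, and the check that k exceeds 1 mirrors the condition in the notes.

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    return 1 - 1 / k ** 2

# Bounds for k = 2, 3 and 4 standard deviations.
bounds = {k: chebyshev_lower_bound(k) for k in (2, 3, 4)}
print(bounds)
```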

    $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$


    Stratified sampling:

    100 150 250 600

    n=n1 + n2 + n3

    $\bar{x}$ is an estimator of $\mu$. A good estimator is:

    1) Unbiased
    2) Consistent - as n increases, bias decreases
    3) Efficient - smallest "s"

    X

    V

    .95

    z=2

    Confidence Level @95%

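    $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

    $z \cdot \frac{\sigma}{\sqrt{n}} = \bar{x} - \mu$

    [interval marked from -2.6 to +2.6]

    $\alpha = 1 - \text{confidence level} = 1 - .95 = 0.05$

    Confidence interval:

    $\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$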
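
    A sketch of the confidence-interval calculation in Python. The sample mean of 350, standard deviation of 15 and sample size of 36 are hypothetical numbers chosen only to illustrate the formula; they are not a worked example from the notes.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval: sample_mean +/- z(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    # Critical z value that leaves alpha/2 in the upper tail.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_critical * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs (assumed for illustration only).
lower, upper = z_confidence_interval(sample_mean=350, sigma=15, n=36)
print(round(lower, 2), round(upper, 2))
```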


    pop. normally distributed

    σ not known

    n < 30

    Z

    t

    Df

    11 11

    13 14

    50 50

    x=10

    -

    As df increases, the t distribution approaches the Z distribution; by the time n approaches 30 the two are nearly the same

    UD students drink beer

    Copernicus heliocentric model

    Could be non numerical-

    Hypothesis: some statement about some population parameter

    Hypothesis Testing

    $k = 350$, $\sigma = 15$

    Null Hypothesis: $H_0: \mu = k$ (350)

    Alternate Hypothesis: $H_1: \mu \neq k$

    Confidence level-

    -

    Level of significance

    If the value is >x> then it is rejected

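    Test value: convert $\bar{x}$ to z

    [Figure: two-tailed test centered at 350 (z = 0). The critical values z = -1.96 (321) and z = +1.96 (371) bound the do-not-reject region; each rejection zone has area α/2 = 0.025. The value 325 is marked.]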

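    Left-tailed: $H_0: \mu \geq k$, $H_1: \mu < k$

    Example (two-tailed): n=35, $\bar{x} = 25226$, $\sigma = 3251$, $\alpha = 0.01$

    $H_0: \mu = 24672$, $H_1: \mu \neq 24672$

    Is the number significantly different?

    CV = ±2.58

    $TV = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{25226 - 24672}{3251 / \sqrt{35}} = 1.01$

    The test value 1.01 lies between -2.58 and +2.58, so do not reject $H_0$.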

    The average starting salary for a nurse is 24,000 dollars

    n=10, $\bar{x} = 23450$, $s = 400$, $\alpha = 0.05$

    $H_0: \mu = 24000$, $H_1: \mu \neq 24000$

    df = n - 1 = 9, so CV: t = ±2.262

    $TV = \frac{23450 - 24000}{400 / \sqrt{10}} = -4.35$

    .5 - .025 = 0.475

    -4.35 < -2.262: Rejected!
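
    The nurse-salary test can be reproduced in a few lines of Python. The critical value 2.262 is the table value quoted in the notes for df = 9; the variable names are illustrative.

```python
import math

# Values from the nurse starting-salary example.
hypothesized_mean = 24000
sample_mean = 23450
sample_std = 400
n = 10
critical_t = 2.262  # two-tailed table value for alpha = 0.05, df = 9

# Test value: (sample mean - hypothesized mean) / (s / sqrt(n)).
test_value = (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))

# Two-tailed decision: reject when the test value falls beyond either critical value.
reject_h0 = abs(test_value) > critical_t
print(round(test_value, 2), reject_h0)
```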

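    Claim: the average salary of an assistant professor is more than 42,000 dollars

    n=30, $\bar{x} = 43260$, $\sigma = 5230$, $\alpha = 0.05$

    $H_0: \mu \leq 42000$, $H_1: \mu > 42000$

    CV: z = +1.65

    $TV = \frac{43260 - 42000}{5230 / \sqrt{30}} = 1.32$

    [Figure: right-tailed test; TV = 1.32 lies to the left of CV = 1.65, outside the rejection zone, so do not reject]

    Type II error: $H_0$ is false but is not rejected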

    -One sided example

    The average price of shoe80 H1: 0.10

    n=28 , Ho: m23, =.05, df=27, CVt=1.703, TV=4.5

    29 24 24 .05 28 1.701 1.88

    27 25 25 .1 26 1.315 1.84

    Reject Ho

    Quiz answers


    TV = 1.55, CV = 2.33

    P-value = probability of a test value at least this extreme when $H_0$ is true

    z = 1.55 gives an area of .4394, so p = .5 - .4394 = .0606

    $\alpha = 0.05$

    $\alpha > p$: Reject $H_0$

    $p > \alpha$: Do Not Reject $H_0$
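
    The p-value for a right-tailed z test can be computed directly rather than read from a table. A minimal Python sketch for the test value 1.55 and significance level 0.05 used in the notes; the variable names are illustrative.

```python
from statistics import NormalDist

test_value = 1.55
alpha = 0.05

# Right-tailed p-value: area under the standard normal curve beyond the test value.
p_value = 1 - NormalDist().cdf(test_value)

# Decision rule: reject H0 only when the p-value is below alpha.
reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```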

    Qd=a-b*Price

    Y=a+b*x

    Infl=a+B*M1

    Housing starts= a-b* Mortgage Rate

    Y X

    Y1 X1 Stationary(within 1 year timeframe)

    Y2 X2

    The Project:

    Minimum

    Pairs

    Select any two series where the independent variable impacts the

    dependent variable

    Bureau of Labor Statistics

    Cars sold 1 yr

    15k 16k 17k 18k 19k 20k Price of car

    DON'T DO A TIME SERIES

    HupoData

    Mortgage rates affect housing starts

    Source:

    [email protected]

    Simple linear regression in excel

    Find two variables: cause and effect- number of classes missed and grade achieved

    Interest rate goes up borrowing goes down

    For one specific year

    One line saying what I'm trying to relate

    Give source of data Appendix A

    Testing how close our sample mean is to population mean

    Compare two sample means: the sample means are independent of each other, populations are normally distributed

    Your IQ before stat class and after

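    $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$

    $H_1: \mu_1 \neq \mu_2$ or $\mu_1 - \mu_2 \neq 0$

    $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

    $z = \frac{\text{observed value} - \text{expected value}}{\text{standard error}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$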

    The average price of a hotel room (in dollars) in

    Dallas = 88.42, n1=50, s1=5.62

    Denton = 80.61, n2=50, s2=4.83

    $\alpha = 0.05$, CV = ±1.96

    $\text{Test value} = \frac{88.42 - 80.61}{\sqrt{\frac{5.62^2}{50} + \frac{4.83^2}{50}}} = 7.45$

    TV = 7.45 is beyond +1.96, in the rejection zone
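
    A Python sketch of the two-sample z test for the hotel-room example. It takes the second standard deviation as 4.83, the value used in the test-value line of the notes; the function name is illustrative.

```python
import math

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, expected_diff=0.0):
    """z = ((mean1 - mean2) - expected difference) / standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - expected_diff) / standard_error

# Dallas versus Denton hotel-room prices.
z_hotels = two_sample_z(88.42, 80.61, 5.62, 4.83, 50, 50)

# Two-tailed test at alpha = 0.05 uses critical values of -1.96 and +1.96.
reject_h0 = abs(z_hotels) > 1.96
print(round(z_hotels, 2), reject_h0)
```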

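    # of sports for boys = 8.6, n1=50, s1=3.3

    # of sports for girls = 7.9, n2=50, s2=3.3

    $H_0: \mu_1 \leq \mu_2$, $H_1: \mu_1 > \mu_2$

    $\alpha = 0.10$

    $z = \frac{8.6 - 7.9}{\sqrt{\frac{3.3^2}{50} + \frac{3.3^2}{50}}} = 1.06$, which gives an area of .3554

    Find the P value:

    P = .5 - .3554 = .1446

    In a p-value test you always contest the alternate hypothesis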

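    $(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}$

    1) $S_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$

    2) $t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$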

    Df=1 Df=9

    Df=49

    Df=49

    n=25, df=24

    $\alpha = 0.1$, confidence = 0.9

    Right: $\alpha / 2 = 0.1 / 2 = 0.05$ -> $\chi^2_{\text{right}} = 36.415$

    Left: $1 - 0.05 = 0.95$ -> $\chi^2_{\text{left}} = 13.848$

    STD DEV

    (n-1)s


    z=X-

    /n

    X-Z/2*/n2H1:1


    0 71

    95% confidence interval

    Standard deviation= 1.6 mgs

    =0.05

    n=19

    $(n - 1) \cdot s^2 = 18 \cdot 1.6^2 = 46.08$

    46.08 2 1


    1 1

    $H_0: \sigma_1^2 = \sigma_2^2$

    $H_1: \sigma_1^2 \neq \sigma_2^2$

    non-smokers: n2=18, $s_2^2 = 10$

    1 CV

    $\alpha = 0.1$

    $\alpha / 2 = 0.05$

    Test value:

    $F = \frac{s_1^2}{s_2^2} = \frac{36}{10} = 3.6$

    >24 (25)

    V

    17

    2.19 3.6

    Not Reject

    2.19

    Project: plug data into excel

    then select the scatter

    function

    Higher divergence in church 1 vs church 2

    Ho:1 1

    Always look at the right tailed test

    Null must be less/more than or equal to

    Variation of joggers in US vs africa

    Whether = or

    Level of significance/2

    If hypothesis is correct use the right table

    -Linear relationship

    Correlation

    Perfectly elastic claim

    Completely inelastic

    -1 0 1

    r-sample -correlation

    We want negative correlation or positive correlation:

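    $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) \cdot s_x \cdot s_y} = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2] \cdot [n \sum y^2 - (\sum y)^2]}}$

    $H_0: \rho = 0$

    $H_1: \rho \neq 0$

    $tv = r \cdot \sqrt{\frac{n - 2}{1 - r^2}}$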

    Number of absences (X)   Grade (Y)   X*Y    X^2   Y^2
    6                        82          492    36    6724
    2                        86          172    4     7396
    15                       43          645    225   1849
    9                        74          666    81    5476
    12                       58          696    144   3364
    5                        90          450    25    8100
    8                        78          624    64    6084
    Sum: 57                  511         3745   579   38993

    [Scatter plot: grade (0 to 100) against number of absences (0 to 15)]

    $r = \frac{7 \cdot 3745 - 57 \cdot 511}{\sqrt{[7 \cdot 579 - (57)^2] \cdot [7 \cdot 38993 - (511)^2]}} = -0.944$

    $\alpha = 0.1$

    CV = ±2.015 (df = n - 2 = 5)

    $tv = -0.944 \cdot \sqrt{\frac{5}{1 - 0.944^2}} = -6.40$
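
    The correlation coefficient and its t test can be recomputed from the absences and grades data with the computational formula. A minimal Python sketch; the variable names are illustrative, and because r is not rounded before the t step the test value comes out near -6.4.

```python
import math

# Number of absences (X) and final grade (Y) for the seven students.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x ** 2 for x in absences)
sum_y2 = sum(y ** 2 for y in grades)

# r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx^2] * [n*Sy2 - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Test value for H0: rho = 0, with n - 2 degrees of freedom.
t_value = r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-tailed critical value quoted in the notes for alpha = 0.1, df = 5.
reject_h0 = abs(t_value) > 2.015
print(round(r, 3), round(t_value, 2), reject_h0)
```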

