
Sampling Distributions & Probability (bme2.aut.ac.ir/~towhidkhah/MI/Resources/...)

Date post: 04-Feb-2021
  • 1

    Sampling Distributions & Probability

    McCall Chapter 3

    measures of central tendency
      mean
        deviations about the mean
        minimum variability of scores about the mean
      median
      mode

    McCall Chapter 3

    measures of variability
      range
      variance
      standard deviation

  • 2

    McCall Chapter 3

    populations and samples
    estimators of population parameters based on sample
      mean, variance

    McCall Chapter 7

    sampling
    sampling distributions
    sampling error
    probability & hypothesis testing
    estimation

    Methods of Sampling

    simple random sampling
      all elements of the population have an equal probability of being selected for the sample
      representative samples of all aspects of population (for large samples)

  • 3

    Methods of Sampling

    proportional stratified random sample
      mainly used for small samples
      random sampling within groups but not between
      e.g. political polls
        random sampling within each province (but not between provinces)
        total # of samples for each province predetermined by overall population
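As a rough illustration of proportional stratified sampling, the sketch below draws a simple random sample inside each stratum, with each quota fixed in advance by the stratum's share of the population. The function name, the three "provinces" and their sizes are hypothetical and chosen only for illustration.

```python
import random


def proportional_stratified_sample(strata, total_n, seed=0):
    """Draw a simple random sample inside each stratum, sized by population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # The quota is predetermined by the stratum's share of the whole population.
        quota = round(total_n * len(members) / population_size)
        # Random sampling happens within the stratum only, never across strata.
        sample[name] = rng.sample(members, quota)
    return sample


# Hypothetical population: three "provinces" of different sizes.
strata = {
    "A": list(range(0, 600)),
    "B": list(range(600, 900)),
    "C": list(range(900, 1000)),
}
sample = proportional_stratified_sample(strata, total_n=50)
print({name: len(chosen) for name, chosen in sample.items()})
# {'A': 30, 'B': 15, 'C': 5}
```

With quotas that do not divide evenly, simple rounding may not add up to exactly the requested total; a real survey design would need a rule for the remainder.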

    Random Sampling

    each subject is selected independently of other subjects

    selection of one element of the population does not alter likelihood of selecting any other element of the population

    Sampling in Practice

    the elements of the population available to be sampled are often biased
      e.g. willingness of subjects to participate
      certain subjects sign up for certain experiments
      Psych 020 subject pool - is it representative of the general population?

  • 4

    Sampling Distributions

    sampling is an imprecise process
      an estimate will almost never be exactly the same as the population parameter
      a set of multiple estimates based on multiple samples is called an empirical sampling distribution

    Empirical Sampling Distribution

    Sampling Distribution

    The distribution of a statistic determined on separate independent samples of size N drawn from a given population is called a sampling distribution

    mean, standard deviation and variance in raw score distributions vs sampling distributions:
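A small simulation can make the contrast between a raw score distribution and an empirical sampling distribution concrete. This is only a sketch: the population (normal, mean 7, s.d. 2), the sample size and the number of samples are assumptions picked for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of raw scores (roughly mean 7, s.d. 2).
population = [rng.gauss(7.0, 2.0) for _ in range(100_000)]

N = 20              # size of each sample
num_samples = 2000  # number of independent samples drawn

# Each sample gives one estimate of the mean; the collection of estimates
# is an empirical sampling distribution of the mean.
sample_means = [
    statistics.fmean(rng.sample(population, N)) for _ in range(num_samples)
]

print("raw scores:   mean", round(statistics.fmean(population), 2),
      "s.d.", round(statistics.pstdev(population), 2))
print("sample means: mean", round(statistics.fmean(sample_means), 2),
      "s.d.", round(statistics.stdev(sample_means), 2))
```

The two means should nearly coincide, while the s.d. of the sample means should be much smaller than the s.d. of the raw scores.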

  • 5

    Sampling Distributions

    Estimates of population statistics
      by using the mean of a sample of raw scores we can estimate both:
        mean of sampling distribution of means
        mean of population raw scores
      we can estimate the standard deviation of the sampling distribution of means using:
        standard deviation of raw scores in sample divided by the square root of the size of the sample

    Standard error of the mean
      all that’s required to estimate it is
        standard deviation of sample of raw scores
        N (# of scores in sample)
      it represents an estimate of the amount of variability (or sampling error) in means from all possible samples of size N from the population of raw scores
      not necessary to select several samples to estimate the population sampling error of the mean
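The estimate described above needs only one sample. A minimal sketch, using the ten scores of group B from the SPSS example on the final slide as the sample (the function name is an assumption):

```python
import math
import statistics


def standard_error_of_mean(scores):
    """Estimate the standard error of the mean from a single sample of raw scores."""
    # statistics.stdev uses the N - 1 denominator (sample standard deviation).
    return statistics.stdev(scores) / math.sqrt(len(scores))


scores = [5, 7, 8, 9, 7, 6, 8, 7, 5, 9]
sem = standard_error_of_mean(scores)
print(f"s.d. = {statistics.stdev(scores):.3f}, N = {len(scores)}, SEM = {sem:.3f}")
# s.d. = 1.449, N = 10, SEM = 0.458
```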

  • 6

    Standard error of the mean
      we divide by the square root of N
      thus (for N > 1) the standard error of the mean is always smaller than the standard deviation of the raw scores in the sample
      variability of means from sample to sample will always be smaller than the variability of raw scores within a sample
      as N becomes larger, the standard error of the mean becomes smaller
      for large samples the mean will be less variable from sample to sample
        thus a more accurate estimate of the true mean of the population

    Normality
      Given random sampling, the sampling distribution of the mean
        is a normal distribution if the population distribution of the raw scores is normal
        approaches a normal distribution as the size of the sample increases even if the population distribution of raw scores is not normal

    Central limit theorem
      the sum of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution
      the approximation steadily improves as the number of observations increases
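One way to see the central limit theorem at work is to start from a clearly non-normal population and watch the skewness of the sample means shrink toward 0 (the value for a normal distribution) as the sample size grows. The sketch below assumes an exponential population with mean 1; the helper names are hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for repeatability


def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_mean_skewness(n, num_samples=5000):
    """Skewness of the means of num_samples samples of size n from an exponential population."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)


for n in (1, 5, 30, 100):
    print(n, round(sample_mean_skewness(n), 2))
```

For this population the skewness of the mean is 2 / sqrt(n) in theory, so the printed values should fall from about 2 toward about 0.2.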

    Normality
      why do we care about whether populations or samples are normally distributed?
      parametric statistical tests are based on the assumption of normality
        t-test
        F-test
      given a mean and a variance:
        we can look up in a table (or compute) the proportion of population scores that fall above (or below) a given value
        based on assuming entire shape of distribution based on mean and variance

  • 7

    Normality
      What if assumption of normality is violated?
      perform “non-parametric” statistical tests
        we will see some of these later on in course
        not based on assuming any particular shape of distribution
      determine “how serious” the violation is
        monte-carlo simulations
        given 2 known non-normal distributions whose means do NOT differ, extract 1000 random samples of size N from each
        perform statistical tests with given alpha value (e.g. .05)
        was the proportion of type-I errors greater than alpha?
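The Monte Carlo procedure above might be sketched as follows. The two populations (exponential and uniform, both with mean 1), the sample size N = 10 and the use of a pooled-variance t-test are assumptions for illustration; the critical value 2.101 is the tabled two-tailed value for alpha = .05 with df = 18.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2)
    )


rng = random.Random(2021)  # fixed seed for repeatability
N = 10                     # scores per sample (assumed)
T_CRIT = 2.101             # two-tailed critical t, alpha = .05, df = 18 (t-table)
NUM_EXPERIMENTS = 1000

false_alarms = 0
for _ in range(NUM_EXPERIMENTS):
    skewed = [rng.expovariate(1.0) for _ in range(N)]  # exponential, mean 1
    flat = [rng.uniform(0.0, 2.0) for _ in range(N)]   # uniform, mean 1
    # The population means are equal, so every rejection is a type-I error.
    if abs(pooled_t(skewed, flat)) > T_CRIT:
        false_alarms += 1

type_1_rate = false_alarms / NUM_EXPERIMENTS
print(f"observed type-I error rate: {type_1_rate:.3f} (nominal alpha = 0.05)")
```

An observed rate well above .05 would suggest the violation is serious for this test and sample size; a rate close to .05 would suggest the test is robust here.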

    Hypothesis testing

    standard normal distribution (z); t distribution

    A single case

    suppose it is known:
      for population of subjects asked to learn & remember 15 nouns
      mean # correct nouns recalled after 80 min = 7 & standard deviation = 2

    $\mu = 7, \quad \sigma_x = 2$

  • 8

    A single case

    does taking a new drug improve memory?
      test a single person after taking the drug
      they score 11 nouns recalled
      what can we conclude?

    $\mu = 7, \quad \sigma_x = 2$

    A single case

    11 nouns recalled after taking drug
    what are the chances that someone randomly sampled from the population (without the drug) would have scored 11 or higher?
    this probability equals the area under the curve:

    $\mu = 7, \quad \sigma_x = 2$

    A single case

    to determine probability:
      convert raw score (11) to a z-score: Z = (11-7)/2 = 2.00
      look up probability in z-table (table A, appendix 2, McCall)
      p = 0.0228

    $\mu = 7, \quad \sigma_x = 2$

    $z_i = \dfrac{X_i - \mu_x}{\sigma_x}$
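Instead of a z-table, the upper-tail area can be computed directly. The sketch below uses the standard identity P(Z >= z) = erfc(z / sqrt(2)) / 2 for the standard normal distribution; the function name is an assumption.

```python
import math


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


mu, sigma = 7.0, 2.0  # population mean and s.d. from the slide
score = 11.0          # nouns recalled after taking the drug

z = (score - mu) / sigma
p = upper_tail_probability(z)
print(f"z = {z:.2f}, p = {p:.4f}")
# z = 2.00, p = 0.0228
```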

  • 9

    A single case

    p = 0.0228; but what is our alpha level?
      let’s say 5%
      but we didn’t know in advance whether drug should lower or raise memory score
      we have to split 5% into top 2.5% and bottom 2.5%

    $\mu = 7, \quad \sigma_x = 2$; 2.5% in each tail

    A single case

    p = 0.0228
    alpha (two-tailed) = 0.0250 in each tail
    thus p < 0.0250; reject null hypothesis
    conclude drug may have an effect

    $\mu = 7, \quad \sigma_x = 2$; 2.5% in each tail

    A single group

    let’s say the experimenter samples 20 people randomly from the population and tests all of them
    average score across 20 subjects = 8.4
    same idea as before - ask what is the probability that group mean of sample size 20 could be 8.4 or above given only random sampling?
      i.e. assuming no effect of the drug

  • 10

    A single group

    sample mean (n=20) = 8.4
    let’s transform into a z-score
      mean is mean of sampling distribution of means
      standard deviation is sd of sampling distribution of means (not of raw scores)
      thus z = (8.4-7.0)/(2.0/sqrt(20)) = 3.13

    $z_i = \dfrac{X_i - \mu_x}{\sigma_x}$

    $z = \dfrac{\bar{X} - \mu_x}{\sigma_x / \sqrt{N}}$

    A single group

    z = 3.13; look up in table, p = 0.0009
    so reject null hyp; drug may have an effect
    notice that now z-score and hence probability depend on sample size
      for the same difference between sample mean and population mean, the higher the sample size the higher the z-score, & the lower the probability

    $z = \dfrac{\bar{X} - \mu_x}{\sigma_x / \sqrt{N}}$
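The group calculation can be checked the same way as the single case, now dividing by the standard error of the mean rather than the raw-score s.d. The function names below are assumptions; the loop at the end simply reuses the slide's numbers with other hypothetical sample sizes to show the dependence on N.

```python
import math


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z-score of a sample mean within the sampling distribution of means."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error


z = z_for_sample_mean(8.4, 7.0, 2.0, 20)
print(f"z = {z:.2f}, p = {upper_tail_probability(z):.4f}")
# z = 3.13, p = 0.0009

# Same mean difference, different (hypothetical) sample sizes.
for n in (5, 20, 80):
    z_n = z_for_sample_mean(8.4, 7.0, 2.0, n)
    print(n, round(z_n, 2), f"{upper_tail_probability(z_n):.6f}")
```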

    Decisions

  • 11

    A single group

    in these examples, mean and standard deviation of population were known

    typically we don’t know the real s.d. and have to estimate it from sample data

    $z = \dfrac{\bar{X} - \mu_x}{\sigma_x / \sqrt{N}}$

    $z_i = \dfrac{X_i - \mu_x}{\sigma_x}$

    Tests based on estimates
      we can use the standard deviation of the sample to estimate the standard deviation of the population, and hence the standard error of the sampling distribution of the mean
      accuracy depends on sample size, N
        for large samples (N>50, N>100) it’s fairly accurate
        for small samples it is not
      another theoretical sampling distribution exists that is appropriate for these situations

    $z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{x}}}$

    $s_{\bar{x}} = s_x / \sqrt{N}$

    The t distribution

    similar to z distribution
    however: there is a different distribution for each sample size N
      df = N-1

    $t = \dfrac{\bar{X} - \mu}{s_{\bar{x}}} = \dfrac{\bar{X} - \mu}{s_x / \sqrt{N}}$

  • 12

    The t distribution

    same example as before
      select 20 subjects at random
      assume s.d. of population is not available
      assume population is normal in form with a mean of 7.0
      test the hypothesis that:
        the observed sample mean is drawn from a population with mean=7.0

    The t distribution

    compute the t statistic: tobs = (8.4-7.0)/(2.3/sqrt(20)) = 2.72
    look up “critical value” of t for alpha = 0.05 and df = N-1 = 20-1 = 19
      tcrit = 2.093 (two-tailed test)
    since tobs > tcrit, we reject the null hypothesis
      reject the hypothesis that sample was drawn from a population with mean=7.0
      conclude drug may have had an effect

    $t = \dfrac{\bar{X} - \mu}{s_x / \sqrt{N}}$
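The one-sample t calculation can be reproduced in a few lines. This sketch takes the critical value 2.093 from the slide (t-table, df = 19) rather than computing it, and the function name is an assumption.

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for testing a sample mean against a hypothesized population mean."""
    estimated_standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / estimated_standard_error


n = 20
t_obs = one_sample_t(sample_mean=8.4, mu0=7.0, sample_sd=2.3, n=n)
t_crit = 2.093  # two-tailed critical value for alpha = .05, df = 19 (from the slide)

reject = abs(t_obs) > t_crit
print(f"t_obs = {t_obs:.2f}, df = {n - 1}, reject H0: {reject}")
# t_obs = 2.72, df = 19, reject H0: True
```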

    Confidence Intervals for the mean

    our sample mean is not equal to the population mean
      it is an estimate
    using standard error of the mean and the t-statistic we can compute a confidence interval for the true population mean

    $\bar{X} \pm t_{\alpha} (s_{\bar{x}})$

  • 13

    Confidence intervals for the mean

    Xbar = the sample mean
    alpha = 1 - level of confidence desired
    talpha = critical value of t corresponding to a nondirectional test at significance level alpha
    sxbar = the estimated standard error of the mean

    Xbar = 8.4
    alpha = 1-.95 = 0.05
    talpha = t.05 = 2.093 (from the t-tables, df=19)
    sxbar = (2.3/sqrt(20)) = 0.51
    confidence limits are 8.4 +/- 1.07
      7.33 to 9.47
    we can be 95% confident that the interval (7.33, 9.47) contains the true population mean, in the sense that 95% of intervals constructed this way would contain it

    $\bar{X} \pm t_{\alpha} (s_{\bar{x}})$
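A short sketch of the confidence-interval arithmetic, again taking the critical value 2.093 from the t-table as given on the slide (the function name is an assumption). Note that it carries the standard error at full precision, so its limits differ in the last digit from the slide's 7.33 and 9.47, which come from rounding the standard error to 0.51 first.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Confidence limits for the population mean: mean +/- t_crit * standard error."""
    standard_error = sample_sd / math.sqrt(n)
    half_width = t_crit * standard_error
    return sample_mean - half_width, sample_mean + half_width


low, high = confidence_interval(sample_mean=8.4, sample_sd=2.3, n=20, t_crit=2.093)
print(f"95% CI: {low:.2f} to {high:.2f}")
# 95% CI: 7.32 to 9.48
```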

    t-tests for difference between means

    assume we have two random samples
    we want to test whether these two samples have been drawn from
      h0: the same population (with the same mean)
      h1: 2 populations with 2 different means

    compute t-statistic:

    $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$

    t-tests for difference between means

    under the null hypothesis, mean1=mean2
    denominator again can be estimated from the sample data
    standard error of the difference between means actually varies depending on whether scores in the two samples are correlated or independent

    $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(\bar{X}_1 - \bar{X}_2) - 0}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{x}_1 - \bar{x}_2}}$

  • 14

    independent groups t-test

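    formula for computing t:

    $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \dfrac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2} \right] \left[ \dfrac{1}{N_1} + \dfrac{1}{N_2} \right]}}$

    $df = (N_1 - 1) + (N_2 - 1) = N_1 + N_2 - 2$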

    correlated groups t-test

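    assume the difference between two means equals the mean difference between pairs of scores

    formula for computing t:

    $t = \dfrac{\bar{D} - \mu_D}{s_{\bar{D}}} = \dfrac{\bar{D}}{s_D / \sqrt{N}}$

    $df = N - 1$

    $t = \dfrac{\sum D_i}{\sqrt{\dfrac{N \sum D_i^2 - \left( \sum D_i \right)^2}{N - 1}}}$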
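The two forms of the correlated-groups formula (mean difference over its standard error, and the raw-sums version) are algebraically the same, which a quick sketch can confirm. The paired scores below are hypothetical, invented purely for illustration, and the function names are assumptions.

```python
import math
import statistics


def paired_t_definitional(differences):
    """t = mean difference / (s.d. of differences / sqrt(N)), with mu_D = 0 under H0."""
    n = len(differences)
    return statistics.fmean(differences) / (statistics.stdev(differences) / math.sqrt(n))


def paired_t_computational(differences):
    """Same statistic written with raw sums of D and D squared."""
    n = len(differences)
    sum_d = sum(differences)
    sum_d_squared = sum(d * d for d in differences)
    return sum_d / math.sqrt((n * sum_d_squared - sum_d ** 2) / (n - 1))


# Hypothetical scores for the same 8 subjects under two conditions.
before = [6, 7, 5, 8, 7, 6, 9, 7]
after = [8, 9, 6, 8, 9, 7, 10, 9]
differences = [a - b for a, b in zip(after, before)]

t_definitional = paired_t_definitional(differences)
t_computational = paired_t_computational(differences)
print(f"t = {t_definitional:.3f} (definitional), {t_computational:.3f} (computational), "
      f"df = {len(differences) - 1}")
```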

    Interpretation of significance
      scientific significance and statistical significance are NOT the same
      if N is very large you might find a “statistically significant” difference between samples that is in fact tiny and is not scientifically significant
      if N is very small you might falsely conclude the absence of a scientifically significant difference that is in fact present because the observed difference in your samples is not “statistically significant”

  • 15

    SPSS & t-tests

    independent samples t-test
      two groups:
        A: (3,1,5,4,6,2,1,8,3,4)
        B: (5,7,8,9,7,6,8,7,5,9)
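The slides run this comparison in SPSS; as a cross-check, the pooled-variance formula for independent groups can be applied to the same two groups by hand. This is a sketch of that formula only, not a reproduction of SPSS output, and the function name is an assumption.

```python
import math
import statistics


def independent_groups_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the N - 1 denominator, matching s^2 in the formula.
    pooled_variance = (
        (n1 - 1) * statistics.variance(group1) + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / standard_error
    return t, n1 + n2 - 2


group_a = [3, 1, 5, 4, 6, 2, 1, 8, 3, 4]
group_b = [5, 7, 8, 9, 7, 6, 8, 7, 5, 9]

t, df = independent_groups_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df}")
# t = -4.06, df = 18
```

The absolute value of t exceeds 2.101, the tabled two-tailed critical value for alpha = .05 with df = 18, so the null hypothesis of equal means would be rejected for these data.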

