leon [email protected]

Date post:31-Dec-2015
Transcript:
  • eatworms.swmed.edu/~leon [email protected]

  • Basic Statistics

    Combining probabilities

    Samples and Populations

    Four useful statistics:
        The mean, or average.
        The median, or 50% value.
        Standard deviation.
        Standard Error of the Mean (SEM).

    Three distributions:
        The binomial distribution.
        The Poisson distribution.
        The normal distribution.

    Four tests:
        The chi-squared goodness-of-fit test.
        The chi-squared test of independence.
        Student's t-test.
        The Mann-Whitney U-test.

  • Combining probabilities

    When you throw a pair of dice, what is the probability of getting 11?

  • Combining probabilities

    The probability that all of several independent events occur is the product of the individual event probabilities.

    The probability that one of several mutually exclusive events occurs is the sum of the individual event probabilities.

  • Combining probabilities

    When you throw a pair of dice, what is the probability of getting 11?

    When you throw five dice, what is the probability that at least one shows a 6?

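Both dice questions can be worked with the product and sum rules stated earlier. The sketch below is one way to check the arithmetic, assuming fair six-sided dice; the variable names are illustrative and not from the slides.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # assumption: fair six-sided dice

# Direct enumeration: a pair of dice has 36 equally likely outcomes.
pairs = list(product(FACES, repeat=2))
p_eleven = Fraction(sum(1 for a, b in pairs if a + b == 11), len(pairs))

# Same answer from the two rules: (5, 6) and (6, 5) are mutually exclusive
# (sum rule), and within each outcome the two dice are independent (product rule).
p_eleven_by_rules = (Fraction(1, 6) * Fraction(1, 6)
                     + Fraction(1, 6) * Fraction(1, 6))

# "At least one 6" is the complement of "no 6 on any of five independent dice".
p_at_least_one_six = 1 - Fraction(5, 6) ** 5

print(p_eleven, p_eleven_by_rules)
print(p_at_least_one_six, float(p_at_least_one_six))
```

Working with the complement avoids summing the many mutually exclusive ways of getting one, two, ... or five sixes.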


  • Populations and samples

    What proportion of the population is female?

    Abstract populations: what does a mouse weigh?

    Population characteristics:
        Central tendency: mean, median
        Dispersion: standard deviation

  • Four sample statistics

    Sample mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

    Sample median: the middle value in a sample of odd size, the average of the two middle values in a sample of even size.

    Sample standard deviation: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$

    Standard Error of the Mean: $\mathrm{SEM} = \frac{s}{\sqrt{N}}$
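A minimal sketch of the four sample statistics, using Python's standard `statistics` module. The data values are made up for illustration, and the standard deviation here uses the usual N - 1 denominator, which is an assumption about the formula on the slide.

```python
import math
import statistics

# Hypothetical sample (made-up numbers, e.g. eight repeated measurements).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)   # average of the two middle values when n is even
sd = statistics.stdev(sample)        # sample standard deviation, divides by N - 1
sem = sd / math.sqrt(n)              # standard error of the mean

print(f"mean ± SEM = {mean:.2f} ± {sem:.2f} (N = {n})")
print(f"median = {median}, SD = {sd:.2f}")
```

The SD describes the spread of the individual values, while the SEM shrinks as N grows because it describes the uncertainty of the mean.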

  • Standard deviation and SEM

    Use standard deviation to describe how much variation there is in a population.
    Example: income, if you're interested in how much income varies within the US population.

    Use SEM to say how accurate your estimate of a population mean is.
    Example: measurement of β-gal activity from a 2-hybrid test.

  • Sample stats: recommendations

    When you report an average, report it as mean ± SEM. Same for error bars in graphs.

    In the figure caption or the table heading or somewhere, say explicitly that that's what you're reporting.

    Use the median for highly skewed data.

  • Three distributions

    The binomial distribution
        When you count how many of a sample of fixed size have a certain characteristic.

    The Poisson distribution
        When you count how many times something happens, and there is no upper limit.

    The normal distribution
        When you measure something that doesn't have to be an integer, or when you average several continuous measurements.

  • The binomial distribution

    When you count how many of a sample of fixed size have a certain characteristic.

    Parameters:
        N: the fixed sample size
        p: the probability that one thing has the characteristic
        q: the probability that it doesn't: (1-p)

    Formula: $\Pr(x) = \frac{N!}{x!\,(N-x)!}\, p^{x} q^{N-x}$

    Example: Females in a population, animals having a certain genetic characteristic.
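The binomial probability can be sketched in a few lines. The function name and the example numbers (10 animals, each female with probability 0.5) are illustrative assumptions, not taken from the slides.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability that exactly x of n independent items have the characteristic."""
    q = 1.0 - p  # probability that one item does not have it
    return comb(n, x) * p**x * q ** (n - x)

# Hypothetical example: 10 animals, each female with probability 0.5.
n, p = 10, 0.5
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 4))
```

The probabilities over x = 0..N sum to 1, which is a quick sanity check on any implementation.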

  • The Poisson distribution

    When you count how many times something happens, and there is no (or only a very large) upper limit.

    Parameter:
        μ: the population mean

    Formula: $\Pr(x) = \frac{\mu^{x} e^{-\mu}}{x!}$

    Example: Radioactivity counts, positive clones in a library.
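A short sketch of the Poisson probability for a count x given the population mean. The function name and the mean of 3 counts are hypothetical choices for illustration.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """Probability of observing exactly x events when the mean count is mu."""
    return mu**x * exp(-mu) / factorial(x)

# Hypothetical example: a counter that averages 3 counts per interval.
mu = 3.0
for x in range(10):
    print(x, round(poisson_pmf(x, mu), 4))
```

Unlike the binomial case there is no fixed N, so x can in principle be any non-negative integer; the probabilities simply become negligible far above the mean.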

  • The normal distribution

    When you measure something that doesn't have to be an integer, e.g. weight of a mouse, or velocity of an enzyme reaction, and especially when you average several such continuous measurements.

    Parameters:
        μ: the population mean
        σ²: the population variance

    Formula: $f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$

    Example: Weight, heart rate, enzyme activity
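As a rough illustration, the normal density can be evaluated directly from its two parameters. The function name and the mouse-weight numbers (mean 25 g, variance 4 g²) are assumptions made up for this sketch.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Normal probability density at x, for mean mu and variance sigma2."""
    return exp(-((x - mu) ** 2) / (2.0 * sigma2)) / sqrt(2.0 * pi * sigma2)

# Hypothetical example: mouse weights with mean 25 g and variance 4 g^2.
mu, sigma2 = 25.0, 4.0

# Crude numerical check that the density integrates to about 1
# (Riemann sum over mu ± 10 standard deviations, step 0.01 g).
step = 0.01
area = sum(normal_pdf(5.0 + i * step, mu, sigma2) * step for i in range(4001))
print(round(area, 6))
```

Because the measurement is continuous, the formula gives a density rather than a probability; probabilities come from areas under the curve.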

  • Hypothesis testing

  • A genetic mapping problem

                          Mom's genotype    Dad's genotype
    At SSR:               (/(               (/(
    At disease locus:     e/+               e/+

    Assume we know that Mom inherited both the ( allele of the SSR and the e mutation from her father, and likewise that Dad inherited ( and e from his father.

    Suppose SSR and disease locus are unlinked (the null hypothesis). What is the probability that an epileptic (e/e) child has SSR genotype (/(?

    Answer: 1/4

    Now suppose that SSR and disease locus are genetically linked. What is the probability that an epileptic (e/e) child has SSR genotype (/(?

    Answer: Something less than 1/4
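The two answers can be reproduced by enumerating gametes. This is only a sketch under stated assumptions, because the allele symbols did not survive in the transcript: each parent is taken to be heterozygous at the SSR, "in" is a hypothetical label for the SSR allele inherited together with e, "out" is the other allele, the genotype asked about is assumed to be out/out, and r is an assumed recombination fraction (r = 1/2 means unlinked).

```python
from fractions import Fraction
from itertools import product

def gametes(r):
    """Gamete probabilities for one parent: (disease allele, SSR allele label)."""
    half = Fraction(1, 2)
    return {
        ("e", "in"): half * (1 - r),   # parental (non-recombinant) chromosome
        ("e", "out"): half * r,        # recombinant
        ("+", "in"): half * r,         # recombinant
        ("+", "out"): half * (1 - r),  # parental (non-recombinant) chromosome
    }

def p_out_out_given_ee(r):
    """P(child is out/out at the SSR | child is e/e), by direct enumeration."""
    g = gametes(r)
    p_ee = Fraction(0)
    p_ee_and_out_out = Fraction(0)
    for (mom, p_mom), (dad, p_dad) in product(g.items(), repeat=2):
        if mom[0] == "e" and dad[0] == "e":
            p_ee += p_mom * p_dad
            if mom[1] == "out" and dad[1] == "out":
                p_ee_and_out_out += p_mom * p_dad
    return p_ee_and_out_out / p_ee

print(p_out_out_given_ee(Fraction(1, 2)))    # unlinked (null hypothesis)
print(p_out_out_given_ee(Fraction(1, 10)))   # hypothetical 10% recombination
```

Under these assumptions the conditional probability works out to r², which equals 1/4 only when r = 1/2 and is smaller for any linked locus.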

  • The experiment

    Look at the SSR genotype of 40 e/e kids.

    If about 1/4 have the SSR genotype in question, the SSR is probably unlinked.

    If the proportion with that genotype is much less than 1/4, the SSR is probably linked.

    We're going to figure out how to make the decision in advance, before we see the results.

  • Expected results if unlinked

    [Chart: Binomial, N=40, p=0.25; Pr(x) versus x]

    [Chart: Tail probabilities, Binomial, N=40, p=0.25; upper tail and lower tail versus x]

    Sheet1: p = 0.25, N = 40

     x      Pr(x)           Upper tail, Pr(X ≤ x)   Lower tail, Pr(X ≥ x)
     0      0.0000100566    0.0000100566            1
     1      0.0001340878    0.0001441444            0.9999899434
     2      0.0008715707    0.0010157151            0.9998558556
     3      0.0036799652    0.0046956803            0.9989842849
     4      0.0113465595    0.0160422398            0.9953043197
     5      0.0272317428    0.0432739826            0.9839577602
     6      0.0529506109    0.0962245935            0.9567260174
     7      0.0857295605    0.181954154             0.9037754065
     8      0.1178781457    0.2998322997            0.818045846
     9      0.139707432     0.4395397317            0.7001677003
    10      0.1443643464    0.583904078             0.5604602683
    11      0.1312403149    0.7151443929            0.416095922
    12      0.1057213648    0.8208657577            0.2848556071
    13      0.0759025183    0.8967682759            0.1791342423
    14      0.048794476     0.945562752             0.1032317241
    15      0.0281923639    0.9737551159            0.054437248
    16      0.0146835229    0.9884386388            0.0262448841
    17      0.0069098931    0.9953485319            0.0115613612
    18      0.0029431026    0.9982916345            0.0046514681
    19      0.0011359343    0.9994275689            0.0017083655
    20      0.000397577     0.9998251459            0.0005724311
    21      0.0001262149    0.9999513608            0.0001748541
    22      0.0000363346    0.9999876954            0.0000486392
    23      0.0000094786    0.999997174             0.0000123046
    24      0.000002238     0.999999412             0.000002826
    25      0.0000004774    0.9999998895            0.000000588
    26      0.0000000918    0.9999999813            0.0000001105
    27      0.0000000159    0.9999999972            0.0000000187
    28      0.0000000025    0.9999999996            0.0000000028
    29      0.0000000003    1                       0.0000000004
    30-40   0               1                       0
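The binomial probabilities and both running tails for N = 40, p = 0.25 can be regenerated with a short script. This is a sketch with illustrative variable names; it treats one tail as Pr(X ≤ x) and the other as Pr(X ≥ x).

```python
from math import comb

N, P = 40, 0.25  # 40 e/e children, probability 1/4 under the null hypothesis

# Point probabilities Pr(X = x) for x = 0..N.
pmf = [comb(N, x) * P**x * (1 - P) ** (N - x) for x in range(N + 1)]

# Running sum from the bottom: Pr(X <= x).
at_most = []
running = 0.0
for pr in pmf:
    running += pr
    at_most.append(running)

# Complementary tail: Pr(X >= x) = 1 - Pr(X <= x - 1).
at_least = [1.0 - (at_most[x - 1] if x > 0 else 0.0) for x in range(N + 1)]

for x in range(16):
    print(f"{x:2d}  {pmf[x]:.10f}  {at_most[x]:.10f}  {at_least[x]:.10f}")
```

The most likely count is 10, which is N times p, and counts of 5 or fewer are already rare under the null hypothesis.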

  • Is the SSR linked?

    We want to know if the SSR is linked to the epilepsy gene.

    What would your answer be if:
        10/40 kids had the SSR genotype in question?
        0/40 kids had it?
        5/40 kids had it?

    Need a way to set the cut-off.

  • Type I errors

    Suppose that in reality, the SSR and the epilepsy gene are unlinked. Still, by chance, the number of children with the SSR genotype in question in our sample may be

  • What's the probability of a type I error (α) if we cut off at 5?

    [Table: binomial, p = 0.25, N = 40; columns x0, Pr(x = x0), Pr(x …]
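One way to compute the type I error probability for a given cut-off is to sum the binomial probabilities up to it. The sketch below assumes the decision rule "call the SSR linked when the count is at most the cut-off"; the function name and the 0.05 threshold used in the search are illustrative assumptions, not from the slides.

```python
from math import comb

def prob_at_most(cutoff: int, n: int, p: float) -> float:
    """Pr(X <= cutoff) for a binomial count X with parameters n and p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(cutoff + 1))

N, P = 40, 0.25  # sample size and null-hypothesis probability

# Type I error probability if "linked" is declared whenever the count is <= 5.
alpha_at_5 = prob_at_most(5, N, P)
print(round(alpha_at_5, 4))

# Largest cut-off whose type I error stays at or below an assumed 0.05 level.
largest_cutoff = max(c for c in range(N + 1) if prob_at_most(c, N, P) <= 0.05)
print(largest_cutoff)
```

Choosing the cut-off this way fixes the false-positive rate before any data are seen, which is the point of deciding in advance.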

  • Probability of a type I error

    [Chart: Pr(x ≤ x0) and Pr(x = x0) versus x0; binomial, N = 40, p = 0.25]