240 9 Variance Power

Date post: 08-Jul-2018
Upload: paba

Transcript
  • 8/19/2019 240 9 Variance Power

    1/58



    The mapped distribution of British birds matched the locations of pubs very closely: a case of sampling bias.

    Likewise, the size of the net mesh greatly affects the size of the fish sampled.


    Model = analysis


    Each requires its own set of statistical techniques, so it is important to know what you are dealing with.


    Continuous, discrete or categorical data? The type of data determines the type of test; for example, you can’t talk about the “average” slug colour.
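A minimal sketch of the slug-colour point, using a small hypothetical set of colour observations (made-up data, not from the lecture): categorical data are summarized with counts and the most common category (the mode), not with a mean.

```python
from collections import Counter
from statistics import mode

# Hypothetical observations of slug colour (categorical data).
slug_colours = ["black", "orange", "black", "brown", "black", "orange"]

# Counts per category are a meaningful summary of categorical data...
colour_counts = Counter(slug_colours)

# ...and so is the most frequent category (the mode).
most_common_colour = mode(slug_colours)

# An "average" colour is undefined: statistics.mean() would raise a
# TypeError here because strings cannot be converted to numbers.
print(colour_counts)
print(most_common_colour)
```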


    We’ll now focus on this last point.


    Growth rate = “realized r” (which we have already discussed)

    Here, in a statistical analysis context, “R²” is something entirely different. It refers to the amount of variation explained by the BEST FIT LINE between the two variables, which in this case is 23%: 23% of the variation in moose population growth rate is attributable to the mean age of moose in the population, while the remaining 77% of the variation is caused by other factors (including random chance). If age, and only age, explained the differences in population growth rate, all the data points would be on the line and R² = 1.0 (or 100%).
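R² can be computed as one minus the ratio of the unexplained (residual) variation to the total variation. A minimal sketch, assuming you already have the observed values and the values predicted by a fitted line; the function name `r_squared` is hypothetical, and the predictions below are the least-squares fit for made-up data at x = 1, 2, 3, 4.

```python
def r_squared(observed, predicted):
    """Proportion of the variation in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Total variation: squared distances of each point from the mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Unexplained variation: squared distances of each point from the line.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

# If every point sits exactly on the line, R² = 1.0 (100%).
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
# Points scattered around the line give a smaller R² (about 0.64 here).
print(r_squared([1.0, 3.0, 2.0, 4.0], [1.3, 2.1, 2.9, 3.7]))
```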

    Define “best fit line” in simple terms: a line that minimizes the overall distance of the data points from the line (in ordinary least-squares regression, the sum of the squared vertical distances). In this example we have chosen a “linear relationship” (a straight line). Best fit lines can take a number of different shapes (logistic, exponential, normal, etc.). Linear relationships are the simplest to describe.
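For a straight line, ordinary least squares has a closed-form solution: the slope is the sum of cross-products of x and y about their means divided by the sum of squares of x about its mean, and the line passes through (mean x, mean y). A minimal sketch with made-up data; `fit_line` is a hypothetical helper name (Python 3.10+ also provides `statistics.linear_regression` for the same calculation).

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products, and sum of squares of x, about their means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up example data.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

No other straight line gives a smaller sum of squared vertical distances for these points.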


    Confidence in the relationship is the degree to which points fall on the “best fit” line. Here 81% of the variation in fish abundance is explained by the size of the pothole.

    The average distance of points from the line is the “variation”.

    Why are all the points not on the line? There are real, biological reasons, but also artifacts of how we measure. We want to eliminate the latter as best we can, leaving only the former. We do this by optimizing our experimental design (measuring as many of the biological drivers as possible) and by using analysis tools that can tease these various drivers apart.


    All of these plots have low variation, but plots B, C and D are not good models of the data.


    Was every grizzly bear enumerated? No. Therefore we must use samples to estimate the actual (aka parameter) abundance of the population.

    This example is a simplified interpretation of the Artelle et al. paper.


    Since these are estimates, we must also indicate the amount of likely variation (i.e. the error) associated with each.

    The lines around the estimates of the mean (the dots) may be standard errors, standard deviations, or confidence intervals. Each means something very different, and knowing those differences is critical in interpreting scientific data.
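To see how different these three kinds of error bar can be, here is a minimal sketch computing all of them from one hypothetical sample of abundance counts (made-up numbers). It assumes the 95% confidence interval is built as mean ± t × SE, with the t critical value for n − 1 = 9 degrees of freedom (about 2.262) hard-coded from tables, because Python's standard library has no t distribution.

```python
import math
import statistics

# Hypothetical abundance counts from 10 survey plots (made-up data).
counts = [12, 15, 9, 14, 11, 13, 10, 16, 12, 8]

n = len(counts)
mean = statistics.mean(counts)
sd = statistics.stdev(counts)   # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)          # standard error of the mean

# Two-sided 95% t critical value for 9 degrees of freedom (assumed, from tables).
T_CRIT_95_DF9 = 2.262
ci_low = mean - T_CRIT_95_DF9 * se
ci_high = mean + T_CRIT_95_DF9 * se

print(f"mean={mean:.2f}  SD={sd:.2f}  SE={se:.2f}  95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The SD describes the spread of the individual counts, while the SE and the CI describe the uncertainty in the estimated mean, which is why they are much narrower.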


    Interpretation: we are 95% confident that the parameter (the actual, real abundance) lies within this interval. Conversely, there is a 5% chance that an interval built this way misses the parameter (i.e. a 5% chance of being wrong).

    Another way to look at this: if the experiment were run 100 times and an interval calculated each time, about 95 of those intervals would contain the real abundance. The CI is an estimate of confidence in the statistic.
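The repeated-experiment view of a confidence interval can be checked by simulation. A minimal sketch, assuming a normally distributed population whose true mean (the parameter) and SD are known to us because we invent them, so the 95% interval is mean ± 1.96 × SD/√n; all values are made up. Each simulated experiment draws a sample, builds an interval, and records whether that interval captured the true mean.

```python
import math
import random

TRUE_MEAN = 50.0   # the parameter (known here only because we simulate it)
TRUE_SD = 10.0
N = 25             # sample size per experiment
Z_95 = 1.96

def interval_captures_parameter(rng):
    """Run one simulated experiment; return True if its 95% CI contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    sample_mean = sum(sample) / N
    half_width = Z_95 * TRUE_SD / math.sqrt(N)
    return sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width

rng = random.Random(1)  # fixed seed so the run is reproducible
experiments = 10_000
hits = sum(interval_captures_parameter(rng) for _ in range(experiments))
coverage = hits / experiments
print(f"{coverage:.1%} of intervals contained the true mean")
```

The proportion printed should sit close to 95%: it is the intervals that vary from experiment to experiment, while the parameter stays fixed.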


    Overlapping CIs indicate that the real parameter value could be captured by both intervals, so, as a rule of thumb, the two estimates cannot be said to be significantly different. In other words, because each interval defines the range of values within which we are 95% certain the real value lies, we have to accept the possibility that the real population abundances for GOV and NGO are the same value.
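A minimal sketch contrasting the overlap check with a direct comparison of two estimates, assuming each estimate has an approximately normal sampling distribution with a known standard error. The numbers are hypothetical and labelled GOV and NGO only to echo the example. It illustrates that the overlap check is conservative: two 95% intervals can overlap slightly even when the 95% interval for their difference excludes zero.

```python
import math

Z_95 = 1.96

def ci95(estimate, se):
    """95% confidence interval, assuming a normal sampling distribution."""
    return estimate - Z_95 * se, estimate + Z_95 * se

def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical estimates and standard errors (made-up numbers).
gov_est, gov_se = 10.0, 1.0
ngo_est, ngo_se = 13.0, 1.0

gov_ci = ci95(gov_est, gov_se)
ngo_ci = ci95(ngo_est, ngo_se)

# CI for the difference: SEs of independent estimates combine in quadrature.
diff = ngo_est - gov_est
diff_ci = ci95(diff, math.sqrt(gov_se**2 + ngo_se**2))

print("individual CIs overlap:", intervals_overlap(gov_ci, ngo_ci))
print("CI of difference:", diff_ci, "excludes 0:", diff_ci[0] > 0 or diff_ci[1] < 0)
```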


    The far-right estimate: it has the narrowest interval within which we are 95% certain the real value lies.


    …and therefore reflects the confidence in your parameter estimates!


    Here the mean is 10, but of equal interest is the dispersion of points around “10”. The greater the dispersion (aka variance), the less meaningful the mean of 10 is.

    This plot shows the “bell curve” or, more correctly, “the normal distribution”. In a normal distribution the dispersion of data points around the mean (i.e. in the positive and negative directions) is roughly equal and symmetric. As a result, when data are “normally distributed”, the mean, mode and median are the same value (or very nearly so).

    Put another way: the number of extremely small individuals is roughly equivalent to the number of really large individuals.
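A small check of the claim that symmetric data have equal mean, median and mode, using a made-up, perfectly symmetric set of body sizes centred on 10. Real samples from a normal distribution will only show approximate agreement.

```python
import statistics

# Made-up sizes, symmetric around 10.
sizes = [7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 13]

print("mean:  ", statistics.mean(sizes))
print("median:", statistics.median(sizes))
print("mode:  ", statistics.mode(sizes))
print("SD:    ", round(statistics.stdev(sizes), 2))
```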


    These are all “normal” distributions.


    The number of species plotted for different abundance intervals, each interval being twice the preceding one, so the x-axis is on a log scale. The overall pattern is normally distributed on this scale (i.e. species abundances are lognormally distributed). An interesting twist: the portion of the graph (red) left of Preston's veil line is theoretical, depicting those species that are expected to be present but whose low abundance prevents them from being represented in the sample.
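Preston's doubling intervals (“octaves”) amount to binning abundances on a log₂ scale. A minimal sketch, assuming octave k holds species with abundance from 2^k up to, but not including, 2^(k+1). Preston's original scheme splits species that fall exactly on a boundary between neighbouring octaves, which this sketch ignores. The abundances are made up.

```python
import math
from collections import Counter

def octave(abundance):
    """Octave index: 0 for 1 individual, 1 for 2-3, 2 for 4-7, 3 for 8-15, ..."""
    return int(math.floor(math.log2(abundance)))

# Made-up abundances (individuals per species) from a hypothetical sample.
abundances = [1, 2, 3, 5, 6, 7, 9, 12, 14, 20, 25, 40, 70, 130]

species_per_octave = Counter(octave(a) for a in abundances)
for k in sorted(species_per_octave):
    print(f"octave {k} ({2**k}-{2**(k + 1) - 1} individuals): {species_per_octave[k]} species")
```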


    From a very interesting paper that is directly relevant to your tutorial projects.

    The relevance here is that the various factors affecting detectability are all normally distributed around a mean value.


    The power of the normal (aka Gaussian) distribution.

    With the standard deviation we can estimate how dispersed the data are without actually sampling all individuals.


    Based on our samples, about 95% of individuals from the sampled population will be between (mean - 2SD) and (mean + 2SD), assuming the data are normally distributed.
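The “mean ± 2 SD covers about 95%” rule comes straight from the normal distribution. A minimal sketch using Python's `statistics.NormalDist` (Python 3.8+); the mean of 10 and SD of 2 are arbitrary, since every normal distribution gives the same proportions.

```python
from statistics import NormalDist

# Arbitrary normal distribution: mean 10, SD 2.
dist = NormalDist(mu=10, sigma=2)

within_1sd = dist.cdf(10 + 1 * 2) - dist.cdf(10 - 1 * 2)
within_2sd = dist.cdf(10 + 2 * 2) - dist.cdf(10 - 2 * 2)

# The exact multiplier for 95% coverage is about 1.96, not 2.
z_95 = NormalDist().inv_cdf(0.975)

print(f"within 1 SD: {within_1sd:.3f}")   # about 0.683
print(f"within 2 SD: {within_2sd:.3f}")   # about 0.954
print(f"z for 95%: {z_95:.3f}")           # about 1.960
```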


    Here we can sample every individual in our population (i.e. we are not trying to infer anything about ALL dogs, just the 5 dogs we have here). In this special, rare case, we divide by n (which here is 5). In future examples, where we use samples to make inferences about a larger population (almost always the situation we face), we will use n - 1 as the denominator in the variance calculation (here that would be 4).
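Here are the two denominators side by side, using five made-up measurements (not the dogs from the slide). Python's standard library exposes both versions. `statistics.pvariance` divides by n, for when the whole population is measured. `statistics.variance` divides by n - 1, for when a sample is used to infer a larger population.

```python
import statistics

# Made-up measurements for five individuals.
values = [1.0, 2.0, 3.0, 4.0, 5.0]

n = len(values)
mean = sum(values) / n
sum_sq = sum((v - mean) ** 2 for v in values)   # sum of squared deviations = 10.0

population_variance = sum_sq / n        # divide by n = 5      -> 2.0
sample_variance = sum_sq / (n - 1)      # divide by n - 1 = 4  -> 2.5

# The standard library gives the same results.
assert population_variance == statistics.pvariance(values)
assert sample_variance == statistics.variance(values)

print(population_variance, sample_variance)
```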


    Having an estimate is only the start of an analysis. We also need to know how precise that estimate is: this is an estimate of the “power” of the estimate or, put another way, the confidence in the estimate.


    If two researchers measure the abundance of a particular bird species,


    The simplest of examples.


    What is an example hypothesis here?

    What is the null hypothesis (H0)?

    What is the alternative hypothesis (HA)?

    What is the prediction?

    How many tosses are necessary? Let’s say each toss (‘replicate’) costs money! Here we introduce a real-life consideration: data cost money (and/or time, risk, etc.).
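One way to make the coin question concrete is an exact binomial test of H0: P(heads) = 0.5. A minimal sketch, assuming a two-sided test whose p-value sums the probabilities of all outcomes no more likely than the one observed (a common convention, though not the only one). The observed counts below are hypothetical.

```python
from math import comb

def binomial_test_fair_coin(heads, tosses):
    """Two-sided exact p-value for H0: the coin is fair (P(heads) = 0.5)."""
    probs = [comb(tosses, k) / 2**tosses for k in range(tosses + 1)]
    observed = probs[heads]
    # Sum the probabilities of outcomes no more likely than the observed one;
    # the tiny relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

# Hypothetical results: 8 heads in 10 tosses vs. 80 heads in 100 tosses.
print(binomial_test_fair_coin(8, 10))    # about 0.109: not enough evidence the coin is fixed
print(binomial_test_fair_coin(80, 100))  # far below 0.05: strong evidence the coin is fixed
```

The same 80% heads rate is inconclusive after 10 tosses but convincing after 100, which is exactly the trade-off between cost and the number of replicates.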


    The answer is that the ability to detect a fixed coin increases with the number of tosses!


    As sample size increases, the test becomes more precise (in other words, our power to detect a fixed coin increases).

    By increasing sample size you can overcome random runs of heads or tails, because the confidence limits shrink. Low sample sizes may not allow confidence limits small enough to differentiate overlapping distributions.

    The confidence interval tells us that we are 95% certain that the real population mean is within these bounds. Recall that we are using a sample to infer facts about the population.

    Range of heads we can expect from a fair coin, i.e. the deviation from 1:1 heads:tails, at 5% error:

    2 tosses = 0 to 2

    10 tosses = 2 to 8

    50 tosses = 18 to 32

    100 tosses = 40 to 60

    1000 tosses = 470 to 530
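Ranges like these can be computed from the binomial distribution. A minimal sketch, assuming the range is the central 95% of the exact Binomial(n, 0.5) distribution. Exact limits computed this way may differ by one from rounded figures for large numbers of tosses.

```python
from math import comb

def fair_coin_range(tosses, alpha=0.05):
    """Central (1 - alpha) range for the number of heads from a fair coin."""
    cumulative = 0.0
    for k in range(tosses + 1):
        cumulative += comb(tosses, k) / 2**tosses
        # Smallest head count whose lower-tail probability reaches alpha / 2.
        if cumulative >= alpha / 2:
            lower = k
            break
    # A fair coin's distribution is symmetric, so the upper limit mirrors the lower.
    return lower, tosses - lower

for n in (2, 10, 50, 100, 1000):
    print(n, "tosses:", fair_coin_range(n))
```

Note how the range, as a fraction of the number of tosses, narrows as tosses increase: from the whole 0 to 2 at 2 tosses to a band only a few percent wide at 1000.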


    Therefore the mean estimate is meaningless unless the variance around that mean is also reported. GOV and NGO estimates may be dramatically different or the same number… depending on the variance. The take-home here is that it is NOT the mean that should be the focus, but the confidence interval, because the real value (parameter) can be anywhere in this interval.


    Remember that we are not talking about biological significance here. At issue is NOT whether the size of the difference is significant or not. It is that smaller average differences in size will be harder to detect (given there will be variation around the means) when the means are not greatly different.


    When the effect size is large enough and the experiment is well designed, you don’t need statistics!


    But more often the differences are more subtle, and statistics are used to determine whether differences are large enough to be considered “significant” (aka scientifically valid).


    The two means could be VERY similar and the difference could still be confidently detected, even with a modest sample size, if the variance is very low.

    Think about how a low vs. high sample size may hurt or help in differentiating the two populations in the lower panel.
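This sketch shows why low variance makes a small difference easy to detect, using Welch's t statistic (the difference in means divided by its standard error). It compares two pairs of made-up samples with the same 0.5 difference in means but very different spreads. A larger t means the difference stands out more clearly from the noise. Converting t to a p-value would need a t distribution, which this sketch omits.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

# Made-up samples: group means of 10.0 and 10.5 in both cases.
low_var_a = [10.0, 10.1, 9.9, 10.0, 10.0]
low_var_b = [10.5, 10.4, 10.6, 10.5, 10.5]
high_var_a = [8.0, 12.0, 9.0, 11.0, 10.0]
high_var_b = [8.5, 12.5, 9.5, 11.5, 10.5]

print(f"low variance:  t = {welch_t(low_var_a, low_var_b):.1f}")    # large t
print(f"high variance: t = {welch_t(high_var_a, high_var_b):.1f}")  # small t
```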


    Imagine one coin always landed “heads” (100% of the time). The ability to spot it relative to a “normal” coin would be high (few tosses needed).

    Imagine one coin always landed “heads” and the other always landed “tails”: we would need very few tosses to confidently declare both as “fixed”.

    Now imagine one coin is fixed, but less so: it lands “heads” 60% of the time. How many tosses are needed to detect it?
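How many tosses does it take to detect a coin that lands heads 60% of the time? This is a minimal sketch of an exact power calculation. It assumes a two-sided test at the 5% level that rejects “fair” when the head count falls outside the central 95% range for a fair coin. Power is the probability that the 60% coin produces such a result. Because head counts are discrete, power does not rise perfectly smoothly with every extra toss.

```python
from math import comb

def binom_pmf(k, n, p):
    # Float arithmetic; comb(n, k) fits in a float for n up to roughly 1000.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def power_to_detect(p_heads, tosses, alpha=0.05):
    """Probability that a coin with P(heads) = p_heads is declared 'not fair'."""
    # Rejection region: counts below `lower` or above `tosses - lower`, where
    # `lower` is the smallest count whose fair-coin lower tail reaches alpha / 2.
    cumulative, lower = 0.0, 0
    for k in range(tosses + 1):
        cumulative += binom_pmf(k, tosses, 0.5)
        if cumulative >= alpha / 2:
            lower = k
            break
    upper = tosses - lower
    return sum(binom_pmf(k, tosses, p_heads)
               for k in range(tosses + 1) if k < lower or k > upper)

for n in (10, 50, 100, 500, 1000):
    print(f"{n:>4} tosses: power = {power_to_detect(0.6, n):.2f}")
```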


    A priori or prospective power analysis: how many samples do I need to detect a difference of a given effect size?

    Post hoc or after-the-fact power analysis: can I detect a difference at a particular threshold, given my sample size?

    These considerations have very real economic implications in terms of the time and money required to gather data.
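An a priori power analysis can be sketched with the usual normal-approximation formula for comparing two group means: n per group ≈ 2 × ((z(1 - α/2) + z(1 - β)) × σ / Δ)². Here Δ is the difference you want to detect, σ is the (assumed common) standard deviation, and 1 - β is the desired power. This is a minimal sketch under those assumptions with arbitrary inputs. A t-test-based calculation gives a slightly larger n.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up: you cannot take a fraction of a sample

# Arbitrary example: detect a difference of half a standard deviation.
print(samples_per_group(delta=0.5, sigma=1.0))   # 63
# Halving the effect size roughly quadruples the required sample.
print(samples_per_group(delta=0.25, sigma=1.0))  # 252
```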


    ≤ “equal to or less than”


    Recall that standard deviation (SD) is the square root of variance.


    95% Confidence Limits: the interval in which we are 95% certain the real mean resides.

