Posted on 31-Mar-2015
THE ‘NORMAL’ DISTRIBUTION
OBJECTIVES
Review the normal distribution
Properties of the standard normal distribution
Review the Central Limit Theorem
Use the normal distribution in an inferential fashion
THEORETICAL DISTRIBUTION
Empirical distributions are based on data
Example: the empirical distribution of a bootstrapped regression coefficient
Theoretical distributions are based on mathematics, derived from a model or estimated from data
Example: the standard normal
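To make the empirical side of this contrast concrete, here is a minimal Python sketch of an empirical distribution for a bootstrapped regression coefficient. It assumes hypothetical simulated data (true slope 0.5, normal noise) and simple case resampling; `ols_slope` and `bootstrap_slopes` are illustrative helper names, not from any library.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slopes(xs, ys, n_boot=2000, seed=1):
    """Empirical distribution of the slope: refit on (x, y) pairs resampled with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        bx, by = zip(*resample)
        slopes.append(ols_slope(bx, by))
    return slopes

# Hypothetical simulated data: true slope 0.5, normal noise with SD 1.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(50)]
ys = [2 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

slopes = bootstrap_slopes(xs, ys)
print("observed slope:", ols_slope(xs, ys))
print("bootstrap mean:", statistics.fmean(slopes))
print("bootstrap SE:  ", statistics.stdev(slopes))
```

The list `slopes` is the empirical distribution; nothing about its shape is assumed in advance, which is what separates it from a theoretical distribution such as the standard normal.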
THE NORMAL DISTRIBUTION
What is it? Why do we care?
The important thing is that distributions are tied to probabilities, and it is the probability which will be of interest to us
If we know something about the distribution of events, then we can estimate the likelihood of our particular event of interest (data)
WHAT’S THE BIG DEAL WITH THE NORMAL ONE?
We believe that the variables of interest to us are normally distributed in the population
This may actually be a rather bold assumption (see Micceri; Wilcox)
Assuming a normal distribution allows us to take advantage of its properties and make inferences from our sample to the population
The theoretical sampling distributions of various statistics do seem to be normal
The Central Limit Theorem concerns the sampling distribution
Most of the stats we use have normality as an assumption in some form, though many researchers misunderstand it¹,²
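The Central Limit Theorem point can be checked by simulation. A minimal sketch, assuming a Uniform(0, 1) population (chosen only because it is clearly not normal), samples of size 30, and 5,000 replications; `sample_means` is a hypothetical helper. The CLT predicts that the sample means center on 0.5 with a standard error of √(1/12)/√30 ≈ 0.053, and that about 68% of them fall within ±1 SE.

```python
import random
import statistics
from statistics import NormalDist

def sample_means(n, reps=5000, seed=0):
    """Means of `reps` samples of size n drawn from a Uniform(0, 1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

n = 30
means = sample_means(n)

# CLT prediction for the sampling distribution of the mean:
# center 0.5, standard error sqrt(1/12) / sqrt(n) (1/12 is the Uniform(0, 1) variance).
predicted = NormalDist(mu=0.5, sigma=(1 / 12) ** 0.5 / n ** 0.5)
print("mean of means:", statistics.fmean(means), "predicted:", predicted.mean)
print("SD of means:  ", statistics.stdev(means), "predicted:", predicted.stdev)

# Share of sample means within +/-1 predicted SE (a normal curve puts ~68% there).
within = sum(abs(m - predicted.mean) < predicted.stdev for m in means) / len(means)
print("within 1 SE:", within)
```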
NORMAL PROBABILITY DISTRIBUTION
Symmetrical, bell-shaped curve
Also known as the Gaussian distribution
Points of inflection lie 1 standard deviation from the mean
This is, despite what some seem to think, all a ‘normal’ distribution is: a continuous probability distribution
$$ f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}} $$
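The density formula translates directly into code. A minimal Python sketch; `normal_pdf` is an illustrative name, and the standard library's `statistics.NormalDist.pdf` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Cross-check the hand-written formula against the standard library.
for x in (-2.0, 0.0, 1.5):
    print(x, normal_pdf(x), NormalDist(0, 1).pdf(x))
```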
NORMAL PROBABILITY DISTRIBUTION
Since we know the shape of the curve, we can (using calculus) calculate the area under the curve
The percentage of that area can be used to determine the probability that a value in a given range could be drawn from a given distribution
The area under the curve tells us about probability; in other words, we can obtain an observed p-value for our result (data) by treating it as a normally distributed outcome
Issue: each normal distribution, with its own values of μ and σ, would need its own calculation of the area under various points on the curve
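The area under the curve can also be approximated numerically instead of by calculus. This sketch uses composite Simpson's rule (a swapped-in numerical technique, not something the slides specify) with hypothetical helpers `normal_pdf` and `area`. It also shows why the issue above has a simple fix: the area between 500 and 620 under N(500, 100) equals the area between z = 0 and z = 1.2 under N(0, 1), so one table for the standard normal covers every normal distribution.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area(a, b, mu, sigma, n=1000):
    """Approximate the integral of the normal density from a to b
    with composite Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2   # Simpson weights: 4 on odd nodes, 2 on even nodes
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

print(area(500, 620, 500, 100))                     # N(500, 100), from 500 to 620
print(area(0, 1.2, 0, 1))                           # N(0, 1), from z = 0 to z = 1.2
print(NormalDist().cdf(1.2) - NormalDist().cdf(0))  # closed-form check via the CDF
print(area(-10, 10, 0, 1))                          # total area is (numerically) 1
```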
NORMAL PROBABILITY DISTRIBUTION
STANDARD NORMAL DISTRIBUTION – N(0,1)
As a result, we often use the standard normal distribution:
“Bell-shaped”
Mean of 0
Standard deviation of 1
Possesses an infinite number of possible values
NORMAL PROBABILITY DISTRIBUTION
The probability of any one exact value occurring is zero; only ranges of values have non-zero probability (the curve approaches, but never touches, the horizontal axis)
The total area under the curve, i.e. the total probability, is 1
For normal distributions: ±1 SD ≈ 68%, ±2 SD ≈ 95%, ±3 SD ≈ 99.7%
Note: not all bell-shaped, symmetrical distributions are normal distributions (the t distribution is one example)
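The ±1/2/3 SD percentages can be checked with the standard normal CDF, since P(-k < Z < k) = Φ(k) - Φ(-k). A minimal sketch using Python's `statistics.NormalDist`; `within_k_sd` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0, sigma=1)

def within_k_sd(k):
    """P(-k < Z < k) = Phi(k) - Phi(-k) for the standard normal."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {within_k_sd(k):.4f}")
# within +/-1 SD: 0.6827
# within +/-2 SD: 0.9545
# within +/-3 SD: 0.9973
```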
NORMAL DISTRIBUTION
The standard normal distribution will allow us to make claims about the probabilities of values related to our own data
How do we apply the standard normal distribution to our data?
Z-SCORE
If we know the population mean (μ) and population standard deviation (σ), then for any value of X we can compute a z-score by subtracting the population mean and dividing the result by the population standard deviation
$$ z = \frac{X - \mu}{\sigma} $$
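A minimal sketch of the z-score formula; `z_score` is a hypothetical helper name, and μ and σ are assumed to be known population values. The example values reuse a GRE-style scale (mean 500, SD 100).

```python
def z_score(x, mu, sigma):
    """Number of population SDs that x lies above (+) or below (-) the population mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(620, 500, 100))   # 1.2
print(z_score(350, 500, 100))   # -1.5
```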
IMPORTANT Z-SCORE INFO
The z-score tells us how far above or below the mean a value is, in units of standard deviations
It is a linear transformation¹ of the original scores:
Multiplication (or division) of X by a constant and/or addition to (or subtraction from) X of a constant
The relationship of the observations to each other remains the same
$Z = (X - \mu)/\sigma$ and, equivalently, $X = Z\sigma + \mu$
EXAMPLE: GRE
Say we have GRE (Verbal) scores that are normally distributed with mean 500 and standard deviation 100.¹
Find the probability that a randomly selected GRE score is greater than 620.
We want to know the probability of getting a score of 620 or beyond:
$$ z = \frac{620 - 500}{100} = 1.2 $$
so we need p(z > 1.2).
Result: the probability of randomly getting a score of 620 or greater is ~.12 (more precisely, about .115)
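The GRE calculation can be reproduced with Python's `statistics.NormalDist` in place of a printed z-table; the variable names are illustrative. Both routes, standardizing first or working on N(500, 100) directly, give the same upper-tail probability.

```python
from statistics import NormalDist

mu, sigma, cutoff = 500, 100, 620

z = (cutoff - mu) / sigma                            # 1.2
p_via_z = 1 - NormalDist().cdf(z)                    # p(z > 1.2) on N(0, 1)
p_direct = 1 - NormalDist(mu, sigma).cdf(cutoff)     # same probability without standardizing

print(round(z, 2), round(p_via_z, 4), round(p_direct, 4))  # 1.2 0.1151 0.1151
```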
EXTENSION: STANDARD SCORES
Often units based on z-scores are presented instead of the z-score itself
First convert whatever score you have to a z-score. Then:
New score = new s.d.(z) + new mean
Example: T scores have a mean of 50 and s.d. of 10, so T = 10(z) + 50
Examples of standard scores: IQ, GRE, SAT
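A minimal sketch of the two-step conversion (raw score to z, then z to the new scale). `to_standard_score` is a hypothetical helper, and the IQ-style line assumes the common mean-100, SD-15 metric. For the GRE score of 620 (z = 1.2) it should give T = 62 and 118 on the IQ-style scale.

```python
def to_standard_score(x, mu, sigma, new_mean, new_sd):
    """Convert a raw score to z, then rescale: new score = new_sd * z + new_mean."""
    z = (x - mu) / sigma
    return new_sd * z + new_mean

# The GRE score of 620 from the example (mu = 500, sigma = 100), i.e. z = 1.2:
t_score = to_standard_score(620, 500, 100, new_mean=50, new_sd=10)
iq_style = to_standard_score(620, 500, 100, new_mean=100, new_sd=15)  # assumes mean 100, SD 15
print(t_score, iq_style)
```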
EXTENSION: INTERVAL ESTIMATES
With the standard normal we can create interval estimates for particular scores of interest.
Note that Howell’s wording on p.77 is not how we typically use confidence intervals, and it would be incorrect unless we are dealing with the population of scores (which he is in his example).
The reason is that our methods provide one of an infinite number of possible CIs, x% of which ‘capture’ the parameter.
Our typical methods assume a fixed parameter and ‘random intervals’, not a fixed interval into which a random parameter might fall.
However, the formula for an interval estimate there is one you’ll see many variations on:
$$ \bar{x} \pm 1.96\, sd $$
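A minimal sketch of the interval formula and of the "fixed parameter, random intervals" reading, assuming Python 3.8+ and hypothetical helpers `interval` and `coverage`. The 1.96 is the 97.5th percentile of N(0, 1). For individual scores from a known population, the sd in the formula is σ; for a sample mean it would be the standard error σ/√n, which the simulation uses with σ assumed known. μ stays fixed while each simulated interval moves from sample to sample.

```python
import random
from statistics import NormalDist, fmean

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96

def interval(center, sd, z=z_crit):
    """center +/- z * sd."""
    return center - z * sd, center + z * sd

# Interval for individual scores when the population N(500, 100) is known:
print(interval(500, 100))   # about (304.0, 696.0)

def coverage(mu=500, sigma=100, n=25, reps=10_000, seed=42):
    """Fraction of simulated 95% intervals for the mean that capture the fixed mu."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5          # standard error, sigma assumed known
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo, hi = interval(xbar, se)
        hits += lo <= mu <= hi     # the interval is random; mu is not
    return hits / reps

cov = coverage()
print(cov)   # should land close to 0.95
```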
SUMMARY NORMAL DISTRIBUTION
Assuming our data are normally distributed allows us to use the properties of the normal distribution to assess the likelihood of some outcome
This gives us a means by which to determine whether we might think one hypothesis is more plausible than another (even if we don’t get a direct likelihood of either hypothesis)