THE ‘NORMAL’ DISTRIBUTION

OBJECTIVES

Review the Normal Distribution

Properties of the Standard Normal Distribution

Review the Central Limit Theorem

Use Normal Distribution in an inferential fashion

THEORETICAL DISTRIBUTION

Empirical distributions: based on data

Example: empirical distribution for a bootstrapped regression coefficient

Theoretical distribution: based on mathematics, derived from model or estimated from data

Example: Standard Normal

THE NORMAL DISTRIBUTION

What is it? Why do we care?

The important thing is that distributions are tied to probabilities, and it is the probability which will be of interest to us

If we know something about the distribution of events, then we can estimate the likelihood of our particular event of interest (data)

WHAT’S THE BIG DEAL WITH THE NORMAL ONE?

We believe that the variables of interest to us are normally distributed in the population

This may actually be a rather bold assumption (see Micceri, Wilcox)

Assuming a normal distribution allows us to take advantage of its properties and make inferences from our sample to the population

The theoretical sampling distributions of various statistics do seem to be normally distributed

The central limit theorem concerns the sampling distribution

Most of the stats we use have normality as an assumption in some form, though many researchers misunderstand it
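A small simulation can illustrate the central limit theorem point: even when the population is clearly non-normal, the sampling distribution of the mean comes out approximately normal, with a spread of about the population sd divided by the square root of the sample size. The exponential population, the sample size of 30, the 2000 replications, and the seed are all illustrative assumptions, not values from the slides.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is repeatable

N_PER_SAMPLE = 30   # assumed sample size
N_SAMPLES = 2000    # assumed number of repeated samples

# Population: exponential with mean 1 and sd 1 (strongly right-skewed).
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N_PER_SAMPLE))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# CLT prediction: mean close to 1, sd close to 1 / sqrt(30), roughly 0.18.
print(round(mean_of_means, 2), round(sd_of_means, 2))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws do not.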

NORMAL PROBABILITY DISTRIBUTION

Symmetrical, bell-shaped curve

Also known as Gaussian distribution

Point of inflection = 1 standard deviation from mean

This is, despite what some seem to think, all a ‘normal’ distribution is: a continuous probability distribution

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
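As a rough check on the curve described here, the normal density with mean mu and standard deviation sigma, (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * sigma^2)), can be written out by hand and compared with Python's standard library. The helper names `normal_pdf` and `second_derivative` are made up for this sketch. The second part checks numerically that the curvature changes sign 1 standard deviation from the mean.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and sd sigma."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def second_derivative(f, x, h=1e-3):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

# The hand-written density agrees with the standard library.
print(normal_pdf(0.5), NormalDist(0, 1).pdf(0.5))

# Curvature changes sign at 1 sd from the mean (the point of inflection).
inside = second_derivative(normal_pdf, 0.9)   # negative: curve bends downward
outside = second_derivative(normal_pdf, 1.1)  # positive: curve bends upward
print(inside < 0 < outside)
```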

NORMAL PROBABILITY DISTRIBUTION

Since we know the shape of the curve, we can (using calculus) calculate the area under the curve

The percentage of that area can be used to determine the probability that a given value could be pulled from a given distribution

The area under the curve tells us about the probability; in other words, we can obtain an observed p-value for our result (data) by treating it as a normally distributed outcome

Issue: Each normal distribution, with its own values of μ and σ, would need its own calculation of the area under the curve between various points
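To make the "area under the curve" idea concrete, here is a minimal sketch that approximates the area numerically and compares it with the cumulative distribution function from Python's standard library. The distribution with mean 500 and sd 100 is a hypothetical example, and `area_under_curve` is a helper invented for this illustration. The last line shows why one standard curve is enough: the same area appears on the z scale.

```python
from statistics import NormalDist

def area_under_curve(dist, lo, hi, steps=10_000):
    """Trapezoid-rule approximation of the area under dist.pdf from lo to hi."""
    width = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * width)
    return total * width

scores = NormalDist(mu=500, sigma=100)  # hypothetical example distribution
standard = NormalDist(mu=0, sigma=1)    # standard normal

numeric = area_under_curve(scores, 400, 600)       # brute-force area
exact = scores.cdf(600) - scores.cdf(400)          # same area from the CDF
same_on_z_scale = standard.cdf(1) - standard.cdf(-1)

print(round(numeric, 4), round(exact, 4), round(same_on_z_scale, 4))
```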

NORMAL PROBABILITY DISTRIBUTION

STANDARD NORMAL DISTRIBUTION – N(0,1)

We often use the standard normal distribution as a result

“Bell-shaped”

Mean of 0

Standard deviation of 1

Possesses an infinite number of possible values.

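NORMAL PROBABILITY DISTRIBUTION

Because the distribution is continuous, the probability of any one exact value occurring is zero, so we work with probabilities for ranges of values (the curve itself approaches zero in the tails but never quite reaches it)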

Curve has a total area or probability = 1

For normal distributions: ± 1 SD ~ 68%, ± 2 SD ~ 95%, ± 3 SD ~ 99.7%

Note: not all bell shaped symmetrical distributions are normal distributions
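The share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean can be checked directly. This is a small sketch using `statistics.NormalDist` from the Python standard library; the dictionary name `coverage` is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

# Area between -k and +k standard deviations, for k = 1, 2, 3.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```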

NORMAL DISTRIBUTION

The standard normal distribution will allow us to make claims about the probabilities of values related to our own data

How do we apply the standard normal distribution to our data?

Z-SCORE

If we know the population mean and population standard deviation, for any value of X we can compute a z-score by subtracting the population mean and dividing the result by the population standard deviation

IMPORTANT Z-SCORE INFO

Z-score tells us how far above or below the mean a value is in terms of standard deviations

It is a linear transformation of the original scores

Multiplication (or division) of and/or addition to (or subtraction from) X by a constant

Relationship of the observations to each other remains the same

$Z = (X - \mu)/\sigma$

$X = \sigma Z + \mu$
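The z-score transformation and its inverse can be sketched in a few lines. The scores and the population values (mean 500, sd 100) are hypothetical, and `to_z` and `from_z` are names chosen for this example. Because the transformation is linear, the ordering of the observations is unchanged.

```python
def to_z(x, mu, sigma):
    """Standardize: distance from the mean in standard deviation units."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Inverse transformation: back to the original scale."""
    return z * sigma + mu

mu, sigma = 500, 100            # assumed population mean and sd
scores = [380, 500, 620, 710]   # hypothetical raw scores

z_scores = [to_z(x, mu, sigma) for x in scores]
recovered = [from_z(z, mu, sigma) for z in z_scores]

print(z_scores)                        # [-1.2, 0.0, 1.2, 2.1]
print(z_scores == sorted(z_scores))    # ordering preserved: True
```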

EXAMPLE: GRE

Say we have GRE scores (Verbal) that are normally distributed with mean 500 and standard deviation 100.

Find the probability that a randomly selected GRE score is greater than 620.

We want to know what’s the probability of getting a score 620 or beyond.

$z = \frac{620 - 500}{100} = 1.2$

p(z > 1.2)

Result: The probability of randomly getting a score of 620 or greater is ~.12
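The GRE example can be reproduced as a quick check, using the mean of 500 and standard deviation of 100 given in the example. This sketch uses `statistics.NormalDist` from the Python standard library, and the variable names are illustrative. It computes the tail probability two ways: through the z-score and directly on the GRE scale.

```python
from statistics import NormalDist

gre = NormalDist(mu=500, sigma=100)   # distribution stated in the example

z = (620 - 500) / 100                 # z-score for a GRE score of 620
p_above = 1 - NormalDist().cdf(z)     # upper-tail area of the standard normal
p_direct = 1 - gre.cdf(620)           # same tail, computed on the GRE scale

print(z, round(p_above, 4))           # 1.2 0.1151
```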

EXTENSION: STANDARD SCORES

Often units based on z-scores are presented instead of the z-score itself

First convert whatever score you have to a z score. Then:

New score = new s.d.(z) + new mean

Example: T scores have a mean of 50 and s.d. of 10. Then T = 10(z) + 50

Examples of standard scores: IQ, GRE, SAT
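The two-step recipe (convert to z, then rescale) might be sketched as below. The function name `rescale` and the raw score of 620 on a mean-500, sd-100 scale are assumptions made for illustration. The IQ-style scale with mean 100 and sd 15 is a common convention, not something stated in the slides.

```python
def rescale(score, old_mean, old_sd, new_mean, new_sd):
    """Convert a score to z, then express it on a new standard-score scale."""
    z = (score - old_mean) / old_sd
    return new_sd * z + new_mean

# A score 1.2 sd above the mean, expressed on two other scales.
t_score = rescale(620, 500, 100, new_mean=50, new_sd=10)     # T = 10(z) + 50
iq_style = rescale(620, 500, 100, new_mean=100, new_sd=15)   # assumed IQ-style scale

print(round(t_score, 1), round(iq_style, 1))  # about 62 and 118
```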

EXTENSION: INTERVAL ESTIMATES

With the standard normal we can create interval estimates for particular scores of interest

Note that Howell’s wording on p.77 is not typically how we are using confidence intervals and would be incorrect unless we are dealing with the population of scores (which he is in his example)

The reason is that our methods provide one of an infinite number of CIs, x% of which ‘capture’ the parameter.

Our typical methods assume a fixed parameter and ‘random intervals’, not a fixed interval into which a random parameter might fall.

However, the formula for an interval estimate there is one you’ll see a lot of variations on

$X \pm 1.96 \cdot sd$
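Assuming the formula meant here is the usual "center plus or minus 1.96 standard deviations" form, a minimal sketch of an interval for a population of scores looks like this. The population values (mean 500, sd 100) and the function name `central_interval` are hypothetical. The 1.96 multiplier is recovered from the standard normal rather than hard-coded.

```python
from statistics import NormalDist

def central_interval(center, sd, coverage=0.95):
    """Limits that contain the central `coverage` share of a normal distribution."""
    z_crit = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return center - z_crit * sd, center + z_crit * sd

# Assumed population of scores: mean 500, sd 100.
low, high = central_interval(500, 100)
print(round(low), round(high))  # roughly 304 and 696
```

Swapping the sd for a standard error gives the familiar confidence-interval variation on the same formula.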

SUMMARY NORMAL DISTRIBUTION

Assuming our data are normally distributed allows us to use the properties of the normal distribution to assess the likelihood of some outcome

This gives us a means by which to determine whether we might think one hypothesis is more plausible than another (even if we don’t get a direct likelihood of either hypothesis)
