
Intro to Statistics for the Behavioral Sciences PSYC 1900

Date post: 11-Feb-2016
Upload: markku
Page 1: Intro to Statistics for the Behavioral Sciences PSYC 1900

Intro to Statistics for the Behavioral Sciences

PSYC 1900

Lecture 4: The Normal Distribution and Z-Scores

Page 2: Intro to Statistics for the Behavioral Sciences PSYC 1900

Quick Review of Box-and-Whisker Plots

First find the median location and the median (mdn)
Find the quartile locations
  Medians of the upper and lower half of distribution
  Quartile location = (mdn location + 1) / 2
  These are termed the “hinges”
  Note: drop fractional values of mdn location
Hinges bracket interquartile range (IQR)
Hinges serve as top and bottom of box

Page 3: Intro to Statistics for the Behavioral Sciences PSYC 1900

Box-and-Whisker Plots

Find the H-spread
  Range between two quartiles
  Simply the IQR
  Area inside box in plot
Draw the whiskers
  Lines from the hinges to the farthest points that lie no more than 1.5 X H-spread from the hinge
Outliers
  Points beyond whiskers
  Denoted with asterisks
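
The hinge, H-spread, whisker, and outlier rules from these two slides can be sketched in a few lines of Python. This is a minimal sketch, not the course's own software: the helper names `value_at` and `box_summary` are made up for illustration, and the scores are an assumed data set with one extreme value.

```python
import math


def value_at(sorted_scores, location):
    """Score at a 1-based location; a location ending in .5 averages its two neighbors."""
    below = sorted_scores[math.floor(location) - 1]
    above = sorted_scores[math.ceil(location) - 1]
    return (below + above) / 2


def box_summary(scores):
    """Median, hinges, H-spread, whisker ends, and outliers for a list of scores."""
    xs = sorted(scores)
    n = len(xs)
    median_location = (n + 1) / 2
    # Drop any fractional part of the median location before halving.
    quartile_location = (math.floor(median_location) + 1) / 2
    lower_hinge = value_at(xs, quartile_location)
    # Count the same number of positions in from the top for the upper hinge.
    upper_hinge = value_at(xs, n + 1 - quartile_location)
    h_spread = upper_hinge - lower_hinge
    low_limit = lower_hinge - 1.5 * h_spread
    high_limit = upper_hinge + 1.5 * h_spread
    inside = [x for x in xs if low_limit <= x <= high_limit]
    return {
        "median": value_at(xs, median_location),
        "hinges": (lower_hinge, upper_hinge),
        "h_spread": h_spread,
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in xs if x < low_limit or x > high_limit],
    }


# Illustrative scores (an assumption, not course data).
scores = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 12]
summary = box_summary(scores)
print(summary)
```

With these twelve scores the median location is 6.5 and the quartile location is 3.5, so each hinge is the average of the 3rd and 4th scores counted in from its end of the sorted list.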

Page 4: Intro to Statistics for the Behavioral Sciences PSYC 1900

Stem-and-Leaf Plot

Frequency    Stem &  Leaf

   1.00        0 .  1
   3.00        0 .  233
   4.00        0 .  4445
   3.00        0 .  667
   1.00 Extremes    (>=12)

Stem width:  10.00
Each leaf:   1 case(s)

Page 5: Intro to Statistics for the Behavioral Sciences PSYC 1900

Outlier Detection

One rule of thumb is to classify points as outliers if they are beyond 3 sd’s from the mean.
  As we’ll see later in this lecture, that implies that they are very rare occurrences
One problem
  Presence of outlier inflates standard deviation
Box-and-Whisker Plot outlier detection is not influenced by this issue.
  H-spread “trims” off influence of extreme points

Page 6: Intro to Statistics for the Behavioral Sciences PSYC 1900

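Descriptives With and Without “Outlier”

With the extreme point included in the sd:
  $\bar{X} = 4.75$, $s_X = 2.86$
  $3 s_X = 8.58$
  $\text{OutBound} = \bar{X} + 3 s_X = 13.33$

With the extreme point excluded from the sd:
  $\bar{X} = 4.75$, $s_X = 1.81$
  $3 s_X = 5.43$
  $\text{OutBound} = \bar{X} + 3 s_X = 10.18$

If the point is allowed to inflate the variance, it will not be considered an outlier. If it is not, it will.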
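
The effect described on this slide can be checked with a short Python sketch. The scores below are an assumption chosen for illustration (eleven typical values plus one extreme value), and, as on the slide, the mean is held fixed while only the sd is computed with and without the suspect point.

```python
import statistics

# Illustrative scores (an assumption): eleven typical values plus one extreme value.
scores = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 12]
suspect = 12

mean_all = statistics.mean(scores)
sd_with = statistics.stdev(scores)  # sd inflated by the extreme value
sd_without = statistics.stdev([x for x in scores if x != suspect])

# Upper bound for the 3-sd rule under each version of the sd.
bound_with = mean_all + 3 * sd_with
bound_without = mean_all + 3 * sd_without

print(f"mean = {mean_all:.2f}")
print(f"sd with suspect    = {sd_with:.2f}, flagged: {suspect > bound_with}")
print(f"sd without suspect = {sd_without:.2f}, flagged: {suspect > bound_without}")
```

The same point is flagged only when it is kept out of the sd calculation, which is the problem the H-spread approach avoids.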

Page 7: Intro to Statistics for the Behavioral Sciences PSYC 1900

Boxplots to Compare Groups

Useful in providing a quick visual check on group distributions in an experiment.
  Mean = 3 in all groups

Page 8: Intro to Statistics for the Behavioral Sciences PSYC 1900

The Normal Distribution

A specific distribution characterized by a bell-shaped form
Much used to calculate probabilities of scores on variables

[Figure: histogram of X; horizontal axis from -4.00 to 3.50, frequency axis from 0 to 1200; Std. Dev = 1.00, Mean = -.01, N = 10000]

Page 9: Intro to Statistics for the Behavioral Sciences PSYC 1900

What’s So Useful About Distributions?

Distributions specify the way scores deviate around a measure of central tendency.
  In so doing, they allow us to calculate the probabilities of specific values occurring.

Page 10: Intro to Statistics for the Behavioral Sciences PSYC 1900

Pie Chart

An example for a nominal scale
Areas “under the curve” provide information on probabilities

Most criminals are on probation

70% (.7 prob) that a criminal would be on probation or in jail

Page 11: Intro to Statistics for the Behavioral Sciences PSYC 1900

More on Distributions & Prob

Same “adding” of areas under curve holds for histograms
If 64 of 289 cases occur within an interval of interest:
  22% of cases have this “score”
  Probability of any selected case having this score is .22
Integrating area under curve provides a probability estimate

Page 12: Intro to Statistics for the Behavioral Sciences PSYC 1900

Normal Distribution

For continuous variables, we simply connect “tops” of bars to form a curve.
  Abscissa: Horizontal Axis
  Ordinate: Vertical Axis
  Density: Height of curve at a value of X

[Figure: histogram of X; horizontal axis from -4.00 to 3.50, frequency axis from 0 to 1200; Std. Dev = 1.00, Mean = -.01, N = 10000]

Page 13: Intro to Statistics for the Behavioral Sciences PSYC 1900

Normal Distribution

Mathematically defined as:

$$f(X) = \frac{1}{\sigma_X \sqrt{2\pi}} \, e^{-(X - \mu_X)^2 / 2\sigma_X^2}$$

Pi and e are constants (3.14, 2.72)
When the mean and sd are calculated, the distribution can be drawn and densities at any given points determined.
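
As a rough illustration of how a density is obtained once the mean and sd are known, the normal density can be written directly in Python. The function name `normal_density` is made up for this sketch; the result is compared against `NormalDist.pdf` from the standard library as a sanity check.

```python
import math
from statistics import NormalDist


def normal_density(x, mean, sd):
    """Height of the normal curve at x for the given mean and sd."""
    coefficient = 1 / (sd * math.sqrt(2 * math.pi))
    exponent = -((x - mean) ** 2) / (2 * sd ** 2)
    return coefficient * math.exp(exponent)


# Density at the mean of the standard normal, computed two ways.
density_at_mean = normal_density(0, 0, 1)
density_check = NormalDist(mu=0, sigma=1).pdf(0)
print(round(density_at_mean, 4), round(density_check, 4))
```

Note that a density is the height of the curve, not a probability; probabilities come from areas under the curve.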

Page 14: Intro to Statistics for the Behavioral Sciences PSYC 1900

Normal Distribution

It would be difficult to calculate probabilities/densities for each new sample.
Therefore, we use the standard normal distribution and transform scores on variables to fit it.
  A normal distribution with a mean of zero and a sd=1 [N(0,1)].

Page 15: Intro to Statistics for the Behavioral Sciences PSYC 1900

Distribution Forms

Many processes can be described by a normal distribution, but not all.
  Number of meteor strikes, number of supreme court retirements?
Here use Poisson, which is governed by the expected number of occurrences for an interval.
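
To make the role of the expected number of occurrences concrete, here is a minimal Python sketch of the Poisson probability of exactly k events. The rate of 0.5 events per interval is a hypothetical value chosen only for illustration, and `poisson_probability` is a name made up for this sketch.

```python
import math


def poisson_probability(k, expected):
    """Probability of exactly k occurrences when `expected` occur on average per interval."""
    return expected ** k * math.exp(-expected) / math.factorial(k)


expected_per_interval = 0.5  # hypothetical rate, not taken from the lecture
probabilities = [poisson_probability(k, expected_per_interval) for k in range(5)]
for k, p in enumerate(probabilities):
    print(k, round(p, 4))
```

Unlike the normal curve, this distribution is defined only for counts (0, 1, 2, ...) and is skewed when the expected number is small.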

Page 16: Intro to Statistics for the Behavioral Sciences PSYC 1900

Score Transformations

In order to use the standard normal tables to determine probabilities, we transform scores.
  Linear transformations of means do not change the shape of the distribution
If we have a dist with a mean of 50, we need to transform scores so that 50=0
  Take deviations: (X-50) for new point values
  Solves problem of getting mean to zero, but what about standard deviation?

Page 17: Intro to Statistics for the Behavioral Sciences PSYC 1900

Score Transformations

The Standard Normal has a sd = 1
If we divide all values of a variable by a constant, we divide the standard deviation by that constant
  To get a sd=1, we simply divide the mean-transformed scores (i.e., deviation scores) by the sd of the distribution.
  If the sd=5, dividing all scores by 5 produces an sd=1

Page 18: Intro to Statistics for the Behavioral Sciences PSYC 1900

Z-scores and the Standard Normal Distribution

This transformation of raw scores produces z scores
  Z scores are interpreted as the number of standard deviation units above or below the mean
Raw score of 7 in a distribution with mean = 10 and sd=2 produces:

$$z = \frac{X - \mu_X}{\sigma_X}$$

$$z = \frac{7 - 10}{2} = -1.5$$
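
The same calculation as a tiny Python sketch; `z_score` is simply an illustrative helper name.

```python
def z_score(raw, mean, sd):
    """Number of standard deviation units the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


z = z_score(7, 10, 2)
print(z)  # -1.5
```

The negative sign says the raw score of 7 lies one and a half standard deviations below the mean.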

Page 19: Intro to Statistics for the Behavioral Sciences PSYC 1900

Z Score Transformation

A linear transformation
  addition, subtraction, multiplication, and/or division by constants
  Does not change form of the distribution
Z-scoring or “standardizing” a distribution does not make the distribution a normal one
  Shape will be the same, but mean = 0 and sd = 1
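
A short Python sketch can illustrate this point with a deliberately skewed set of scores. The data are made up, the sample sd (`statistics.stdev`) is an assumption about which sd is meant, and skewness is measured here as the average cubed standardized score, one common convention.

```python
import statistics


def standardize(scores):
    """Convert raw scores to z scores using the sample mean and sd."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


def skewness(scores):
    """Average cubed standardized score; positive values mean a longer right tail."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)


raw = [1, 1, 2, 2, 3, 4, 10]  # made-up, positively skewed scores
z = standardize(raw)

print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
print(round(skewness(raw), 4), round(skewness(z), 4))
```

The z scores have mean 0 and sd 1, yet their skewness matches that of the raw scores, so normal-curve probabilities would still be misleading for data like these.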

Page 20: Intro to Statistics for the Behavioral Sciences PSYC 1900

Z Score Benefits

Allows us to compare scores collected on different metrics
  Each score can be interpreted based on its deviation from the mean with respect to the magnitude of average deviations
Allows us to easily obtain probabilities for specific scores based on a “known” normal distribution density function

Page 21: Intro to Statistics for the Behavioral Sciences PSYC 1900

Z Score to Probabilities

If we know a z score, we can calculate probabilities attached to it.
  Area under the curve is 1.00
  Tabled values of standard normal distribution reflect area from the mean to that value
Note that if distribution shape differs substantially from normal, probability estimates will be incorrect

Page 22: Intro to Statistics for the Behavioral Sciences PSYC 1900
Page 23: Intro to Statistics for the Behavioral Sciences PSYC 1900

[Figure: Normal Distribution with cutoff at +1.645; horizontal axis z from -4.00 to 3.50, frequency axis from 0 to 1200]

Page 24: Intro to Statistics for the Behavioral Sciences PSYC 1900

Z Score to Probabilities

A z=1.00 in the table corresponds to an area of 0.34
  A score between z=0 and z=1 has a probability of occurring of 0.34
The probability of a score at or below z=1 is:
  .50+.34=.84
The probability of a score higher than z=1 is:
  .50-.34=.16; or 1.00-.84=.16
The probability of a score -1<z<1?
  .34+.34=.68
  Distribution is symmetric
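
The table lookups above can be reproduced with Python's standard library, which is one possible stand-in for a printed table. `NormalDist.cdf` gives the area at or below a value, so the area from the mean to z is obtained by subtraction.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area from the mean (z = 0) up to z = 1, as a printed table would report it.
mean_to_one = standard_normal.cdf(1) - standard_normal.cdf(0)
at_or_below_one = standard_normal.cdf(1)
above_one = 1 - standard_normal.cdf(1)
within_one = standard_normal.cdf(1) - standard_normal.cdf(-1)

for label, area in [
    ("0 < z < 1", mean_to_one),
    ("z <= 1", at_or_below_one),
    ("z > 1", above_one),
    ("-1 < z < 1", within_one),
]:
    print(f"{label}: {area:.2f}")
```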

Page 25: Intro to Statistics for the Behavioral Sciences PSYC 1900

Curve Area Applet

Page 26: Intro to Statistics for the Behavioral Sciences PSYC 1900

Setting Probable Limits for Observations

Many times, it is useful to predict an interval in which a randomly sampled data point will fall.
  A randomly sampled individual’s score should fall between X and X’ with 95% certainty.
  This implies we’re looking for the area under the curve that covers 95% (cut off 2.5% in each tail)

Page 27: Intro to Statistics for the Behavioral Sciences PSYC 1900

Setting Probable Limits for Observations

From the table, we can see that a z=1.96 leaves 2.5% remaining in tail.

Page 28: Intro to Statistics for the Behavioral Sciences PSYC 1900
Page 29: Intro to Statistics for the Behavioral Sciences PSYC 1900

Setting Probable Limits for Observations

From the table, we can see that a z=1.96 leaves 2.5% remaining in tail.
We simply need to calculate what raw score corresponds to a z=1.96.
  Note that here we must know population mean and sd.

$$z = \frac{X - \mu_X}{\sigma_X}$$

$$\pm 1.96 = \frac{X - \mu_X}{\sigma_X}$$

$$X - \mu_X = \pm 1.96\,\sigma_X$$

$$X = \mu_X \pm 1.96\,\sigma_X$$

Page 30: Intro to Statistics for the Behavioral Sciences PSYC 1900

Setting Probable Limits for Observations

If mean is 50 and sd=10

$$\text{limits} = 50 \pm 1.96(10) = 30.4;\ 69.6$$
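
A minimal Python sketch of the same calculation follows. Instead of reading 1.96 from a table, it asks the standard library for the z value that leaves 2.5% in the upper tail; `probable_limits` is a name made up for this sketch, and the population mean and sd are assumed known, as the lecture requires.

```python
from statistics import NormalDist


def probable_limits(mean, sd, coverage=0.95):
    """Raw-score limits expected to contain `coverage` of a normal population."""
    tail = (1 - coverage) / 2
    z = NormalDist().inv_cdf(1 - tail)  # about 1.96 when coverage is 0.95
    return mean - z * sd, mean + z * sd


lower, upper = probable_limits(50, 10)
print(round(lower, 1), round(upper, 1))
```

Changing `coverage` to another value, such as 0.99, widens or narrows the interval in the same way a different table entry would.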

Page 31: Intro to Statistics for the Behavioral Sciences PSYC 1900

Converting Z’s to Other Standard Scores

Standard scores are ones with predetermined means and sd’s
  New score = New SD (z) + New Mean
  For IQ [N(100,15)]:
    IQ score for z of 1 = 15 (1) + 100 = 115
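
The conversion rule is a one-liner in Python; `to_standard_score` is just an illustrative helper name, and the second call uses a z of -1.5 purely as an extra example.

```python
def to_standard_score(z, new_mean, new_sd):
    """Rescale a z score to a scale with the given mean and sd."""
    return new_sd * z + new_mean


iq = to_standard_score(1, 100, 15)
iq_low = to_standard_score(-1.5, 100, 15)
print(iq, iq_low)  # 115 77.5
```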

