Intro to Statistics for the Behavioral Sciences
PSYC 1900
Lecture 4: The Normal Distribution and Z-Scores
Quick Review of Box-and-Whisker Plots
- First find the median location and the median (mdn).
- Find the quartile locations: the medians of the upper and lower halves of the distribution.
  - Quartile location = (mdn location + 1) / 2
  - Note: drop any fractional part of the mdn location before applying this formula.
  - These quartiles are termed the "hinges."
  - The hinges bracket the interquartile range (IQR) and serve as the top and bottom of the box.
Box-and-Whisker Plots
- Find the H-spread: the range between the two hinges (quartiles).
  - This is simply the IQR, the area inside the box in the plot.
- Draw the whiskers: lines from the hinges to the farthest points that lie no more than 1.5 × H-spread beyond the hinges.
- Outliers: points beyond the whiskers, denoted with asterisks.
Stem-and-Leaf Plot

Frequency    Stem &  Leaf

     1.00       0 .  1
     3.00       0 .  233
     4.00       0 .  4445
     3.00       0 .  667
     1.00 Extremes    (>=12)

Stem width:  10.00
Each leaf:   1 case(s)
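The box-plot rules above can be checked against this display with a short Python sketch. It follows the slides' counting rules (median location, hinge location with the fraction dropped, whiskers to the farthest points within 1.5 × H-spread of the hinges); statistics packages may compute hinges slightly differently. The helper names `value_at` and `boxplot_summary` are hypothetical, and the data assume the single extreme case (>=12) is exactly 12. With a stem width of 10, each leaf reads as a whole number.

```python
import math

def value_at(sorted_x, loc):
    """Value at a 1-based location; a .5 location averages its two neighbors."""
    return (sorted_x[math.floor(loc) - 1] + sorted_x[math.ceil(loc) - 1]) / 2

def boxplot_summary(data):
    """Median, hinges, H-spread, whisker ends, and outliers (slide rules)."""
    x = sorted(data)
    n = len(x)
    mdn_loc = (n + 1) / 2
    median = value_at(x, mdn_loc)
    # Hinge location: drop the fractional part of the mdn location first
    q_loc = (math.floor(mdn_loc) + 1) / 2
    lower_hinge = value_at(x, q_loc)        # count in from the bottom
    upper_hinge = value_at(x[::-1], q_loc)  # count in from the top
    h_spread = upper_hinge - lower_hinge    # same as the IQR
    lower_fence = lower_hinge - 1.5 * h_spread
    upper_fence = upper_hinge + 1.5 * h_spread
    inside = [v for v in x if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "hinges": (lower_hinge, upper_hinge),
        "h_spread": h_spread,
        "whiskers": (min(inside), max(inside)),  # farthest points inside the fences
        "outliers": [v for v in x if v < lower_fence or v > upper_fence],
    }

# Stem-and-leaf data; the single extreme (>=12) is ASSUMED to be exactly 12
scores = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 12]
print(boxplot_summary(scores))
```

Here the hinges are 3 and 6, the H-spread is 3, and the upper whisker stops at 7, so 12 is plotted as an outlier.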
Outlier Detection
- One rule of thumb classifies points as outliers if they lie more than 3 sd's from the mean.
  - As we'll see later in this lecture, this implies that such points are very rare occurrences.
- One problem: the presence of an outlier inflates the standard deviation.
- Box-and-whisker outlier detection is not influenced by this issue: the H-spread "trims" off the influence of extreme points.
Descriptives With and Without "Outlier"

                            Mean    SD    3 × SD   Outlier bound (Mean + 3 × SD)
Outlier included in SD      4.75   2.86    8.58        13.33
Outlier excluded from SD    4.75   1.81    5.43        10.18
If the point is allowed to inflate the variance, it will not be classified as an outlier; if it is not, it will.
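A sketch of the 3-sd rule applied to the stem-and-leaf data shows why the two rows disagree. As on the slide, the mean is held at 4.75 in both rows so that only the SD changes; the data again assume the extreme case is exactly 12. Bounds printed here can differ from the table in the second decimal because the table multiplies the rounded SD.

```python
import statistics

# Stem-and-leaf data; the extreme case is ASSUMED to be exactly 12
scores = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 12]
suspect = 12

mean = statistics.mean(scores)                 # 4.75, kept in both rows
sd_with = statistics.stdev(scores)             # the outlier inflates the SD
sd_without = statistics.stdev([v for v in scores if v != suspect])

for label, sd in [("included", sd_with), ("excluded", sd_without)]:
    bound = mean + 3 * sd
    flagged = suspect > bound
    print(f"outlier {label} in SD: sd={sd:.2f}, bound={bound:.2f}, flagged={flagged}")
```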
Boxplots to Compare Groups
- Useful as a quick visual check on the group distributions in an experiment.
[Figure: side-by-side boxplots for several groups; mean = 3 in all groups]
The Normal Distribution
- A specific distribution characterized by a bell-shaped form.
- Widely used to calculate the probabilities of scores on variables.
[Figure: histogram of X approximating a normal curve; N = 10,000, Mean = -.01, Std. Dev = 1.00]
What's So Useful About Distributions?
- Distributions specify the way scores deviate around a measure of central tendency.
- In so doing, they allow us to calculate the probabilities of specific values occurring.
Pie Chart
- An example for a nominal scale.
- Areas "under the curve" (here, slices of the pie) provide information on probabilities.
[Figure: pie chart of criminals' status; most criminals are on probation]
- Adding the probation and jail slices gives 70%: a .70 probability that a randomly selected criminal is on probation or in jail.
More on Distributions & Probability
- The same "adding" of areas under the curve holds for histograms.
- If 64 of 289 cases occur within an interval of interest:
  - 22% of cases have this "score" (64 / 289 ≈ .22).
  - The probability that any randomly selected case has this score is .22.
- Integrating the area under the curve provides a probability estimate.
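Both ideas on this slide can be sketched in Python: a histogram probability is a count divided by the total, and for a continuous curve the area can be approximated numerically. The trapezoid rule used here is an assumed, swapped-in method (the slides do not specify one), and the function names are hypothetical.

```python
import math

# Histogram: proportion of cases in the interval (counts from the slide)
p_interval = 64 / 289
print(round(p_interval, 2))  # 0.22

def std_normal_density(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_under_curve(f, a, b, steps=10_000):
    """Trapezoid-rule approximation to the area under f between a and b."""
    h = (b - a) / steps
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, steps))
    return total * h

# Area between z = -1 and z = +1
print(round(area_under_curve(std_normal_density, -1, 1), 4))  # 0.6827
```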
Normal Distribution
- For continuous variables, we simply connect the "tops" of the bars to form a curve.
- Abscissa: horizontal axis.
- Ordinate: vertical axis.
- Density: height of the curve at a value of X.
[Figure: histogram of X with the tops of the bars connected into a curve; N = 10,000, Mean = -.01, Std. Dev = 1.00]
Normal Distribution
- Mathematically defined as:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2 / (2\sigma^2)}$$

- π and e are constants (≈ 3.14 and ≈ 2.72); μ is the mean and σ is the standard deviation.
- Once the mean and sd are known, the distribution can be drawn and the density at any given point determined.
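The density formula translates directly into code. This minimal sketch (`normal_density` is a hypothetical name) evaluates the height of the curve, not a probability; probabilities come from areas under the curve.

```python
import math

def normal_density(x, mu, sigma):
    """Normal density f(X) for mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(round(normal_density(0, 0, 1), 4))   # 0.3989, the peak of N(0,1)
print(round(normal_density(115, 100, 15), 4))
```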
Normal Distribution
- It would be difficult to calculate probabilities/densities from scratch for each new sample.
- Therefore, we use the standard normal distribution and transform scores on variables to fit it.
  - The standard normal is a normal distribution with a mean of zero and an sd of 1 [N(0,1)].
Distribution Forms
- Many processes can be described by a normal distribution, but not all.
  - Number of meteor strikes? Number of Supreme Court retirements?
  - Here we use the Poisson distribution, which is governed by the expected number of occurrences in an interval.
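For comparison, the Poisson probability of exactly k occurrences when λ are expected per interval is λ^k e^(-λ) / k!. A minimal sketch with an assumed λ of 2 per interval (a hypothetical value, not data about meteors or retirements); `poisson_prob` is a hypothetical name.

```python
import math

def poisson_prob(k, lam):
    """P(exactly k occurrences) when lam occurrences are expected per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.0  # hypothetical: 2 expected events per interval
for k in range(5):
    print(k, round(poisson_prob(k, lam), 4))
```

Unlike the normal, this distribution is discrete (only whole-number counts) and is fully determined by the single expected count λ.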
Score Transformations
- To use the standard normal tables to determine probabilities, we transform scores.
- Linear transformations of scores do not change the shape of the distribution.
- If we have a distribution with a mean of 50, we need to transform scores so that 50 = 0.
  - Take deviations: (X - 50) gives the new point values.
  - This solves the problem of getting the mean to zero, but what about the standard deviation?
Score Transformations
- The standard normal has an sd = 1.
- If we divide all values of a variable by a constant, we divide the standard deviation by that constant.
- To get an sd = 1, we simply divide the mean-transformed scores (i.e., the deviation scores) by the sd of the distribution.
  - If sd = 5, dividing all scores by 5 produces an sd = 1.
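The two steps (subtract the mean, divide by the sd) can be verified on a small, hypothetical set of scores. This sketch uses the population sd (`statistics.pstdev`), in line with the later point that the population mean and sd must be known; `standardize` is a hypothetical name.

```python
import statistics

def standardize(scores):
    """Subtract the mean, then divide the deviations by the sd."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # treats the scores as the whole population
    return [(x - mean) / sd for x in scores]

scores = [40, 45, 50, 50, 55, 60]   # hypothetical scores with a mean of 50
deviations = [x - 50 for x in scores]
print(statistics.mean(deviations))  # 0: the mean has moved to zero
z = standardize(scores)
print(round(statistics.mean(z), 10), round(statistics.pstdev(z), 10))
```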
Z-scores and the Standard Normal Distribution
- This transformation of raw scores produces z scores.
- Z scores are interpreted as the number of standard deviation units above or below the mean.
- A raw score of 7 in a distribution with mean = 10 and sd = 2 produces:

$$z = \frac{X - \mu}{\sigma} = \frac{7 - 10}{2} = -1.5$$

- The negative sign indicates that the score lies 1.5 sd's below the mean.
Z Score Transformation
- A linear transformation: addition, subtraction, multiplication, and/or division by constants.
  - Does not change the form of the distribution.
- Z-scoring or "standardizing" a distribution does not make it a normal one.
  - The shape will be the same, but the mean = 0 and the sd = 1.
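To see that standardizing leaves the shape alone, this sketch compares the moment-based skewness of hypothetical right-skewed scores before and after z-scoring. The skewness formula is one common definition, chosen here only for illustration; `skewness` is a hypothetical name.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: mean cubed deviation / sd**3 (population form)."""
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

raw = [1, 1, 2, 2, 3, 10]  # hypothetical, clearly right-skewed
m, sd = statistics.mean(raw), statistics.pstdev(raw)
z = [(x - m) / sd for x in raw]

# The two values match (up to floating-point rounding): the z scores are still skewed
print(round(skewness(raw), 4), round(skewness(z), 4))
```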
Z Score Benefits
- Allows us to compare scores collected on different metrics.
  - Each score can be interpreted based on its deviation from the mean with respect to the magnitude of average deviations.
- Allows us to easily obtain probabilities for specific scores based on a "known" normal distribution density function.
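A sketch of comparing scores across metrics, using hypothetical tests and scores (nothing here comes from the lecture except the 7 / 10 / 2 example); `z_score` is a hypothetical name.

```python
def z_score(x, mean, sd):
    """Number of sd units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(7, mean=10, sd=2))  # -1.5, the example from the slides

# Hypothetical student with scores on two tests that use different metrics
verbal = z_score(85, mean=75, sd=10)    # 1.0
math_test = z_score(30, mean=24, sd=4)  # 1.5
better = "math" if math_test > verbal else "verbal"
print(verbal, math_test, better)
```

The raw scores (85 vs. 30) cannot be compared directly, but relative to each test's own mean and sd the second score is the more exceptional one.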
Z Score to Probabilities
- If we know a z score, we can calculate the probabilities attached to it.
  - The total area under the curve is 1.00.
  - Tabled values of the standard normal distribution reflect the area from the mean to that value.
- Note that if the distribution's shape differs substantially from normal, these probability estimates will be incorrect.
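Tabled values can be reproduced with the error function, since the area from the mean to z equals 0.5 × erf(|z| / √2). A minimal sketch (`area_mean_to_z` is a hypothetical name); symmetry lets it accept negative z.

```python
import math

def area_mean_to_z(z):
    """Area under N(0,1) between the mean (0) and z, as in a z table."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

for z in (1.00, 1.645, 1.96):
    print(z, round(area_mean_to_z(z), 4))
```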
[Figure: Normal Distribution, histogram of z with a cutoff at z = +1.645]
Z Score to Probabilities
- A z = 1.00 in the table corresponds to an area of .34 (more precisely, .3413).
  - A score between z = 0 and z = 1 has a probability of occurring of .34.
- The probability of a score at or below z = 1 is: .50 + .34 = .84
- The probability of a score higher than z = 1 is: .50 - .34 = .16; or 1.00 - .84 = .16
- The probability of a score with -1 < z < 1? .34 + .34 = .68, because the distribution is symmetric.
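Python's standard library (3.8+) provides `statistics.NormalDist`, whose `cdf` method gives the area at or below a value. A sketch reproducing the three probabilities above:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

at_or_below = std_normal.cdf(1)                      # about .84
above = 1 - std_normal.cdf(1)                        # about .16
within_one = std_normal.cdf(1) - std_normal.cdf(-1)  # about .68
print(round(at_or_below, 2), round(above, 2), round(within_one, 2))
```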
Setting Probable Limits for Observations
- Many times, it is useful to predict an interval in which a randomly sampled data point will fall.
  - E.g., a randomly sampled individual's score should fall between X and X' with 95% certainty.
- This implies we're looking for the area under the curve that covers 95% (cutting off 2.5% in each tail).
Setting Probable Limits for Observations
- From the table, we can see that a z = 1.96 leaves 2.5% remaining in the tail.
- We simply need to calculate what raw scores correspond to z = ±1.96.
  - Note that here we must know the population mean and sd.

$$z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad \pm 1.96 = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad X = \mu \pm 1.96\,\sigma$$
Setting Probable Limits for Observations
- If the mean is 50 and sd = 10:

$$X = 50 \pm 1.96(10) \quad\Rightarrow\quad \text{limits} = 30.4,\ 69.6$$
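The same limits can be computed without a table by asking `NormalDist().inv_cdf` for the z that leaves 2.5% in the upper tail. A sketch (`probable_limits` is a hypothetical helper; it assumes the population is normal with known μ and σ):

```python
from statistics import NormalDist

def probable_limits(mu, sigma, coverage=0.95):
    """Raw-score interval that should contain `coverage` of a normal population."""
    tail = (1 - coverage) / 2                # 2.5% in each tail for 95%
    z_crit = NormalDist().inv_cdf(1 - tail)  # about 1.96
    return mu - z_crit * sigma, mu + z_crit * sigma

low, high = probable_limits(50, 10)
print(round(low, 1), round(high, 1))  # 30.4 69.6
```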
Converting Z's to Other Standard Scores
- Standard scores are ones with predetermined means and sd's.
- New score = New SD (z) + New Mean
  - For IQ [N(100,15)]: IQ score for a z of 1 = 15(1) + 100 = 115
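The conversion formula as a one-line function (hypothetical name `to_standard_score`), shown with the IQ example and a second z value:

```python
def to_standard_score(z, new_mean, new_sd):
    """New score = New SD (z) + New Mean."""
    return new_sd * z + new_mean

print(to_standard_score(1, new_mean=100, new_sd=15))     # 115, IQ scale N(100,15)
print(to_standard_score(-1.5, new_mean=100, new_sd=15))  # 77.5
```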