Basic Quantitative Methods in the Social Sciences
(AKA Intro Stats) 02-250-01
Lecture 9
Assignment Due and Course Evaluations
• All four modules of the assignment are due in the first 5 minutes of class. NO assignment will be accepted after 4:05 PM.
• Course evaluations will be completed during the first 10 minutes of class.
Correlation
• We are often interested in knowing about the relationship between two variables.
• Consider the following research questions:
  • Does the incidence of crime (X) vary with the outdoor temperature (Y) in Detroit?
  • Does pizza consumption (X) have anything to do with how much time one spends surfing the web (Y)?
  • Does severity of depression (X) vary as a function of Ecstasy use (Y)?
  • Does the occurrence of pimples (X) increase as air pollution (Y) increases in Windsor?
Correlation
• These are all examples of relationships.
• In each case, we are asking whether one variable (X) is related to another variable (Y). Stated differently: Are X and Y correlated?
• More specifically: Are changes in one variable reliably accompanied by changes in the other?
• “Correlation coefficients” can be calculated so that we can measure the degree to which two variables are related to each other.
Scatter Plot Used to Describe Correlation
• We can plot the X and Y points on a scatter plot.
  • We plot the Y scores on the vertical axis and the X scores on the horizontal axis.
  • We then can draw a straight line to try to represent or describe the points on our scatter plot.
Graphing Relationships
[Figure: Height/Weight scatterplot; r = .770]
• When our height and weight scores are plotted, we see some irregularity.
• We can draw a straight line through these points to summarize the relationship.
• The line provides an average statement about change in one variable associated with changes in the other variable.
Correlation
[Figure: scatterplot of AGE and WEIGHT]
Imagine if…
• All of the dots fell exactly on the line? What would that mean?
• All of the dots clustered close to the line, but few fell on the line? What would that mean?
• The dots were widely dispersed around the line, such that the line is only a vague representation of how the scatterplot looks? What would that mean?
Correlation: Positive R
• Let's look at some different scatter plots.
• A positive relationship.
Various degrees of linear correlation
Correlation: Negative R
• Let's look at some different scatter plots.
• A negative relationship.
Various degrees of linear correlation
Correlation: No Relationship
• Let's look at some different scatter plots.
• No relationship.
What Direction Relationship Is Described in This Scatter Plot?
[Figure: scatter plot of TESTA (vertical axis, 18 to 34) against TESTB (horizontal axis, 28 to 40)]
Logic Dictates…
• We can measure the distance between each dot and the line.
• If a perfect correlation (1.000) is represented by all of the dots falling on the line, while a line whose dots vary around it indicates a weaker correlation…
• The degree to which the two variables are correlated can be thought of as the mean distance between the dots and the line. This is calculated algebraically.
Covariance
• Conceptually, the correlation between X and Y is based on covariance, a statistic representing the degree to which two variables vary together.
• Like variance, covariance is based on deviations from the mean:
\[ \mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1} \]
• r is calculated as
\[ r = \frac{\mathrm{cov}_{XY}}{s_X s_Y} \]
• But wait! Just like calculating variance, there is an easier formula.
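The covariance definition of r can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the data are the five-subject X and Y scores from the worked example in this lecture.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, over N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def pearson_r(xs, ys):
    """r as the covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Five-subject example data from this lecture
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 3, 5]

cov_xy = covariance(x, y)
r = pearson_r(x, y)
print(f"cov_XY = {cov_xy:.3f}, r = {r:.3f}")
```

Because both the covariance and the standard deviations divide by N − 1, those terms cancel, which is why the shortcut formula gives the same r.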
The Pearson Product-Moment Correlation Coefficient (r)
• r is a quantitative expression of the degree to which two variables are correlated in a linear relationship.
• Linear relationship: This means that the scatterplot points are clustered more or less symmetrically about a straight line, such that the line is an adequate representation of the relationship.
• Non-linear or curvilinear relationship: The scatterplot points do not cluster around a straight line. Example? Arousal/performance.
Characteristics of r
• r has two components:
  • The degree of relationship
  • The direction of relationship
• r ranges from –1.000 to +1.000
Are X & Y Correlated?
SUBJECTS     X    Y
Subject 1    1    2
Subject 2    2    3
Subject 3    3    4
Subject 4    4    3
Subject 5    5    5
The Pearson r
\[ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \]
Note: This formula really is the same as the one in the book, just slightly rearranged.
We Need:
• Sum of the Xs: \(\sum X\)
• Sum of the Ys: \(\sum Y\)
• Sum of the Xs, squared: \((\sum X)^2\)
• Sum of the Ys, squared: \((\sum Y)^2\)
• Sum of the squared Xs: \(\sum X^2\)
• Sum of the squared Ys: \(\sum Y^2\)
• Sum of Xs times the Ys: \(\sum XY\)
• Number of subjects: \(N\)
Correlation Arithmetic
SUBJECTS     X    Y    X^2   Y^2   XY
Subject 1    1    2     1     4     2
Subject 2    2    3     4     9     6
Subject 3    3    4     9    16    12
Subject 4    4    3    16     9    12
Subject 5    5    5    25    25    25
Totals      15   17    55    63    57
The Pearson r
\[ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \]
\[ r = \frac{5(57) - (15)(17)}{\sqrt{\left[5(55) - (15)^2\right]\left[5(63) - (17)^2\right]}} \]
\[ r = \frac{285 - 255}{\sqrt{[275 - 225][315 - 289]}} \]
\[ r = \frac{30}{\sqrt{[50][26]}} \]
\[ r = \frac{30}{\sqrt{1300}} = \frac{30}{36.056} \]
\[ r = .832 \]
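The raw-score arithmetic for r can be checked with a short Python sketch. It assumes the usual computational form of the Pearson formula, built from the sums listed under "We Need"; the function name `pearson_r_raw` is made up for this example, and the data are the five-subject X and Y scores from the Correlation Arithmetic table.

```python
from math import sqrt

def pearson_r_raw(xs, ys):
    """Pearson r from raw sums: N, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)             # sum of the squared Xs
    sum_y2 = sum(y * y for y in ys)             # sum of the squared Ys
    sum_xy = sum(x * y for x, y in zip(xs, ys)) # sum of Xs times the Ys
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Five-subject example from the Correlation Arithmetic table
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 3, 5]

r_example = pearson_r_raw(x, y)
print(f"r = {r_example:.3f}")
```

Note the difference between `sum_x ** 2` (the sum, then squared) and `sum_x2` (each score squared, then summed); mixing these up is the most common hand-calculation slip.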
Hypothesis Testing with Correlations
• H0: ρ = 0 (ρ = “rho”, the population correlation coefficient)
• Ha: ρ ≠ 0 (there is a significant relationship between X and Y)
• Technically, you could do a one-tailed test for correlations (ρ < 0 or ρ > 0), but for our purposes we will always test whether there simply is a relationship; therefore, we will always do a two-tailed test for correlations.
• Find the critical value for α = .05 with df = N − 2 (where N is the number of paired observations) in Table E.2, p. 440.
The Pearson r
r = .832
Is an r of .832 significant? See Table E.2 (p. 440) for N − 2 df (5 − 2 = 3 df) and an alpha (α) of .05.
• The “critical r” = .878
• r = .832
• Because .832 is smaller than .878, the correlation is NOT significant.
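The decision rule can be written as a small lookup, sketched below. The dictionary holds the commonly tabled two-tailed critical values of r at α = .05 for small df; treat them as an assumption and check them against Table E.2 in the textbook. The names `CRITICAL_R_05` and `is_significant` are made up for this example.

```python
# Two-tailed critical values of r at alpha = .05, keyed by df = N - 2.
# Commonly tabled values (an assumption here); verify against Table E.2.
CRITICAL_R_05 = {
    1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754,
    6: 0.707, 7: 0.666, 8: 0.632, 9: 0.602, 10: 0.576,
}

def is_significant(r, n_pairs, table=CRITICAL_R_05):
    """Return (decision, critical value) for a two-tailed test of rho = 0."""
    df = n_pairs - 2
    critical = table[df]
    # Two-tailed: compare the absolute value of r with the critical value
    return abs(r) >= critical, critical

significant, critical_r = is_significant(0.832, 5)
print(f"critical r = {critical_r}, significant: {significant}")
```

Taking the absolute value of r is what makes the test two-tailed: a strong negative correlation is judged against the same critical value as a strong positive one.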
Popcorn Consumption
• Researcher X hypothesizes that popcorn consumption varies as a function of stress. He gives a random sample of 5 people a self-report measure of stress that produces scores ranging from 1 (little or no stress) to 10 (very stressed), and then has them watch a movie. He measures how many kernels of popcorn each of them eats. Is popcorn consumption correlated with stress?
Are X & Y Correlated?
SUBJECTS     Stress Ratings (X)   # of Kernels (Y)
Subject 1            7                  10
Subject 2            5                   9
Subject 3            9                  12
Subject 4            5                   3
Subject 5            3                   6
Totals              29                  40
The Pearson r
\[ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \]
We Need:
• Sum of the Xs: \(\sum X\)
• Sum of the Ys: \(\sum Y\)
• Sum of the Xs, squared: \((\sum X)^2\)
• Sum of the Ys, squared: \((\sum Y)^2\)
• Sum of the squared Xs: \(\sum X^2\)
• Sum of the squared Ys: \(\sum Y^2\)
• Sum of Xs times the Ys: \(\sum XY\)
• Number of subjects: \(N\)
Correlation Arithmetic
SUBJECTS     X    Y    X^2   Y^2   XY
Subject 1    7   10    49   100    70
Subject 2    5    9    25    81    45
Subject 3    9   12    81   144   108
Subject 4    5    3    25     9    15
Subject 5    3    6     9    36    18
Totals      29   40   189   370   256
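The columns and totals of the popcorn arithmetic table can be double-checked with a short Python sketch. The variable names are made up for this example; the scores are the stress ratings and kernel counts from the popcorn study.

```python
stress = [7, 5, 9, 5, 3]      # X: stress ratings
kernels = [10, 9, 12, 3, 6]   # Y: kernels of popcorn eaten

# One row per subject: X, Y, X^2, Y^2, XY
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(stress, kernels)]

# Column totals: sum X, sum Y, sum X^2, sum Y^2, sum XY
totals = tuple(sum(column) for column in zip(*rows))

print(f"{'X':>5}{'Y':>5}{'X^2':>6}{'Y^2':>6}{'XY':>6}")
for row in rows:
    print(f"{row[0]:>5}{row[1]:>5}{row[2]:>6}{row[3]:>6}{row[4]:>6}")
print("Totals:", totals)
```

These five totals, together with N = 5, are everything the raw-score formula for r requires.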
The Pearson r
\[ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \]
\[ r = \frac{5(256) - (29)(40)}{\sqrt{\left[5(189) - (29)^2\right]\left[5(370) - (40)^2\right]}} \]
\[ r = \frac{1280 - 1160}{\sqrt{[945 - 841][1850 - 1600]}} \]
\[ r = \frac{120}{\sqrt{[104][250]}} \]
\[ r = \frac{120}{\sqrt{26000}} = \frac{120}{161.245} \]
\[ r = .744 \]
Is an r of .744 significant? See Table E.2 (p. 440) for N − 2 df (5 − 2 = 3 df) and an alpha (α) of .05.
• The “critical r” = .878
• r = .744
• Because .744 is smaller than .878, the correlation is NOT significant.
A Useful Means of Interpretation: Variance
• r is not the most useful interpretation of a correlation.
• r² is more useful. r² is the proportion of the variance of the Y scores that is accounted for by X.
• You need so much information in order to make an error-free prediction of Y. r² is roughly equal to the percentage of that information that you possess just by knowing X.
Why Do Some People Have High Self-Esteem While Others Have Low Self-Esteem?
• Say 100 people are given a self-esteem inventory (e.g., “I think I am a person of worth”, from 1 = strongly disagree to 5 = strongly agree).
• They are also asked to fill out measures of body-satisfaction (“I think I have a good body”), social-esteem (“I think I am a good friend”), and academic-esteem (“I am a good student”).
• Correlations are calculated between overall self-esteem and the other variables (3 correlations).
Explaining Self-Esteem
[Pie chart of overall self-esteem, with slices for body-esteem (r = .41, 16% of the variance), social-esteem (r = .54), academic-esteem (r = .40), and unidentified variables]
• The entire pie = overall self-esteem.
• The different pieces represent different variables that explain the variability (or variance) in self-esteem scores. In other words, these variables explain why some people have high self-esteem, low self-esteem, very low self-esteem, etc.
So…
• Body-esteem accounts for (or explains) 16% of the variance in overall self-esteem.
• Social-esteem explains…? (.540)(.540) = .292, so it explains about 29% of the variance in overall self-esteem.
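Squaring each correlation gives the proportion of variance explained, as in this minimal Python sketch. The dictionary names are made up for this example; the r values are the three correlations shown on the self-esteem pie chart.

```python
# Correlations with overall self-esteem, as reported on the pie chart
correlations = {
    "body-esteem": 0.41,
    "social-esteem": 0.54,
    "academic-esteem": 0.40,
}

# r squared = proportion of variance in self-esteem accounted for
variance_explained = {name: r ** 2 for name, r in correlations.items()}

for name, r_squared in variance_explained.items():
    print(f"{name}: r^2 = {r_squared:.3f} ({r_squared:.0%} of the variance)")
```

With r = .41 the computation gives about .17, slightly above the 16% shown on the slide, which corresponds to r = .40.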
Correlation: Errors in Interpreting r
• Common errors in interpreting a correlation coefficient:
  • Interpreting r in direct proportion to its size
    • Not a percentage
    • Not proportionate across the range (.2 is not half of .4)
    • The correlation coefficient is an ordinal statistic. So r = 0.750 represents a stronger relationship than r = 0.520
  • Interpreting in terms of arbitrary descriptive labels
    • Small, medium, large
More Errors Interpreting Correlation
• Correlation does NOT imply causation!
  • X causes Y to change. Examples?
  • Y causes X to change. Examples?
  • W causes changes in X and Y! Examples?
• So: body-esteem might account for 16% of the variance in self-esteem, but this does not mean that body-esteem causes self-esteem.
• For the trip to Hawaii and the Samsonite luggage… Psychologists used to think that having been sexually abused causes bulimia. How could researchers demonstrate that this is true?
Factors that affect the size of a correlation
• Nature of the relationship between X and Y.
• Heterogeneous subsamples: the sample could be subdivided into 2 distinct sets based on another variable (e.g., males vs. females).
• Truncated range.
  • Range restricted in size.
  • May cause the correlation to appear lower than it really is (or higher than it is for non-linear relationships).
  • Without the full range of scores it is not possible to calculate the correlation accurately. Let's look at why…
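The effect of a truncated range can be illustrated with simulated data, as in the Python sketch below. Everything here is hypothetical: the scores are randomly generated rather than taken from the lecture, and the slope (0.6), noise level (0.8), and cut-offs (−0.5 to 0.5) are arbitrary choices made only to show the pattern for a linear relationship.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical linear relationship: Y depends on X plus random noise
n = 2000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.6 * x + random.gauss(0, 0.8) for x in xs]

r_full = pearson_r(xs, ys)

# Truncate the range: keep only cases with X in the middle of the distribution
restricted = [(x, y) for x, y in zip(xs, ys) if -0.5 <= x <= 0.5]
rx, ry = zip(*restricted)
r_restricted = pearson_r(rx, ry)

print(f"full range:      r = {r_full:.3f} (N = {n})")
print(f"truncated range: r = {r_restricted:.3f} (N = {len(restricted)})")
```

With the range of X cut down, the same amount of noise in Y makes up a larger share of the remaining variability, so the correlation in the restricted sample comes out lower.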
Underlying Assumptions for r
• X and Y need to be adequately represented by a straight line function. Stated differently, the relationship must be linear.
• If r is to be used inferentially…
  • Homoscedasticity: The variabilities of X at different values of Y are equal. E.g., variability in weight for 6’5” people is equal to variability in weight for 5’5” people.
  • Normality: X is normally distributed at all values of Y (e.g., weight is normally distributed for 6’5” people and for 5’5” people).
  • Vice versa as well (Y at values of X).
Work on it
• Say we're interested in knowing whether exam grades are related to number of hours spent studying. Ten students report how many hours they studied for an exam. Here are the data:
Student   Hours studying   Exam grade
1               4              56
2               3              45
3               8              80
4               5              75
5               6              96
6               3              44
7               6              74
8               6              53
9               9              85
10              5              60
Work on it!
• State the H0 and Ha.
• Test the hypothesis.