Chapter 15 1
Chapter 15
Describing Relationships: Regression, Prediction, and Causation
Thought Question 2
From past natural disasters, a strong positive correlation has been found between the amount of aid sent and the number of deaths. Would you interpret this to mean that sending more aid causes more people to die? Explain.
Thought Question 3
Studies have shown a negative correlation between the amount of food consumed that is rich in beta carotene and the incidence of lung cancer in adults. Does this correlation provide evidence that beta carotene is a contributing factor in the prevention of lung cancer? Explain.
Linear Regression
Objective: To quantify the linear relationship between an explanatory variable and a response variable.
We can then predict the average response for all subjects with a given value of the explanatory variable.
Regression equation: y = a + bx
– x is the value of the explanatory variable
– y is the average value of the response variable
– note that a and b are just the intercept and slope of a straight line
– note that r and b are not the same thing, but their signs will agree
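A minimal sketch of how a and b can be computed from summary statistics, using the standard least squares formulas b = r × (s_y / s_x) and a = ȳ − b × x̄ (these formulas and the data points are not from the slide; the data are invented for illustration). Because s_y / s_x is always positive, b always has the same sign as r, which is why their signs agree even though they are different numbers.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)  # slope: r rescaled by spread of y relative to spread of x
    a = mean(ys) - b * mean(xs)    # intercept: the line passes through (mean of x, mean of y)
    return a, b

# Hypothetical data, made up only to illustrate the formulas.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = regression_line(xs, ys)
r = correlation(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, r = {r:.3f}")
```

Here r is close to 1 while b is close to 2: the two numbers differ, but both are positive.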
Least Squares Regression
Used to determine the “best” line
We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)
Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line
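A minimal sketch of the least squares criterion, assuming a small invented data set: it computes the sum of squared vertical distances (SSE) for the fitted line and checks that nudging the intercept or slope in any direction makes the SSE larger. The closed-form fit used here (b = Sxy / Sxx, a = ȳ − b x̄) is the standard textbook solution, not something stated on the slide.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least squares intercept a and slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]  # hypothetical data
a, b = least_squares(xs, ys)
best = sse(a, b, xs, ys)

# Nudge the intercept and/or slope: every alternative line has a larger SSE.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.2, -0.05)]:
    assert sse(a + da, b + db, xs, ys) > best

print(f"best line: y = {a:.2f} + {b:.2f}x, SSE = {best:.3f}")
```

Squaring the distances keeps points above and below the line from cancelling and penalizes large misses more heavily.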
Prediction via Regression Line
Hand et al., A Handbook of Small Data Sets, London: Chapman and Hall
The regression equation is y = 3.6 + 0.97x
– y is the average age of all husbands who have wives of age x
For all women aged 30, we predict the average husband age to be 32.7 years:
3.6 + (0.97)(30) = 32.7 years
Suppose we know that an individual wife’s age is 30. What would we predict her husband’s age to be? How old is her husband?
Husband and Wife: Ages
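The arithmetic above, as a small sketch (the helper name `predict` is hypothetical). The result, 32.7, is the predicted average age of all husbands whose wives are 30; it is our best single guess for one particular husband, but his actual age can be older or younger.

```python
def predict(a, b, x):
    """Predicted average response y = a + bx for explanatory value x."""
    return a + b * x

# Slide's equation: average husband age = 3.6 + 0.97 * (wife's age)
for wife_age in (20, 30, 40):
    print(wife_age, round(predict(3.6, 0.97, wife_age), 1))
```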
Coefficient of Determination (R²)
Measures the usefulness of regression prediction.
R² (or r², the square of the correlation) measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line.
– r = 1: R² = 1, so the regression line explains all (100%) of the variation in y
– r = 0.7: R² = 0.49, so the regression line explains almost half (49%) of the variation in y
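A minimal sketch of where R² comes from, assuming invented data: R² = 1 − SSE/SST, where SST is the total variation of y around its mean and SSE is the variation left over around the least squares line. For a straight-line fit with one explanatory variable, this equals the square of the correlation r.

```python
def fit_and_r_squared(xs, ys):
    """Fit y = a + bx by least squares; return (R^2 = 1 - SSE/SST, correlation r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    sst = sum((y - my) ** 2 for y in ys)  # total variation in y
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation the line leaves unexplained
    r = sxy / (sxx * sst) ** 0.5
    return 1 - sse / sst, r

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]  # hypothetical data
r2, r = fit_and_r_squared(xs, ys)
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")
```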
Income versus Assets
(Figure: scatterplot of income (millions) versus assets (billions))
Income = a + b(Assets)
Assets vary from 3.4 billion to 49 billion
Income varies from bank to bank, even among those with similar assets
This is a statistical (not deterministic) relationship
A Caution: Beware of Extrapolation
Sarah’s height was plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?
(Figure: scatterplot of Sarah’s height (cm) versus age (months))
A Caution: Beware of Extrapolation (continued)
Regression line: y = 71.95 + 0.383x
Height at age 42 months? y = 88 cm.
Height at age 30 years (360 months)? y = 209.8 cm.
– She is predicted to be about 6' 10.5" at age 30.
(Figure: height (cm) versus age (months), with the age axis extended to 390 months)
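A minimal sketch that applies the slide's line and flags predictions outside the range of the data. The observed age range (about 30 to 65 months) is an assumption read off the first plot's age axis, and `predict_height` is a hypothetical helper name. The line itself is happy to produce 209.8 cm at 360 months; only the range check tells us that number should not be trusted.

```python
INTERCEPT, SLOPE = 71.95, 0.383  # slide's line: height (cm) = 71.95 + 0.383 * age (months)
# Assumption: Sarah's measurements cover roughly 30 to 65 months (read from the plot's age axis).
OBSERVED_RANGE = (30, 65)

def predict_height(age_months):
    """Return (predicted height in cm, True if the age lies outside the observed range)."""
    low, high = OBSERVED_RANGE
    is_extrapolation = not (low <= age_months <= high)
    return INTERCEPT + SLOPE * age_months, is_extrapolation

for age in (42, 360):
    height, flagged = predict_height(age)
    note = "EXTRAPOLATION: not trustworthy" if flagged else "within observed range"
    print(f"{age} months: {height:.1f} cm ({note})")
```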
Correlation Does Not Imply Causation
Even very strong correlations may not correspond to a real causal relationship.
Evidence of Causation
A properly conducted experiment establishes the connection.
Other considerations:
– A reasonable explanation for a cause and effect exists
– The connection happens in repeated trials
– The connection happens under varying conditions
– Potential confounding factors are ruled out
– The alleged cause precedes the effect in time
Reasons Two Variables May Be Related (Correlated)
Explanatory variable causes change in response variable
Response variable causes change in explanatory variable
Explanatory variable may be a cause, but is not the sole cause, of changes in the response variable
Confounding variables may exist
Both variables may result from a common cause
– such as both variables changing over time
The correlation may be merely a coincidence
Explanatory causes Response
Explanatory: pollen count from grasses
Response: percentage of people suffering from allergy symptoms
Explanatory: amount of food eaten
Response: hunger level
Response causes Explanatory
Explanatory: hotel advertising dollars
Response: occupancy rate
Positive correlation? More advertising leads to increased occupancy rate?
Actual correlation is negative: lower occupancy leads to more advertising
Explanatory Is Not the Sole Contributor
Explanatory: consumption of barbecued foods
Response: incidence of stomach cancer
Barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute
Confounding Variables
Explanatory: meditation
Response: aging (measurable aging factor)
General concern for one’s well-being may be confounded with the decision to try meditation
Meditation vs. Aging
Common Response (both variables change due to a common cause)
Explanatory: divorce among men
Response: percent abusing alcohol
Both may result from an unhappy marriage.
Both Variables are Changing Over Time
Both divorces and suicides have increased dramatically since 1900.
Are divorces causing suicides? Are suicides causing divorces?
The population has increased dramatically since 1900, causing both counts to increase.
Better to investigate: Has the rate of divorce or the rate of suicide (per person) changed over time?
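A minimal sketch of why rates are the better thing to study, using entirely hypothetical numbers (not real divorce or suicide data): the population grows steadily while the two per-person rates wobble in patterns constructed to be unrelated. The counts, which are rate times population, come out strongly correlated anyway, because both inherit the population trend.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical values in arbitrary units; the rate patterns are built to have zero correlation.
population = [10, 20, 30, 40, 50, 60, 70, 80]
divorce_rate = [2.0, 2.2, 2.0, 2.2, 2.0, 2.2, 2.0, 2.2]
suicide_rate = [1.0, 1.0, 1.2, 1.2, 1.0, 1.0, 1.2, 1.2]

# Counts = rate * population, so both counts grow with the population.
divorces = [r * p for r, p in zip(divorce_rate, population)]
suicides = [r * p for r, p in zip(suicide_rate, population)]

r_counts = correlation(divorces, suicides)
r_rates = correlation(divorce_rate, suicide_rate)
print(f"r for counts: {r_counts:.2f}; r for rates: {r_rates:.2f}")
```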
The Relationship May Be Just a Coincidence
We will see some strong correlations (or apparent associations) just by chance, even when the variables are not related in the population
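A minimal simulation sketch, assuming normally distributed data generated with Python's `random` module: x and y are drawn independently, so there is no relationship at all in the "population". With only 5 points per sample, roughly one trial in ten still shows |r| > 0.8 purely by chance; with 50 points, such coincidences essentially disappear. The threshold, sample sizes, and seed are arbitrary choices for illustration.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def share_of_strong_chance_correlations(n, trials=10_000, threshold=0.8, seed=1):
    """Fraction of trials in which two independent samples of size n have |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(correlation(xs, ys)) > threshold:
            hits += 1
    return hits / trials

small = share_of_strong_chance_correlations(n=5)
large = share_of_strong_chance_correlations(n=50)
print(f"n=5: {small:.1%} of trials had |r| > 0.8; n=50: {large:.1%}")
```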
Coincidence (?): Vaccines and Brain Damage
A required whooping cough vaccine was blamed for seizures that caused brain damage
– led to reduced production of vaccine (due to lawsuits)
Study of 38,000 children found no evidence for the accusations (reported in the New York Times)
– “people confused association with cause-and-effect”
– “virtually every kid received the vaccine…it was inevitable that, by chance, brain damage caused by other factors would occasionally occur in a recently vaccinated child”
Issues with this analysis?
– Prevalence
Case Study: Social Relationships and Health
House, J., Landis, K., and Umberson, D. “Social Relationships and Health,” Science, Vol. 241 (1988), pp. 540-545.
Does lack of social relationships cause people to become ill?
Or, are unhealthy people less likely to establish and maintain social relationships?
Or, is there some other factor that predisposes people both to have lower social activity and become ill?
Key Concepts
Least squares regression equation
R²
Correlation does not imply causation
Confirming causation
Reasons variables may be correlated
Continued…
Cautions about Correlation and Regression
Both only describe linear relationships
Both are affected by outliers
Always plot the data before interpreting
Beware of extrapolation
– predicting outside of the range of x
Beware of lurking variables
– these have an important effect on the relationship among the variables in a study, but are not included in the study
Association does not imply causation
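A minimal sketch of the first two cautions, using invented points: a single outlier can wreck a correlation that was nearly perfect, and a perfect but curved relationship can have a correlation of exactly zero. Neither problem shows up in the number r alone, which is why plotting the data first matters.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical points lying close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
r_clean = correlation(xs, ys)

# The same points plus one outlier at (6, 0).
r_outlier = correlation(xs + [6], ys + [0.0])

# A perfect but curved relationship: y = x^2 is completely determined by x.
xc = [-2, -1, 0, 1, 2]
r_curve = correlation(xc, [x ** 2 for x in xc])

print(f"clean r = {r_clean:.3f}, with one outlier r = {r_outlier:.3f}, curved r = {r_curve:.3f}")
```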
Least Squares Regression
A least squares regression line makes the vertical distances from the data points to the line as small as possible, in the sense that the sum of their squares is minimized.
A few explanations for an observed association
A dashed line shows an association. An arrow shows a cause-and-effect link.
Variable x is explanatory, y is the response variable, and z is a lurking variable.