Simple Linear Regression
Using one variable to …
1) explain the variability of another variable
2) predict the value of another variable
Both accomplished with the line that best fits a scatterplot.
Linear Regression Slide #2
Recall -- Definitions
• Response (dependent) variable
– variability is being explained or values are predicted
– y-axis
• Explanatory (independent, predictor) variable
– used to explain variability or make predictions
– x-axis
Review -- Line Characteristics
1. What is the most common equation of a line?
2. What does the slope tell us?
3. What does the intercept tell us?
Finding the Best-Fit Line
Candidate Lines
[Figure: scatterplot of Y versus X with candidate lines]
We need an objective criterion
Finding the Best-Fit Line
Definition -- Predicted Y ($\hat{y}$)
• The y-coordinate of the point on the line that corresponds to the observed x value
[Figure: scatterplot of Y versus X with a line, marking an observed point $(x, y)$ and the point on the line $(x, \hat{y})$]
• Plug the value of x into the line equation to get $\hat{y}$
Finding the Best-Fit Line
Definition -- Residual
[Figure: scatterplot of Y versus X with a line, illustrating a residual]
Residual = Observed Y - Predicted Y
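The definitions of predicted Y and residual can be sketched in a few lines of Python. This is only an illustration: the line's intercept and slope and the observed point are made-up values, and the function names are hypothetical.

```python
def predicted_y(x, intercept, slope):
    """Return the y-coordinate of the point on the line at the observed x."""
    return intercept + slope * x


def residual(observed_y, x, intercept, slope):
    """Residual = Observed Y - Predicted Y."""
    return observed_y - predicted_y(x, intercept, slope)


# Hypothetical candidate line and observed point (illustrative values only).
intercept, slope = 10.0, 1.0
x_obs, y_obs = 110.0, 125.0

y_hat = predicted_y(x_obs, intercept, slope)    # point on the line at x = 110
res = residual(y_obs, x_obs, intercept, slope)  # positive: the point lies above the line
print(y_hat, res)
```

A positive residual means the observed point sits above the line; a negative residual means it sits below.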
Finding the Best-Fit Line
minimize sum of residuals?
[Figure: scatterplot of Y versus X]
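One way to see why the plain sum of residuals is a poor criterion: positive and negative residuals cancel, so every line through the point of means gives a sum of zero, however badly it fits. The sketch below uses made-up data and hypothetical function names; it compares the sum of residuals with the sum of squared residuals for several slopes.

```python
# Hypothetical data (illustrative values only).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 120, 125]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def residuals_for_slope(slope):
    """Residuals for the line with this slope that passes through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_of_residuals(slope):
    return sum(residuals_for_slope(slope))


def sum_of_squared_residuals(slope):
    return sum(e ** 2 for e in residuals_for_slope(slope))


# The plain sum cannot tell these lines apart; the squared sum can.
for slope in (-1.0, 0.0, 1.0, 2.0):
    print(slope, sum_of_residuals(slope), sum_of_squared_residuals(slope))
```

Squaring removes the cancellation, so poorly fitting lines are penalized.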
Finding the Best-Fit Line
minimize sum of squared residuals?
• RSS = sum of squared residuals
• Best-fit line: the line, out of all possible lines, that minimizes the RSS
• Should the RSS be computed for all lines?
$\text{slope} = r\,\dfrac{s_y}{s_x}$
$\text{y-intercept} = \bar{y} - \text{slope} \times \bar{x}$
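The RSS does not have to be computed for every possible line, because the best-fit slope and intercept follow directly from summary statistics: slope = r × s_y / s_x and y-intercept = ȳ − slope × x̄. A minimal Python sketch, assuming made-up data and hypothetical helper names:

```python
from statistics import mean, stdev

# Hypothetical data (illustrative values only).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 120, 125]


def correlation(xs, ys):
    """Sample correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    n = len(xs)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * stdev(xs) * stdev(ys))


def rss(intercept, slope):
    """Sum of squared residuals for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


r = correlation(xs, ys)
slope = r * stdev(ys) / stdev(xs)        # slope = r * s_y / s_x
intercept = mean(ys) - slope * mean(xs)  # y-intercept = y_bar - slope * x_bar

print(slope, intercept, rss(intercept, slope))
# Nudging the line in any direction increases the RSS.
print(rss(intercept + 1, slope), rss(intercept, slope + 0.1))
```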
So ….
• It is important to understand
– where the equation of the line comes from
– how to interpret the line
• It is not important to compute the best-fit line “by hand”
Example -- Rabbit Metabolic Rate
Katzner et al. (1997; J. Wildl. Man. 78:1053-1062) examined the metabolic rate of pygmy rabbits (Brachylagus idahoensis) in the laboratory. In particular, they wanted to determine if the variability in resting metabolic rate (ml O₂ g⁻¹ h⁻¹) at 20°C could be adequately explained by body mass (g).
1. What is the response variable?
– Resting metabolic rate
2. What is the explanatory variable?
– Body mass
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with the best-fit line; Y = 1.41 - 0.00124X, R-Sq = 55.4%]
3. In terms of the variables of the problem, what is the equation of the best-fit line?
– MetRate = 1.41 - 0.00124 × Mass
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with the best-fit line; Y = 1.41 - 0.00124X, R-Sq = 55.4%]
4. In terms of the variables of the problem, interpret the value of the slope.
– For each additional gram of mass, the metabolic rate decreases by 0.00124 ml O₂ g⁻¹ h⁻¹, on average
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with the best-fit line; Y = 1.41 - 0.00124X, R-Sq = 55.4%]
5. In terms of the variables of the problem, interpret the value of the y-intercept.
– Rabbits with no mass have a metabolic rate of 1.41 ml O₂ g⁻¹ h⁻¹, on average
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with the best-fit line; Y = 1.41 - 0.00124X, R-Sq = 55.4%]
6. What is the predicted metabolic rate for a mass of 450 g?
– Point on the line: (450, 0.85)
7. What is the predicted metabolic rate for a mass of 600 g?
8. What is the residual for a mass of 425 g and a metabolic rate of 0.82 ml O₂ g⁻¹ h⁻¹?
– Observed point: (425, 0.82); point on the line: (425, 0.88)
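The arithmetic behind these three questions can be checked with a short Python sketch. It uses only the best-fit equation shown on the slide, MetRate = 1.41 - 0.00124 × Mass; the function name is hypothetical.

```python
def predict_met_rate(mass_g):
    """Best-fit line from the slide: MetRate = 1.41 - 0.00124 * Mass."""
    return 1.41 - 0.00124 * mass_g


pred_450 = predict_met_rate(450)

# The plot's mass axis runs from about 400 to 500 g, so 600 g lies outside the
# plotted range and this value is an extrapolation.
pred_600 = predict_met_rate(600)

# Residual = Observed Y - Predicted Y for the rabbit at (425, 0.82).
resid_425 = 0.82 - predict_met_rate(425)

print(round(pred_450, 3), round(pred_600, 3), round(resid_425, 3))
# prints: 0.852 0.666 -0.063
```

The negative residual says the 425 g rabbit's observed metabolic rate falls below the line.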
One More Regression Statistic
• $r^2$ = coefficient of determination
• = proportion of the total variability in the response variable that is explained by knowing the value of the explanatory variable
Visualizing $r^2$
[Figure: scatterplot of Weight versus Height with the best-fit line, annotated with "Total Variability in Y", "Variability Explained", and "Variability Remaining"]
$r^2 = \dfrac{\text{Variability Explained}}{\text{Total Variability in } y}$
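The ratio can be made concrete by measuring variability with sums of squares. This is a minimal Python sketch under stated assumptions: the data are made up, total variability is the sum of squared deviations of y from its mean, and remaining variability is the RSS around the best-fit line.

```python
# Hypothetical data (illustrative values only).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 120, 125]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept (algebraically the same as r * s_y / s_x).
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

total = sum((y - y_bar) ** 2 for y in ys)  # total variability in y
remaining = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # RSS
explained = total - remaining              # variability explained by x

r_squared = explained / total
print(total, remaining, explained, round(r_squared, 3))
```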
Characteristics of $r^2$
• What range of values can $r^2$ be?
– $0 \le r^2 \le 1$
• Which relationship is stronger -- $r^2 = 0.5$ or $0.9$?
• Which relationship gives “better” predictions -- $r^2 = 0.5$ or $0.9$?
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with the best-fit line; Y = 1.41 - 0.00124X, R-Sq = 55.4%]
9. What proportion of the variability in metabolic rate is explained by knowing mass?
– $r^2 = 0.554$
10. What is the correlation between metabolic rate and mass?
– $r = -\sqrt{0.554} = -0.744$ (negative because the slope is negative)
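Taking the square root of $r^2$ gives only the size of the correlation; its sign has to come from the slope of the best-fit line. A small Python sketch using the values from the slide (R-Sq = 55.4%, slope = -0.00124):

```python
import math

r_squared = 0.554  # R-Sq from the slide, as a proportion
slope = -0.00124   # slope of the best-fit line from the slide

# copysign attaches the slope's sign to the magnitude sqrt(r_squared).
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))
# prints: -0.744
```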
Simple Linear Regression in R
• Examine handout
– lm()
– rSquared()
– fitPlot()
– predict()
Regression is the Most Used and Most Abused Statistical Technique
• Assumptions:
– A line adequately models the data
– Homoscedasticity: same scatter of points along the entire line
– Residuals at any given value of the explanatory variable are normally distributed
– Residuals at any given value of the explanatory variable are independent
[Slide annotation: side labels "Intro" and "Advanced" group the assumptions]
A Line Models the Data
[Figure: four scatterplot panels, each with both axes running from 80 to 120]
Homoscedasticity
[Figure: three scatterplot panels, each with both axes running from 80 to 120]
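Homoscedasticity is usually judged by eye from a plot, but a rough numeric illustration is possible. The Python sketch below is a crude heuristic, not a formal test: it fits the least-squares line, then compares the standard deviation of the residuals for x values above and below the median. Both data sets and the function name `spread_ratio` are hypothetical.

```python
from statistics import median, stdev


def spread_ratio(xs, ys):
    """SD of residuals for x above the median divided by SD for x below it."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mid = median(xs)
    lower = [e for x, e in zip(xs, residuals) if x < mid]
    upper = [e for x, e in zip(xs, residuals) if x > mid]
    return stdev(upper) / stdev(lower)


xs = [80, 85, 90, 95, 100, 105, 110, 115, 120]

# Hypothetical data with the same scatter along the whole line.
even_ys = [x + e for x, e in zip(xs, [2, -2, 2, -2, 2, -2, 2, -2, 2])]
# Hypothetical data whose scatter widens as x increases (a funnel shape).
funnel_ys = [x + e for x, e in zip(xs, [0.5, -0.5, 1, -1, 2, -2, 4, -4, 8])]

even_ratio = spread_ratio(xs, even_ys)
funnel_ratio = spread_ratio(xs, funnel_ys)
print(round(even_ratio, 2), round(funnel_ratio, 2))
```

A ratio near 1 is consistent with even scatter; a ratio far from 1 suggests the scatter changes along the line.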
$r^2$ doesn’t depend on x because of homoscedasticity
[Figure: scatterplot of Weight versus Height with the best-fit line, annotated with "Total Variability in Y", "Variability Explained", and "Variability Remaining"]