3.2A Least-Squares Regression
Linear (straight-line) relationships between two quantitative variables are pretty common and easy to understand.
Our instinct when looking at a scatterplot of data is to imagine (or actually draw) a line going through the data points
to summarize the relationship between two variables.
A _____________________________ summarizes the relationship between two variables, but only in a specific
setting: when one of the variables is explanatory and the other is a response variable.
A regression line is a line that:
Do people with larger increases in NEA tend to gain less fat?
Interpretation of the Scatterplot
Interpreting a Regression Line
The equation of a regression line gives a mathematical description of what this model tells us about the relationship
between the response variable y and the explanatory variable x.
Regression Line, Predicted Value, Slope, Y-Intercept
In the equation ŷ = a + bx,
ŷ (“y hat”) is the ____________________________________ of the response variable y for a given value of
the explanatory variable x.
b is the _________________________, the amount y is predicted to change when x increases by one unit.
a is the __________________________, the predicted value of y when x = 0.
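As a minimal sketch of how an equation of the form ŷ = a + bx is used, the Python function below computes a predicted value from an intercept and a slope. The function name predict and the values a = 10 and b = 2 are made up for illustration and do not come from any data set in these notes.

```python
def predict(a, b, x):
    """Return the predicted response y-hat = a + b*x."""
    return a + b * x

# Hypothetical line: intercept a = 10, slope b = 2 (illustrative values only).
a, b = 10.0, 2.0

y_hat_at_0 = predict(a, b, 0)  # at x = 0 the prediction equals the intercept a
y_hat_at_3 = predict(a, b, 3)
y_hat_at_4 = predict(a, b, 4)

# Increasing x by one unit changes the prediction by the slope b.
change_per_unit = y_hat_at_4 - y_hat_at_3
```

Checking the prediction at x = 0 and the change between two consecutive x values is a quick way to confirm which number in an equation plays which role.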
2. Interpret each value in context.
Prediction
What fat gain would we predict for someone whose NEA increases by 12500 calories when he overeats?
Extrapolation
1. What is the slope of the regression line? Explain what it means in context.
2. What’s the y-intercept? Explain what it means in context.
3. Predict the rat’s weight after 16 weeks. Show your work.
4. Should you use this line to predict the rat’s weight at age 2 years? Use the equation to make the prediction
and think about the reasonableness of the result. (There are 454 grams in a pound.)
35. What’s my line? You use the same bar of soap to shower each morning. The bar weighs 80 grams when it is new.
Its weight goes down by 6 grams per day on average. What is the equation of the regression line for predicting
the weight from days of use?
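One way to check an answer to the soap exercise is to code the line and try a few values. The sketch below assumes the new bar's weight (80 grams) is the intercept and the average daily loss (6 grams) gives a slope of -6; the function name soap_weight is hypothetical.

```python
def soap_weight(days):
    """Predicted weight in grams after `days` of use, assuming y-hat = 80 - 6*days."""
    return 80 - 6 * days

weight_new = soap_weight(0)       # a new bar: the intercept
weight_day_5 = soap_weight(5)     # within a plausible range of use
weight_day_14 = soap_weight(14)   # far enough out that the line stops making sense
```

Under these assumptions the prediction for day 14 is negative, which is impossible for a weight and illustrates why predictions far outside the range of the data should not be trusted.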
3.2B Residuals and the Least-Squares Regression Line
A good regression line
The prediction errors (shown as the little vertical lines) represent the “leftover” variation in the response variable
after fitting the regression line. These vertical lines are known as ________________________.
Example: Finding a Residual
Find and interpret a residual for the hiker who weighed 187 pounds.
The least-squares regression line of y on x is the line that
a) Find the equation of the least-squares regression line for predicting husband’s height from wife’s height. Show
your work.
b) Use your regression line to predict the height of the husband of a woman who is 67 inches tall.
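When only summary statistics are available, the slope and intercept can be found from the standard relations b = r(s_y / s_x) and a = ȳ - b·x̄. The sketch below applies them in Python. The means, standard deviations, and correlation are placeholder values assumed for illustration, since these notes do not print the statistics for the husbands-and-wives problem; substitute the values given with the problem.

```python
def lsrl_from_summary(x_mean, x_sd, y_mean, y_sd, r):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * y_sd / x_sd        # slope
    a = y_mean - b * x_mean    # intercept: the line passes through (x_mean, y_mean)
    return a, b

# Placeholder summary statistics in inches (assumed, not taken from these notes).
a, b = lsrl_from_summary(x_mean=64.5, x_sd=2.5, y_mean=68.5, y_sd=2.7, r=0.5)

# Predicted husband's height for a wife who is 67 inches tall.
predicted_husband = a + b * 67
```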
Using Your Calculator to Find the Equation of the LSRL (Least-Squares Regression Line)
1)
2)
3)
One person’s NEA rose by 135 calories. That person gained 2.7 kg of fat.
The predicted fat gain for 135 calories is:
Therefore,
The residual for this person would be:
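The steps outlined above (predict, then subtract) can be sketched in Python. The observed values, 135 calories and 2.7 kg, come from the example. The intercept 3.505 and slope -0.00344 are assumed here because the fitted equation is not printed in these notes; replace them with the coefficients from your own regression output.

```python
# Observed data for this person (from the example above).
nea_change = 135       # calories
observed_gain = 2.7    # kg

# Assumed coefficients for y-hat = a + b*x (not printed in these notes).
a = 3.505
b = -0.00344

predicted_gain = a + b * nea_change

# residual = observed y - predicted y
residual = observed_gain - predicted_gain
```

A negative residual means the observed value falls below the regression line, so the line overpredicts for that individual.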
The most notable property of the set of residuals is that ________________________________________.
3.2C Examining Residual Plots
Because the residuals show how far the data fall from our regression line, examining the residuals helps assess how
well the line describes the data.
Good Fit/Model Bad Fit/Model
Standard Deviation of the Residuals
How Well the Line Fits the Data: The Role of r² in Regression
r² =
Standard or “Average” Prediction Error (Residual)
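The standard deviation of the residuals is usually computed as s = sqrt(sum of squared residuals / (n - 2)). The sketch below shows that arithmetic in Python; the residuals are hypothetical values made up for illustration.

```python
import math

# Hypothetical residuals (observed y minus predicted y), made up for illustration.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
n = len(residuals)

# s = sqrt( sum of squared residuals / (n - 2) )
sum_sq = sum(e ** 2 for e in residuals)
s = math.sqrt(sum_sq / (n - 2))
```

The result is in the same units as the response variable and describes the typical size of a prediction error made with the line.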
The Coefficient of Determination r²
AP EXAM TIP: Students often have a hard time interpreting the value of r² on AP exam questions. They frequently
leave out key words in the definition.
Our advice: Treat this as a fill-in-the-blank exercise. Write “____% of the variation in [response variable name] is
accounted for by the LSRL of y (context) on x (context).”
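To see where the percentage in that sentence comes from, the sketch below computes r² as 1 - SSE/SST, where SSE is the sum of squared residuals about the regression line and SST is the sum of squared deviations of y about its mean. The data are hypothetical, and the slope uses the standard least-squares formula.

```python
# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# SSE: squared residuals about the regression line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
# SST: squared deviations of y about its mean.
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(f"{100 * r_squared:.0f}% of the variation in y is accounted for by the LSRL of y on x.")
```

In a real answer, replace "y" and "x" in the printed sentence with the variable names in context.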
Example: Interpreting the value of r²
Refer to the problem about husbands and wives.
a) Find r² and interpret this value in context.
b) For these data, s = 1.2. Explain what this value means.
3.2D Interpreting Computer Regression Output
Example: Interpreting Regression Output
How well does the number of beers a person drinks predict his or her blood alcohol content (BAC)? Sixteen
volunteers with an initial BAC of 0 drank a randomly assigned number of cans of beer. Thirty minutes later, a police
officer measured their BAC. Least-squares regression was performed on the data. A scatterplot with the regression
line added, a residual plot, and some computer output from the regression are shown below.
1. What is the equation of the least-squares regression line?
2. Find the correlation.
3. Interpret the slope of the regression line in context.
4. Is a line an appropriate model to use for these data? What information tells you this?
5. What was the BAC reading for the person who consumed 9 beers? Show your work.
Correlation and Regression Wisdom