Chapter 8. Residuals, Part 2 Residuals are the amount that a data score is above or below the line....

Chapter 8

Residuals, Part 2

Residuals are the amount that a data score is above or below the line.
A data score which is above the line will result in a positive residual.
A data score which is below the line will result in a negative residual.
The line represents every possible prediction our model might make about the data.

Residuals, Part 2

A positive residual means the prediction was an underestimate.
This is because a positive residual means the data score was above the line, which means the line was below the data score.
Since the line is our estimate, it would be an underestimate.
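The sign rule above can be sketched in a few lines of Python. This is only an illustration: the data values, the predictions, and the helper names `residual` and `describe` are made up for the example and are not from the book.

```python
# Hypothetical data scores and the line's predictions at the same x-values.
observed = [12.0, 7.5, 9.0]
predicted = [10.0, 8.0, 9.0]

def residual(observed_y, predicted_y):
    """Residual = data score minus the line's prediction."""
    return observed_y - predicted_y

def describe(res):
    """Positive residual: the line was too low. Negative: the line was too high."""
    if res > 0:
        return "underestimate"
    if res < 0:
        return "overestimate"
    return "exact"

residuals = [residual(o, p) for o, p in zip(observed, predicted)]
labels = [describe(r) for r in residuals]

for r, label in zip(residuals, labels):
    print(r, label)
```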

Residuals, Part 2

Once we have calculated all of the residuals, we can plot them in a special kind of scatterplot.
It is called a residual plot, but it is really just a scatterplot with the residuals for the y-values.
An ideal residual plot will have no visible patterns and will not fan out in one direction.

Residuals, Part 2

When we make our line, we want it to be the best model available for our data.
There are two parts to the relationship between our two data sets.
One of these parts is the part we describe with our model.
The other part is the part which our model does not account for.

Residuals, Part 2

The residuals represent the variation in the data which our model does not account for.
If the residuals have an obvious pattern, then that suggests that our model is incomplete.
If the residual plot has a pattern, this provides insight into what sort of transformation would be ideal to use.
We will not be transforming data based on this, but it is what later classes in statistics might do.
You are still expected to know that this is why we check for patterns in the residual plot.
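To see what "an obvious pattern" can look like, here is a small sketch that fits a straight line to data that is actually curved. The data (y = x squared) and the helper name `least_squares` are assumptions chosen for illustration, not an example from the book.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical curved data: a straight line is the wrong model here.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

intercept, slope = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# One sign per data point, in order of x.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(residuals)
print(signs)
```

The residuals are positive at both ends and negative in the middle. A run of signs like that, rather than a random scatter, is the kind of pattern that suggests the line is missing part of the relationship.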

The Basic Idea

The least squares regression line is based on the idea that you can approximate the z-scores for the response variable from the z-scores for the predictor variable.

To do this, we take a data value from our predictor variable, find the z-score for that data value, multiply it by our correlation, and then convert this new z-score into a data score for our response variable.
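The steps above can be sketched as a short function. The summary statistics used here (means, standard deviations, and correlation) are made-up numbers, and the name `predict_from_z` is just a label for this example.

```python
def predict_from_z(x, x_mean, x_sd, y_mean, y_sd, r):
    """Predict the response by working through z-scores."""
    z_x = (x - x_mean) / x_sd          # z-score of the predictor value
    z_y_hat = r * z_x                  # multiply by the correlation
    return y_mean + z_y_hat * y_sd     # convert back to a data score

# Hypothetical summary statistics, chosen so the arithmetic is easy to follow.
y_hat = predict_from_z(x=70.0, x_mean=50.0, x_sd=10.0,
                       y_mean=100.0, y_sd=20.0, r=0.5)
print(y_hat)
```

Here the predictor value is 2 standard deviations above its mean, so with a correlation of 0.5 the prediction is only 1 standard deviation above the response mean.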

Conservative Estimates

Since the correlation is never more than 1 and never less than -1, our predicted z-score for the response variable will never be further from the mean than our z-score for the predictor variable.

In other words, we will tend to make predictions conservatively, leaning closer to the mean.
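Written out in symbols, the reasoning looks like this. The notation is an assumption for this sketch: z_x is the z-score of the predictor value, and \hat{z}_y is the predicted z-score of the response.

```latex
\hat{z}_y = r \, z_x, \qquad -1 \le r \le 1
\quad\Longrightarrow\quad
|\hat{z}_y| = |r| \, |z_x| \le |z_x|
```

The two sides are equal only when the correlation is exactly 1 or -1 (or when z_x is 0), so in practice the prediction sits closer to the mean than the predictor value did.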

Conservative Estimates

So, if we were using weights to predict height, any weight we used would predict a more conservative height.
In other words, a more unusual weight would predict a less unusual height.
If we went the other way around, then any height would predict a more conservative weight.
This is the reverse of what we had before.

Conservative Estimates

Because we will be predicting conservatively, it matters which variable is doing the predicting.

Even though correlation is the same between two variables no matter what the order, the regression line changes.

If we want to switch which variable we use for predicting, then we need to recalculate the regression line.

Example from Book

There is an example in your book, starting on page 180.
The basic idea is that there is a moderate relationship between the grams of fat and grams of protein in Burger King menu items.

Because this example is roughly one page worth of paragraphs, I am only going to summarize it, but I recommend you make time to read it.

Example from Book

So first, grams of protein are used to predict grams of fat.
Ex. 30 grams of protein predicts 35.9 grams of fat.
Then the grams of fat are used to predict grams of protein.
If we put in 35.9 grams of fat, intuitively we would expect 30 grams of protein.
The new equation predicts 26.0 grams of protein instead.
This is why we have to refigure our line every time we use the other variable to predict.
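The same round trip can be tried with a small sketch. The summary statistics below are made up for easy arithmetic and are NOT the Burger King values from the book. The helper `line_from_summary` builds a line using slope = r times (SD of response) / (SD of predictor) and intercept = (mean of response) minus slope times (mean of predictor).

```python
def line_from_summary(pred_mean, pred_sd, resp_mean, resp_sd, r):
    """Return (intercept, slope) for predicting the response from the predictor."""
    slope = r * resp_sd / pred_sd
    intercept = resp_mean - slope * pred_mean
    return intercept, slope

# Hypothetical summary statistics (not the book's numbers).
x_mean, x_sd = 20.0, 10.0
y_mean, y_sd = 30.0, 20.0
r = 0.5

# Line 1: x predicts y.
a_yx, b_yx = line_from_summary(x_mean, x_sd, y_mean, y_sd, r)
y_hat = a_yx + b_yx * 40.0

# Line 2: y predicts x. It has to be recalculated, not just line 1 turned around.
a_xy, b_xy = line_from_summary(y_mean, y_sd, x_mean, x_sd, r)
x_back = a_xy + b_xy * y_hat

print(y_hat, x_back)
```

Starting from x = 40, the first line predicts a y, but feeding that y into the second line does not return 40. Each prediction leans back toward the mean, just as in the book's example.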

Interpreting the Line

To interpret the equation of the line we need to interpret the slope and we need to interpret the intercept.

This is generally best done as a mostly scripted process.

As a side note, "interpret" is a key word in a stats question to let you know you need to produce one or more sentences.

Interpreting the Slope

Slope is rise over run.
This means that when you express slope as a fraction, the top of the fraction is your change in the y direction.
In other words, the change in your response variable.
The bottom of the fraction is your change in the x direction.
So, the change in the predictor variable.

Interpreting the Slope

We can turn any number into a fraction by putting it over 1.
So the slope we calculated, we can just put over 1, and now it is a fraction.
In other words, when we interpret the slope, the change on the x-axis (predictor) will always be 1 unit.

Interpreting the Slope

The slope we calculated (b or b1) will be the change on our y-axis (response).
So in yesterday’s example estimating the cost of tops based on the cost of pants, our slope was 1.10, presumably in US dollars.
I’m clarifying now that it IS in US dollars.

So the change in our x (pants) is 1, and the change in our y (top) is 1.10.
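A quick way to check the "1 unit of x changes y by the slope" idea is to predict at two pants prices that are one dollar apart. This sketch uses the slope of 1.10 together with the intercept of -10 that these slides give for the clothing example. The two pants prices and the function name `predict_top_cost` are assumptions for illustration.

```python
# Line from the clothing example: slope 1.10 and intercept -10, in US dollars.
INTERCEPT = -10.0
SLOPE = 1.10

def predict_top_cost(pants_cost):
    """Predicted cost of the top for a given pants cost."""
    return INTERCEPT + SLOPE * pants_cost

# Hypothetical pants prices exactly one dollar apart.
change = predict_top_cost(41.0) - predict_top_cost(40.0)
print(round(change, 2))
```

Any other pair of prices one dollar apart gives the same change (up to floating-point rounding), which is why the slope script only needs to mention the unit of x.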

Interpreting the Slope

The general form of the script is:
“The model predicts that for every <1 unit more of x> that <y> will <increase/decrease> by <b/b1>.”

Because I care, here is the specific example:
“The model predicts that for every dollar more that the pants cost, that the cost of the top worn with them will increase by $1.10.”

Interpreting the Slope

Once more:

“The model predicts that for every dollar more that the pants cost, that the cost of the top worn with them will increase by $1.10.”

Note that when I talked about the 1 unit change in x, I did not use a number so much as just mention the unit.

Note that for the change in y, I very much used the number.

Interpreting the Intercept

Two minus one is one.
• 2 – 1 = 1
One minus zero is one.
• 1 – 0 = 1
Any starving person can tell you that the difference between one meal and two meals is not the same as the difference between one meal and no meals at all.

Interpreting the Intercept

Concept Alert!
0 is the number we typically use to represent things such as “none”, “nothing”, and “completely missing”.

It is important to be able to see the number 0 and then rationally consider what “nothing” means in the context of the problem.

Interpreting the Intercept

The intercept is our prediction when x (the predictor variable) equals zero.
Sometimes the intercept is ridiculous to talk about, because in the context of the problem, “nothing” is a goofy idea.

We still want to interpret the intercept even when it is ridiculous, but we usually want to also mention that it is ridiculous.

Interpreting the Intercept

Consider the Burger King foods from the book:
0 grams of fat could make sense.
0 grams of protein could make sense.
We would want to use terms like “fat free” and “protein free” instead of saying 0 grams.
To consider our clothing example, we have to get a bit more technical.

Interpreting the Intercept

One way to interpret what $0 cost pants are is “free pants”.
If the study was instead based on the retail value of clothing rather than what the person wearing it personally paid, it now refers to genuinely valueless pants.
This pretty much means either no pants at all or pants so useless they are not all that different from no pants at all.

Interpreting the Intercept

While some of the bolder contrarians out there will want to disagree with me, I say it is ridiculous to discuss valueless pants.

I think the concept of valueless pants is just silly and it should be considered an example of when an intercept does not have useful predictive power.

Interpreting the Intercept

If we were using the nicotine content of cigarettes to see how many times a week a smoker would go buy cigarettes, our intercept would be a prediction for cigarettes with no nicotine at all.

While I am not personally a smoker, it seems to me that a nicotine-free cigarette would kill the point even more than a caffeine-free diet soda.

So I would say this is also a ridiculous intercept.

Interpreting the Intercept

Whether or not it is ridiculous, here is the script:
“The model predicts that for <explanation of what 0 means> the <response variable> would be <a/b0>.”

So in the context of our clothing, where the intercept was -10:
“The model predicts that for truly valueless pants the cost of the top worn with them would be -$10.”
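Since both interpretations follow a script, they can be sketched as fill-in-the-blank templates. The function names and the way the dollar amounts are formatted are assumptions for this example. The sentence wording follows the scripts on these slides, except that the slope sentence drops the second "that".

```python
def format_amount(value, prefix=""):
    """Format a number with an optional unit prefix, e.g. -10 -> '-$10.00'."""
    sign = "-" if value < 0 else ""
    return f"{sign}{prefix}{abs(value):.2f}"

def interpret_slope(one_unit_more_of_x, y_name, slope, prefix=""):
    """Fill in the slope script; the direction comes from the sign of the slope."""
    direction = "increase" if slope >= 0 else "decrease"
    return (f"The model predicts that for every {one_unit_more_of_x}, "
            f"{y_name} will {direction} by {format_amount(abs(slope), prefix)}.")

def interpret_intercept(meaning_of_zero, y_name, intercept, prefix=""):
    """Fill in the intercept script; the intercept is the prediction at x = 0."""
    return (f"The model predicts that for {meaning_of_zero} "
            f"{y_name} would be {format_amount(intercept, prefix)}.")

slope_sentence = interpret_slope(
    "dollar more that the pants cost",
    "the cost of the top worn with them", 1.10, "$")
intercept_sentence = interpret_intercept(
    "truly valueless pants",
    "the cost of the top worn with them", -10, "$")

print(slope_sentence)
print(intercept_sentence)
```

The templates only fill in the blanks. Deciding what "0" means in context, and whether the intercept is ridiculous, is still up to you.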

Assignments

Read the first half of Chapter 9 for Friday.
Chapter 8 Quiz Friday.
Ch. 8: Do 1 problem from each set of 5.
You may skip two sets of 5 (of your choosing).
You should do 8 problems.
Due Monday.

Chapter 8 Quiz Bulletpoints

Know why we need a hat for regression equations.
Know how to find a least squares regression line from the data.
Know how to find a least squares regression line from the summary statistics.
Know how to find r from r².
Know how to interpret the slope and the intercept for a regression line.
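As a study aid for the "r from r²" bulletpoint, here is a minimal sketch. Taking a square root only gives the size of r, so the sign has to come from somewhere else, and the slope of the regression line always has the same sign as r. The numbers and the function name `r_from_r_squared` are made up for the example.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r: the square root gives the size, the slope gives the sign."""
    return math.copysign(math.sqrt(r_squared), slope)

# Hypothetical values: r-squared of 0.64 with a downward-sloping line.
r = r_from_r_squared(0.64, slope=-1.5)
print(round(r, 4))
```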

Date post:	23-Dec-2015
Upload:	felicity-tyler