Date post: | 23-Dec-2015 |
Upload: | felicity-tyler |
Chapter 8
Residuals, Part 2
Residuals are the amount that a data score is above or below the line.
A data score which is above the line will result in a positive residual.
A data score which is below the line will result in a negative residual.
The line represents every possible prediction our model might make about the data.
Residuals, Part 2
A positive residual means the prediction was an underestimate.
This is because a positive residual means the data score was above the line, which means the line was below the data score.
Since the line is our estimate, it would be an underestimate.
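As a minimal sketch of the sign rule, the snippet below computes residuals as observed minus predicted. The line y-hat = 2 + 3x and the three data points are made up for illustration, and the function names are hypothetical.

```python
def predict(x):
    """Prediction from a hypothetical line: y-hat = 2 + 3x (not from the slides)."""
    return 2 + 3 * x


def residual(observed_y, predicted_y):
    """Residual = observed data score minus the line's prediction."""
    return observed_y - predicted_y


def describe(res):
    """Positive residual: the line was too low. Negative: the line was too high."""
    if res > 0:
        return "underestimate"
    if res < 0:
        return "overestimate"
    return "exact"


# Made-up (x, y) data scores.
data = [(1, 6), (2, 7), (3, 11)]
results = [(x, residual(y, predict(x))) for x, y in data]
for x, res in results:
    print(x, res, describe(res))
```

The first point sits above the line (positive residual), the second sits below it (negative residual), and the third lands exactly on it.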
Residuals, Part 2
Once we have calculated all of the residuals, we can plot them in a special kind of scatterplot.
It is called a residual plot, but it is really just a scatterplot with the residuals for the y-values.
An ideal residual plot will have no visible patterns and will not fan out in one direction.
Residuals, Part 2
When we make our line, we want it to be the best model available for our data.
There are two parts to the relationship between our two data sets.
One of these parts is the part we describe with our model.
The other part is the part which our model does not account for.
Residuals, Part 2
The residuals represent the variation in the data which our model does not account for.
If the residuals have an obvious pattern, then that suggests that our model is incomplete.
If the residual plot has a pattern, this provides insight into what sort of transformation would be ideal to use.
We will not be transforming data based on this, but it is what later classes in statistics might do.
You are still expected to know that this is why we check for patterns in the residual plot.
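To see what an "obvious pattern" can look like, here is a small sketch that fits a least squares line to deliberately curved, made-up data (y = x squared) and lists the points a residual plot would show. The function names are hypothetical, and the points are printed rather than drawn so that no plotting library is assumed.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_plot_points(xs, ys):
    """Pair each x with its residual (observed y minus predicted y)."""
    intercept, slope = least_squares_line(xs, ys)
    return [(x, y - (intercept + slope * x)) for x, y in zip(xs, ys)]


# Made-up curved data: a straight line cannot fully describe it.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

points = residual_plot_points(xs, ys)
for x, res in points:
    print(x, res)
```

The residuals come out positive at both ends and negative in the middle. That U shape is the kind of pattern that suggests a straight-line model is leaving something out.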
The Basic Idea
The least squares regression line is based on the idea that you can approximate the z-scores for the response variable from the z-scores for the predictor variable.
To do this, we take a data value from our predictor variable, find the z-score for that data value, multiply it by our correlation, and then convert this new z-score into a data score for our response variable.
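The steps above can be sketched in a few lines. The summary statistics here (weights with mean 150 and standard deviation 20, heights with mean 68 and standard deviation 4, correlation 0.5) are invented for illustration, and the function name is hypothetical.

```python
def predict_via_z_scores(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict a response value by going through z-scores."""
    z_x = (x - mean_x) / sd_x        # step 1: z-score of the predictor value
    z_y_hat = r * z_x                # step 2: multiply by the correlation
    return mean_y + z_y_hat * sd_y   # step 3: convert back to a data score


# Hypothetical numbers: a weight of 190 is 2 standard deviations above the mean.
predicted_height = predict_via_z_scores(
    190.0, mean_x=150.0, sd_x=20.0, mean_y=68.0, sd_y=4.0, r=0.5
)
print(predicted_height)  # 72.0, which is only 1 standard deviation above the mean height
```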
Conservative Estimates
Since the correlation is never more than 1 and never less than -1, our predicted z-score for the response variable will never be further from the mean than our z-score for the predictor variable.
In other words, we will tend to make predictions conservatively, leaning closer to the mean.
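Written out as a short worked inequality, the reasoning looks like this. The symbols are notation assumed here, not taken from the slides: z_x is the predictor's z-score, z-hat_y is the predicted z-score for the response, and r is the correlation.

```latex
\[
\hat{z}_y = r \, z_x, \qquad -1 \le r \le 1
\quad\Longrightarrow\quad
\lvert \hat{z}_y \rvert = \lvert r \rvert \, \lvert z_x \rvert \le \lvert z_x \rvert
\]
```

The two sides are equal only when r is exactly 1 or -1 (or when z_x is 0), so any weaker correlation pulls the prediction toward the mean.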
Conservative Estimates
So, if we were using weights to predict height, any weight we used would be predicting a more conservative height.
In other words, a more unusual weight would predict a less unusual height.
If we went the other way around, then any height would predict a more conservative weight.
This is the reverse of what we had before.
Conservative Estimates
Because we will be predicting conservatively, it matters which variable is doing the predicting.
Even though correlation is the same between two variables no matter what the order, the regression line changes.
If we want to switch which variable we use for predicting, then we need to recalculate the regression line.
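A rough sketch of why the line has to be recalculated: the code below builds both regression lines from the same made-up summary statistics (weights with mean 150 and standard deviation 20, heights with mean 68 and standard deviation 4, correlation 0.5) and then tries a round trip. The function name is hypothetical; the slope formula r times s_y over s_x is the standard one for a least squares line.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the least squares line predicting y from x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical summary statistics (not from the slides or the book).
MEAN_WEIGHT, SD_WEIGHT = 150.0, 20.0
MEAN_HEIGHT, SD_HEIGHT = 68.0, 4.0
R = 0.5

# Line 1: weight predicts height.
h_int, h_slope = line_from_summary(MEAN_WEIGHT, SD_WEIGHT, MEAN_HEIGHT, SD_HEIGHT, R)
# Line 2: height predicts weight (same correlation, roles swapped).
w_int, w_slope = line_from_summary(MEAN_HEIGHT, SD_HEIGHT, MEAN_WEIGHT, SD_WEIGHT, R)

predicted_height = h_int + h_slope * 190.0
round_trip_weight = w_int + w_slope * predicted_height

print(f"{predicted_height:.1f}")   # 72.0
print(f"{round_trip_weight:.1f}")  # 160.0
```

Starting from a weight of 190, the round trip comes back as 160 rather than 190, because each line pulls its prediction toward the mean. The second line is not simply the first one solved for x.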
Example from Book
There is an example in your book, starting on page 180.
The basic idea is that there is a moderate relationship between the grams of fat and grams of protein in Burger King menu items.
Because this example is roughly one page worth of paragraphs, I am only going to summarize it, but I recommend you make time to read it.
Example from Book
So first, grams of protein are used to predict grams of fat.
Ex. 30 grams of protein predicts 35.9 grams of fat.
Then grams of fat are used to predict grams of protein.
If we put in 35.9 grams of fat, intuitively we would expect 30 grams of protein.
26.0 grams of protein are predicted by the new equation instead.
This is why we have to refigure our line every time we use the other variable to predict.
Interpreting the Line
To interpret the equation of the line, we need to interpret the slope and we need to interpret the intercept.
This is generally best done as a mostly scripted process.
As a side note, “interpret” is a key word in a stats question to let you know you need to produce one or more sentences.
Interpreting the Slope
Slope is rise over run.
This means that when you express slope as a fraction, the top of the fraction is your change in the y direction.
In other words, the change in your response variable.
The bottom of the fraction is your change in the x direction.
So, the change in the predictor variable.
Interpreting the Slope
We can turn any number into a fraction by putting it over 1.
So the slope we calculated, we can just put over 1, and now it is a fraction.
In other words, when we interpret the slope, the change on the x-axis (predictor) will always be 1 unit.
Interpreting the Slope
The slope we calculated (b or b1) will be the change on our y-axis (response).
So in yesterday’s example estimating the cost of tops based on the cost of pants, our slope was 1.10, presumably in US dollars.
I’m clarifying now that it IS in US dollars.
So the change in our x (pants) is 1, and the change in our y (top) is 1.10.
Interpreting the Slope
The general form of the script is:
“The model predicts that for every <1 unit more of x> that <y> will <increase/decrease> by <b/b1>.”
Because I care, here is the specific example:
“The model predicts that for every dollar more that the pants cost, that the cost of the top worn with them will increase by $1.10.”
Interpreting the Slope
Once more:
“The model predicts that for every dollar more that the pants cost, that the cost of the top worn with them will increase by $1.10.”
Note that when I talked about the 1 unit change in x, I did not state a number; I just mentioned the unit (a dollar).
Note that for the change in y, I very much used the number.
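Because the slope script is a fill-in-the-blank sentence, it can be sketched as a small template function. The function name and parameters are hypothetical, and the wording is lightly tidied from the script; the slope of 1.10 dollars is the clothing example from these slides.

```python
def interpret_slope(x_phrase, y_phrase, slope, unit=""):
    """Fill in the slope script for a fitted regression line."""
    # The direction word carries the sign, so the amount is reported as positive.
    direction = "increase" if slope >= 0 else "decrease"
    amount = f"{unit}{abs(slope):.2f}"
    return (
        f"The model predicts that for every {x_phrase}, "
        f"{y_phrase} will {direction} by {amount}."
    )


sentence = interpret_slope(
    "dollar more that the pants cost",
    "the cost of the top worn with them",
    1.10,
    unit="$",
)
print(sentence)
```

As in the notes above, the x side names only the unit while the y side carries the number.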
Interpreting the Intercept
Two minus one is one.
• 2 - 1 = 1
One minus zero is one.
• 1 - 0 = 1
Any starving person can tell you that the difference between one meal and two meals is not the same as the difference between one meal and no meals at all.
Interpreting the Intercept
Concept Alert!
0 is the number we typically use to represent things such as “none”, “nothing”, and “completely missing”.
It is important to be able to see the number 0 and then rationally consider what “nothing” means in the context of the problem.
Interpreting the Intercept
The intercept is our prediction when x (the predictor variable) equals zero.
Sometimes the intercept is ridiculous to talk about, because in the context of the problem, “nothing” is a goofy idea.
We still want to interpret the intercept even when it is ridiculous, but we usually want to also mention that it is ridiculous.
Interpreting the Intercept
Consider the Burger King foods from the book:
0 grams of fat could make sense.
0 grams of protein could make sense.
We would want to use terms like “fat free” and “protein free” instead of saying 0 grams.
To consider our clothing example, we have to get a bit more technical.
Interpreting the Intercept
One way to interpret pants that cost $0 is as “free pants”.
If the study was instead based on the retail value of clothing rather than what the person wearing it personally paid, $0 now refers to genuinely valueless pants.
This pretty much means either no pants at all or pants so useless they are not all that different from no pants at all.
Interpreting the Intercept
While some of the bolder contrarians out there will want to disagree with me, I say it is ridiculous to discuss valueless pants.
I think the concept of valueless pants is just silly and it should be considered an example of when an intercept does not have useful predictive power.
Interpreting the Intercept
If we were using the nicotine content of cigarettes to see how many times a week a smoker would go buy cigarettes, our intercept would be a prediction for cigarettes with no nicotine at all.
While I am not personally a smoker, it seems to me that a nicotine-free cigarette would kill the point even more than a caffeine-free diet soda.
So I would say this is also a ridiculous intercept.
Interpreting the Intercept
Whether or not it is ridiculous, here is the script:
“The model predicts that for <explanation of what 0 means> the <response variable> would be <a/b0>.”
So in the context of our clothing, where the intercept was -10:
“The model predicts that for truly valueless pants the cost of the top worn with them would be -$10.”
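To connect the intercept script back to the equation, here is a minimal sketch using the slope (1.10) and intercept (-10) from the clothing example. The function names are hypothetical. It shows that the intercept is simply the prediction at x = 0, and then fills in the script.

```python
SLOPE = 1.10       # from the slides: dollars of top per dollar of pants
INTERCEPT = -10.0  # from the slides: predicted top cost when pants cost $0


def predict_top_cost(pants_cost):
    """Predicted cost of the top: intercept + slope * pants cost."""
    return INTERCEPT + SLOPE * pants_cost


def interpret_intercept(zero_meaning, response, intercept, unit=""):
    """Fill in the intercept script for a fitted regression line."""
    sign = "-" if intercept < 0 else ""
    return (
        f"The model predicts that for {zero_meaning} "
        f"{response} would be {sign}{unit}{abs(intercept):g}."
    )


at_zero = predict_top_cost(0)
sentence = interpret_intercept(
    "truly valueless pants",
    "the cost of the top worn with them",
    at_zero,
    unit="$",
)
print(at_zero)
print(sentence)
```

A negative cost for a top makes no practical sense, which is exactly the kind of intercept worth flagging as ridiculous when you interpret it.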
Assignments
Read the first half of Chapter 9 for Friday.
Chapter 8 Quiz Friday.
Ch. 8: Do 1 problem from each set of 5.
You may skip two sets of 5 (of your choosing).
You should do 8 problems.
Due Monday.
Chapter 8 Quiz Bulletpoints
Know why we need a hat for regression equations.
Know how to find a least squares regression line from the data.
Know how to find a least squares regression line from the summary statistics.
Know how to find r from r².
Know how to interpret the slope and the intercept for a regression line.
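For the "r from r²" bullet, one way to sketch the step is below. It takes the square root of r² and gives it the same sign as the slope, since the correlation and the slope of the regression line always share a sign. The function name and the sample numbers are made up for illustration.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r from r-squared, taking the sign from the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)


print(r_from_r_squared(0.25, 1.10))  # 0.5
print(r_from_r_squared(0.25, -1.3))  # -0.5
```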