Ch3 Bivariate Data
Scatterplots and Regression
Scatterplots
Study Time and GPA
Do a Residual Plot
Calculator: In the List menu (2nd Stat), find the list named RESID and enter it as the Ylist.
Residual Plot
A residual plot with random scatter and no clear pattern shows that a linear model is appropriate.
Write the linear equation:
Predicted GPA = 1.8069326 + .4247748(Study Time)
Interpret the slope (b):
For every additional hour of study, our model predicts an avg increase of .4247748 in GPA.
Interpret the y-intercept (a):
At 0 hours of study, our model predicts a GPA of 1.8069326.
Interpret the correlation (r):
There is a strong positive linear association between hours of study and GPA.
Interpret the coefficient of determination (r²):
66.6% of the variation in GPA can be explained by the approximate linear relationship with hours of study.
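One way to see what the slope interpretation means is to compare two predictions that are one hour apart. The sketch below uses the coefficients from the Study Time and GPA equation. The function name `predicted_gpa` and the study times of 2 and 3 hours are illustrative assumptions, not values from the class data.

```python
# Coefficients copied from the Study Time and GPA linear equation.
INTERCEPT = 1.8069326   # a: predicted GPA at 0 hours of study
SLOPE = 0.4247748       # b: predicted change in GPA per additional hour


def predicted_gpa(hours_of_study):
    """Return the model's predicted GPA for a given number of study hours."""
    return INTERCEPT + SLOPE * hours_of_study


# Hypothetical study times, chosen only to illustrate the slope.
gpa_at_2 = predicted_gpa(2)
gpa_at_3 = predicted_gpa(3)

# Predictions one hour apart differ by exactly the slope.
difference = gpa_at_3 - gpa_at_2

print(f"Predicted GPA at 2 hours: {gpa_at_2:.2f}")
print(f"Predicted GPA at 3 hours: {gpa_at_3:.2f}")
print(f"Difference for one more hour: {difference:.4f}")
```

The difference between the two predictions is the slope, which is why the slope is read as the predicted average change in GPA per additional hour.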
Tootsie Pop Grab (Last Year)
Have you ever wondered how many tootsie pops you could grab in one hand?
First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.
Handspan: 23 cm
Are there any outliers or influential points?
If this point were removed, the slope of the line would increase and the correlation would become stronger.
Predicted # of Pops = -12.9362 + 1.57199(Handspan)
Interpret the slope “b”
For every additional cm of handspan, our model predicts an avg increase of 1.57199 in the # of pops you can grab.
Interpret the y-intercept “a”
If your handspan were 0 cm, our model predicts -12.9362 pops grabbed.
Why is this not meaningful?
This is not meaningful because you cannot grab a negative # of pops, and a handspan of 0 cm is far outside our data.
Describe the association (this means interpret the correlation “r”)
There is a moderate positive linear association between handspan and the # of pops you can grab.
Interpret the coefficient of determination “r²”
38.6% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.
Scatterplot vs Residual Plot
The residual plot uses the same x-axis as the scatterplot, but the y-axis is the residuals (actual minus predicted).
Each residual shows whether the actual point was above or below the prediction line, and by how much.
On the residual plot, the horizontal line at 0 is the prediction line.
Tootsie Pop Grab
What was the predicted # of pops for a handspan of 24?
Predicted # of Pops = -12.9362 + 1.57199(Handspan)
Predicted # of Pops = -12.9362 + 1.57199(24) = 24.79
What was the ACTUAL # of pops for a handspan of 24?
Check the residual plot for this.
Actual = predicted + residual.
The residual is about +4, so 24.79 + 4 = 28.79, or about 29 pops.
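The prediction and the "actual = predicted + residual" step can be sketched in a few lines. The coefficients come from the Tootsie Pop equation, and the residual of 4 is the approximate value read off the residual plot. The function names are illustrative assumptions.

```python
# Coefficients from last year's Tootsie Pop Grab linear model.
INTERCEPT = -12.9362
SLOPE = 1.57199


def predicted_pops(handspan_cm):
    """Predicted # of pops for a given handspan in cm."""
    return INTERCEPT + SLOPE * handspan_cm


def actual_from_residual(predicted, residual):
    """Residual = actual - predicted, so actual = predicted + residual."""
    return predicted + residual


predicted = predicted_pops(24)
# Residual of about +4, read from the residual plot (an approximate reading).
actual = actual_from_residual(predicted, 4)

print(round(predicted, 2))  # 24.79
print(round(actual, 2))     # 28.79
```

A positive residual means the actual point is above the prediction line, so the residual is added. A negative residual would be subtracted.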
Skittles Bag Grab (Period 2)
Have you ever wondered how many skittles bags you could grab in one hand?
First we need to get an accurate measurement of the hand that you will use to grab the skittles bags.
Handspan: 23 cm
Predicted # of Skittles Bags = -16.639 + 1.74544(Handspan)
Interpret the slope “b”
For every additional cm of handspan, our model predicts an avg increase of 1.74544 in the # of skittles bags you can grab.
Interpret the y-intercept “a”
If your handspan were 0 cm, our model predicts -16.639 skittles bags grabbed.
Why is this not meaningful?
This is not meaningful because you cannot grab a negative # of skittles bags, and a handspan of 0 cm is far outside our data.
Describe the association (this means interpret the correlation “r”)
There is a moderate positive linear association between handspan and the # of skittles bags you can grab.
Interpret the coefficient of determination “r²”
42.7% of the variation in skittles bags grabbed can be explained by the approximate linear relationship with handspan.
Tootsie Pop Grab (Period 3)
Have you ever wondered how many tootsie pops you could grab in one hand?
First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.
Handspan: 23 cm
Predicted # of Pops = -27.7801 + 2.34491(Handspan)
Interpret the slope “b”
For every additional cm of handspan, our model predicts an avg increase of 2.34491 in the # of pops you can grab.
Interpret the y-intercept “a”
If your handspan were 0 cm, our model predicts -27.7801 pops grabbed.
Why is this not meaningful?
This is not meaningful because you cannot grab a negative # of pops, and a handspan of 0 cm is far outside our data.
Describe the association (this means interpret the correlation “r”)
There is a moderate positive linear association between handspan and the # of pops you can grab.
Interpret the coefficient of determination “r²”
48.0% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.
Smarties Grab (Period 4)
Have you ever wondered how many smarties you could grab in one hand?
First we need to get an accurate measurement of the hand that you will use to grab the smarties.
Handspan: 23 cm
Predicted # of Smarties Packages = -10.3911 + 1.8359(Handspan)
Interpret the slope “b”
For every additional cm of handspan, our model predicts an avg increase of 1.8359 in the # of smarties you can grab.
Interpret the y-intercept “a”
If your handspan were 0 cm, our model predicts -10.3911 smarties grabbed.
Why is this not meaningful?
This is not meaningful because you cannot grab a negative # of smarties, and a handspan of 0 cm is far outside our data.
Describe the association (this means interpret the correlation “r”)
There is a moderate positive linear association between handspan and the # of smarties you can grab.
Interpret the coefficient of determination “r²”
35.1% of the variation in smarties grabbed can be explained by the approximate linear relationship with handspan.
Surfing
Below are 22 randomly selected days that Mr. Pines has surfed in the past few years.
Time (min):   45  60  43  30  62  59  61  44  70  75  85
# of Waves:    2   6   5   2   5   8   5   6  15   9  11
Time (min):   90  58  47  31  63  64  73  42  65  57  66
# of Waves:   10   6   7   3   2  10  10   7   3  12  12
Is there an association between minutes surfed and # of waves ridden?
Create a Scatterplot of the data.
Minutes will be the Explanatory Variable x and # of Waves will be the response variable y.
Calculator: Minutes in L1 and Waves in L2
Find your linear model.
Calculator: Stat, Calc, 8:LinReg(a + bx), L1, L2, Vars, Y-Vars, 1:Function, Y1
If your r and r² do not show up, go to the Catalog and turn Diagnostic On.
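For anyone checking the calculator by hand, here is a minimal sketch of the same least-squares calculation in Python, using the 22 surfing days listed above. The function name `least_squares` is an illustrative assumption. The formulas are the standard ones: b = Sxy / Sxx and a = (mean of y) - b(mean of x).

```python
# Surfing data: minutes surfed and # of waves ridden on 22 days.
minutes = [45, 60, 43, 30, 62, 59, 61, 44, 70, 75, 85,
           90, 58, 47, 31, 63, 64, 73, 42, 65, 57, 66]
waves = [2, 6, 5, 2, 5, 8, 5, 6, 15, 9, 11,
         10, 6, 7, 3, 2, 10, 10, 7, 3, 12, 12]


def least_squares(x, y):
    """Return (a, b, r) for the least-squares line: predicted y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # y-intercept
    r = sxy / (sxx * syy) ** 0.5     # correlation coefficient
    return a, b, r


a, b, r = least_squares(minutes, waves)
print(f"a = {a:.3f}, b = {b:.6f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

The printed values should land close to what LinReg(a + bx) reports for L1 and L2: an intercept of about -1.205, a slope of about .1415, and an r² of about .359.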
Write your linear model properly.
Predicted # of Waves = -1.205 + .141481(minutes surfed)
DO NOT use X and Y, ALWAYS use the words in context.
Use your linear model to make a prediction.
How many waves does your model predict if you surfed for 49 minutes?
Predicted # of Waves = -1.205 + .141481(49)
Predicted # of Waves = 5.73
Beware of Extrapolation.
How many waves does your model predict if you surfed for 120 minutes?
Predicted # of Waves = -1.205 + .141481(120)
Predicted # of Waves = 15.77
Because 120 minutes is beyond the range of x-values in our data (30 to 90 minutes), this answer cannot be trusted. This is called extrapolation.
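A quick way to guard against extrapolation is to compare the x-value with the smallest and largest x in the data before trusting a prediction. The sketch below assumes the surfing data range of 30 to 90 minutes and uses the intercept and slope quoted in this section's interpretations of "a" and "b". The function name `predict_waves` is hypothetical.

```python
# Smallest and largest minutes surfed in the 22-day data set.
MIN_MINUTES = 30
MAX_MINUTES = 90

# Intercept and slope as quoted in the interpretations of "a" and "b".
INTERCEPT = -1.205
SLOPE = 0.141481


def predict_waves(minutes_surfed):
    """Return (predicted waves, True if the prediction is an extrapolation)."""
    prediction = INTERCEPT + SLOPE * minutes_surfed
    is_extrapolation = not (MIN_MINUTES <= minutes_surfed <= MAX_MINUTES)
    return prediction, is_extrapolation


for m in (49, 120):
    predicted, extrapolated = predict_waves(m)
    note = "extrapolation, cannot be trusted" if extrapolated else "within the data"
    print(f"{m} minutes: {predicted:.2f} predicted waves ({note})")
```

The model will happily return a number for any x, so the range check has to come from you, not from the equation.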
Interpret the y-intercept “a”
Surfing 0 minutes, our model predicts -1.205 waves ridden.
ALWAYS use context.
Interpret the slope “b”
For every minute surfed, our model predicts an average increase of .141481 in waves ridden.
ALWAYS use context.
Interpret the correlation coefficient “r”
There is a moderate positive linear association between minutes surfed and waves ridden.
ALWAYS use context.
Interpret the coefficient of determination “r²”
35.9% of the variation in waves ridden can be explained by the approximate linear relationship with minutes surfed.
ALWAYS use context.
Graph the Scatterplot Again.
Now that your calculator has found your linear model, the LSRL should show up on your scatterplot.
Calculator: Zoom 9
Do a Residual Plot
Calculator: In the List menu (2nd Stat), find the list named RESID and enter it as the Ylist.
What does the Residual Plot tell you?
The points on the residual plot are called residuals. Each residual is the actual value minus the predicted value, and the horizontal axis (residual = 0) represents your LSRL.
If the residual plot shows a random scatter like this one, then the linear model is a good fit. If there is a curved pattern, then a nonlinear model may be a better fit (we will do these later in Chapter 12).
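To see what a curved residual pattern looks like in numbers, here is a small sketch that fits a line to data that actually follows a curve. The data set is hypothetical (y = x squared), chosen so the arithmetic comes out in whole numbers.

```python
# Hypothetical data that follows a curve (y = x squared), not a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x

# Residual = actual - predicted.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
signs = ["+" if res > 0 else "-" for res in residuals]

print(residuals)        # [2.0, -1.0, -2.0, -1.0, 2.0]
print(" ".join(signs))  # + - - - +
```

The residuals start above the line, dip below it, and come back above it. That U shape is the curved pattern that says a linear model is not a good fit, even though the residuals still add up to zero.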
Understanding Computer Output
This is often given to you on the AP test so you don’t have to waste time putting #’s in your calculator.
Make sure you know which one is “a” and which one is “b”.
R is obtained by taking the square root of R-Sq and giving it the same sign as the slope.
S is the standard deviation of the residuals (not used much in our class).
SEb is the standard error of the slope, which estimates the standard deviation of the slope (more on this in CH15).
T and P are also used in CH15.
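Since computer output reports R-Sq but not r, the conversion is worth a short sketch. It uses the surfing values from this chapter (R-Sq of 35.9% and a positive slope). The second call uses a made-up negative slope to show why the sign matters. The function name is an illustrative assumption.

```python
import math


def r_from_r_squared(r_sq_percent, slope):
    """Recover r from R-Sq (given as a percent), taking the sign of the slope."""
    r_magnitude = math.sqrt(r_sq_percent / 100)
    return math.copysign(r_magnitude, slope)


# Surfing model: R-Sq = 35.9% and the slope is positive.
r_surfing = r_from_r_squared(35.9, 0.141481)

# Hypothetical output with the same R-Sq but a negative slope.
r_negative = r_from_r_squared(35.9, -0.5)

print(round(r_surfing, 3))   # 0.599
print(round(r_negative, 3))  # -0.599
```

The square root alone is always positive, so the slope in the output is what tells you whether r is positive or negative.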
Too many people at my party
Is there an association between the # of people at a party and the # of fights that occur?
Influential Points and Outliers
Points that are extreme values in the x-direction may be influential points.
An influential point is a point that would strongly change the regression line if it were removed.
Extreme points: in x-direction, in y-direction
Try removing this point.
Notice the change in the regression line and the value of the slope.
That point is an influential point.
Try removing this point.
Notice how the slope and regression line do not change much.
That point is NOT an influential point; it is an outlier.
Influential Points and Outliers
Points that are extreme values in the x-direction may be influential points.
Points that are extreme in the y-direction are called outliers. An outlier will have a large residual value.
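The "try removing this point" idea can be checked numerically. The sketch below uses a small hypothetical data set (not the party data): five points on a perfect line, plus either one point that is extreme in the x-direction or one point that is extreme in the y-direction near the middle of the x-values.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical base data lying exactly on the line y = x.
base = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

extreme_in_x = (15, 5)  # far to the right of the other points
extreme_in_y = (3, 8)   # far above the line, in the middle of the x-values

# How much does the slope change when each extra point is removed?
change_x = abs(slope(base + [extreme_in_x]) - slope(base))
change_y = abs(slope(base + [extreme_in_y]) - slope(base))

print(f"Slope change from removing the x-extreme point: {change_x:.3f}")
print(f"Slope change from removing the y-extreme point: {change_y:.3f}")
```

In this made-up example, removing the x-extreme point changes the slope a lot, so it is influential. Removing the y-extreme point leaves the slope essentially unchanged, so it is an outlier with a large residual but little influence.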
More Influential Points
Using the Residual Plot
This is the scatterplot for # of people at a party and # of fights that occur.
Residual plot for people at party and fights
What is the predicted # of fights for having 8 people at a party?
Predicted Fights = 2.94738 + .1222(people)
Predicted Fights = 2.94738 + .1222(8)
Predicted # of fights is about 3.92
Residual plot for people at party and fights
What was the actual # of fights that occurred for having 8 people at a party?
The residual seems to be about 2 below the prediction line. So 3.92 - 2 = 1.92, or about 2 actual fights.
You can see that the original point (8,2) matches our answer.
Correlation
r = ± .70 to ± .99 Strong Correlation
r = ± .40 to ±.69 Moderate Correlation
r = ± .01 to ± .39 Weak Correlation
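The cutoffs above can be turned into a small helper for describing an association. This is a sketch of this class's rule of thumb only. How to treat values that fall between the listed ranges (such as .695), r = 0, and r = ±1 is an assumption made here, and the function name is hypothetical.

```python
def describe_correlation(r):
    """Describe strength and direction of r using the class cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("r cannot be greater than 1 or less than -1")
    size = abs(r)
    if size == 0:
        return "no linear association"  # assumption: the table starts at .01
    # Assumption: values between the listed ranges fall into the lower group.
    if size >= 0.70:
        strength = "strong"
    elif size >= 0.40:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.816))  # strong positive
print(describe_correlation(-0.5))   # moderate negative
print(describe_correlation(0.2))    # weak positive
```

Remember that the description still needs context, for example "a strong positive linear association between hours of study and GPA."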
Scatterplot & Residual Plot
Sometimes you can spot a curved residual plot in the scatterplot.
Slope and Correlation
The slope and the correlation always have the same sign.
Ministers and Rum
Explanatory Variable (X): # of Methodist Ministers
Response Variable (Y): Barrels of Cuban Rum
1. Write the linear equation.
Predicted # of Barrels of Rum = 33.18073414 + 132.1220623(Ministers)
2. In your own words tell me what the meaning of the y-intercept is for this situation.
If there were no ministers, our model predicts about 33 barrels of rum.
3. Make a prediction for the number of barrels of rum if there were 150 Methodist ministers.
Plugging 150 into the linear equation predicts about 19,851.5 barrels.
4. Make a prediction for the number of barrels of rum if there were 400 Methodist ministers. What would be your concerns with making this type of prediction?
Plugging 400 into the linear model predicts about 52,882 barrels. This is a concern because we are predicting beyond the range of x-values in the data. This is called EXTRAPOLATION.
5. Describe the association between Ministers and Rum.
There is a strong positive linear association between the # of Methodist ministers and the # of barrels of rum.
6. The correlation is near perfect. What conclusions can be made here?
We cannot make conclusions of cause and effect. We can only SUGGEST an association.
7. Since it is not likely that the ministers were drinking the rum, what might be a lurking variable for this situation?
Population increase from 1860 to 1940 brings a demand for more ministers and more rum.
8. What year created the largest residual in this situation?
1920
Correlation Does Not Imply Causation
Hand Span vs Foot Length
Scatterplot: Hand Span (cm) on the x-axis, Foot Length (in) on the y-axis
Predicted Foot Length = 1.17366746 + .4165143(Hand Span)
Interpret a, b, r, r²
At a hand span of 0 cm, our model predicts a foot length of about 1.17 inches.
For every additional cm in hand span, our model predicts an avg increase of about .417 inches in foot length.
There is a moderate positive linear association between hand span and foot length.
47.5% of the variation in foot length can be explained by the approximate linear relationship with hand span.
r² = .4750
Correlation Does Not Imply Causation
Text Messaging (Period 2)
The text messages (sent and received) of 18 students over the past 24 hours were recorded.
Is there a linear relationship between sending and receiving messages?
Predicted Texts Received = 2.63407 + .905352(texts sent)
There is a strong positive linear association between the number of texts sent and received.
Insert residual plot
Is this linear model a good fit?
Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.
Don’t call them DOTS or IT or THEY.
What is the residual for the student who sent 40 text messages?
-2
What was the actual number of text messages received for the student with 40 text messages sent?
37
What was the actual number of text messages received for the student with 0 text messages sent?
5
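Going back and forth between a residual and an actual value is the same equation used two ways: residual = actual - predicted. The sketch below uses the Period 2 model and the answers above. The residual of -2 is an approximate reading from the residual plot, and the function names are illustrative assumptions.

```python
# Coefficients from the Period 2 text messaging model.
INTERCEPT = 2.63407
SLOPE = 0.905352


def predicted_received(texts_sent):
    """Predicted texts received for a given number of texts sent."""
    return INTERCEPT + SLOPE * texts_sent


def residual(actual, predicted):
    """Residual = actual - predicted."""
    return actual - predicted


# Student who sent 0 texts and actually received 5.
residual_for_zero_sent = residual(5, predicted_received(0))

# Student who sent 40 texts: residual of about -2, read from the plot.
actual_for_forty_sent = predicted_received(40) + (-2)

print(round(residual_for_zero_sent, 2))  # 2.37
print(round(actual_for_forty_sent))      # 37
```

A positive residual (the student with 0 texts sent) means the model underpredicted. A negative residual (the student with 40 texts sent) means the model overpredicted.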
Text Messaging (Period 3)
The text messages (sent and received) of 18 students over the past 24 hours were recorded.
Is there a linear relationship between sending and receiving messages?
Predicted Texts Received = -1.2379 + 1.08031(texts sent)
There is a strong positive linear association between the number of texts sent and received.
Insert residual plot
Is this linear model a good fit?
Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.
Don’t call them DOTS or IT or THEY.
What is the residual for the student who sent 35 text messages?
-4
What was the actual number of text messages received for the student with 35 text messages sent?
32
Text Messaging (Period 4)
The text messages (sent and received) of 18 students over the past 24 hours were recorded.
Is there a linear relationship between sending and receiving messages?
Predicted Texts Received = 1.79437 + 0.848007(texts sent)
There is a strong positive linear association between the number of texts sent and received.
Insert residual plot
Is this linear model a good fit?
Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.
Don’t call them DOTS or IT or THEY.
What is the residual for the student who sent 15 text messages?
-8
What was the actual number of text messages received for the student with 15 text messages sent?
7
Things you need to learn to do for CH3
•Do a scatterplot on your calculator
•Do a residual plot on your calculator
•Find the linear equation on your calculator
•Write your linear equation in context
•Make a prediction using your equation
•Interpret the slope (b) in context
•Interpret the y-intercept (a) in context
•Interpret “r” in context
•Interpret “r²” in context
•Use the residual plot to determine if your model is a good fit
•Correlation does not imply causation. This means that a scatterplot with a strong correlation does not necessarily mean that x causes y. Only a well-designed experiment can give cause-and-effect conclusions.
•Predictions made beyond the range of the x-values in the data cannot be trusted. This is called extrapolation.
•On a scatterplot, extreme values in the y-direction are called “outliers,” and extreme values in the x-direction may be “influential points.”
•r is the correlation coefficient. It measures the strength and direction of the linear association between x and y.
•r² is the coefficient of determination, which is the % of variation in y that is explained by the approximate linear association with x.
•The sum and the mean of the residuals are always zero.
•The standard deviation of the residuals gives a measure of how the points in the scatterplot are spread around the regression line.
•The point (x̄, ȳ) is always on the regression line.
•The correlation “r” is not changed by adding the same number to every value of one of the variables, by multiplying every value of one of the variables by the same positive number, or by interchanging the x and y variables.
•The correlation “r” cannot be greater than 1 or less than -1.
•Correlation is strongly affected by outliers.
•The slope and the correlation have the same sign.
•Influential points are points that sharply affect the regression line. An influential point may have a small residual but have a large effect on the regression line.
•A residual plot that shows a curved pattern shows that your linear model may not be a good fit. A residual plot that is randomly scattered shows that your model may be a good fit.
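Several of the facts in this list can be checked numerically. The sketch below uses a small hypothetical data set (not class data) to check that the residuals sum to zero, that the point of means lies on the regression line, that r is unchanged by shifting, rescaling, or swapping the variables, and that the slope and r share a sign. The function names are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (a, b) for the least-squares line: predicted y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def correlation(x, y):
    """Return the correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data, made up for this check.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 5]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Fact: the residuals sum to zero (up to rounding error).
sum_of_residuals = sum(residuals)

# Fact: the point (mean of x, mean of y) is on the regression line.
gap_at_means = (a + b * mean(x)) - mean(y)

# Fact: r does not change when we shift, rescale, or swap the variables.
r = correlation(x, y)
r_shifted = correlation([xi + 10 for xi in x], y)
r_scaled = correlation(x, [3 * yi for yi in y])
r_swapped = correlation(y, x)

print(f"sum of residuals: {sum_of_residuals:.10f}")
print(f"gap at the point of means: {gap_at_means:.10f}")
print(f"r = {r:.4f}, shifted = {r_shifted:.4f}, "
      f"scaled = {r_scaled:.4f}, swapped = {r_swapped:.4f}")
print(f"slope and r have the same sign: {(b > 0) == (r > 0)}")
```

Any tiny nonzero values in the first two lines of output come from computer rounding, not from the statistics.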