
Scatterplots

Date post: 23-Feb-2016
Upload: christmas
Page 1: Scatterplots

Ch3 Bivariate Data

Scatterplots

Regression

Page 2: Scatterplots

Scatterplots

Page 3: Scatterplots

Scatterplots

Page 4: Scatterplots

Scatterplots

Page 5: Scatterplots

Scatterplots

Page 6: Scatterplots

Scatterplots

Page 7: Scatterplots
Page 8: Scatterplots

L1 L2

Page 9: Scatterplots

Study Time and GPA

Page 10: Scatterplots

Study Time and GPA

Page 11: Scatterplots

Do a Residual Plot

Calculator: In the List menu (2nd Stat), find the name RESID and place it in for Ylist.

Page 12: Scatterplots

Study Time and GPA

Residual Plot

A randomly scattered residual plot shows that a linear model is appropriate.

Page 13: Scatterplots

Study Time and GPA

Write the linear equation:

Predicted GPA = 1.8069326 + 0.4247748(Study Time)

Page 14: Scatterplots

Study Time and GPA

Interpret the Slope(b):

For every additional hour of study, our model predicts an average increase of about 0.4248 in GPA.

Page 15: Scatterplots

Study Time and GPA

Interpret the y-intercept(a):

At 0 hours of study our model predicts a GPA of 1.8069326.
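The slope and y-intercept interpretations can be checked numerically. The following is a minimal Python sketch using the coefficients from the slides; the function name `predicted_gpa` and the sample inputs of 2 and 3 hours are illustrative choices, not part of the original lesson.

```python
# Coefficients from the slide: Predicted GPA = a + b * (study time)
A_INTERCEPT = 1.8069326
B_SLOPE = 0.4247748


def predicted_gpa(hours):
    """Return the model's predicted GPA for a given number of study hours."""
    return A_INTERCEPT + B_SLOPE * hours


# The y-intercept "a" is the prediction at 0 hours of study.
at_zero = predicted_gpa(0)

# The slope "b" is the change in the prediction for one additional hour.
# (2 hours vs. 3 hours is an arbitrary illustrative pair.)
one_hour_change = predicted_gpa(3) - predicted_gpa(2)

print(at_zero, round(one_hour_change, 7))
```

Any two inputs that are one hour apart give the same difference, which is why the slope is read as "for every additional hour".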

Page 16: Scatterplots

Study Time and GPA

Interpret the correlation(r):

There is a strong positive linear association between hours of study and GPA.

Page 17: Scatterplots

Study Time and GPA

Interpret the Coefficient of Determination (r²):

66.6% of the variation in GPA can be explained by the approximate linear relationship with hours of study.

Page 18: Scatterplots

Tootsie Pop Grab

LAST YEAR

Page 19: Scatterplots

Tootsie Pop Grab

Have you ever wondered how many tootsie pops you could grab in one hand?

LAST YEAR

Page 20: Scatterplots

Tootsie Pop Grab

First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.

LAST YEAR

Page 21: Scatterplots

Tootsie Pop Grab

23 CM

LAST YEAR

Page 22: Scatterplots

Tootsie Pop Grab

LAST YEAR

Page 23: Scatterplots

Tootsie Pop Grab

LAST YEAR

Page 24: Scatterplots

Tootsie Pop Grab

LAST YEAR

Page 25: Scatterplots

Tootsie Pop Grab

LAST YEAR

Page 26: Scatterplots

Tootsie Pop Grab

Are there any outliers or influential points?

If this point were removed, the slope of the line would increase and the correlation would become stronger.

LAST YEAR

Page 27: Scatterplots

Tootsie Pop Grab

Predicted # of Pops = -12.9362 + 1.57199(Handspan)

LAST YEAR

Page 28: Scatterplots

Tootsie Pop Grab

For every………

Interpret the slope “b”

LAST YEAR

Page 29: Scatterplots

Tootsie Pop Grab

For every additional cm of handspan, our model predicts an average increase of about 1.572 in the # of pops you can grab.

Interpret the slope “b”

LAST YEAR

Page 30: Scatterplots

Tootsie Pop Grab

If your handspan was 0 cm, ………

Interpret the y-intercept “a”

LAST YEAR

Page 31: Scatterplots

Tootsie Pop Grab

If your handspan were 0 cm, our model predicts that -12.9362 pops can be grabbed.

Interpret the y-intercept “a”

Why is this not meaningful?

This prediction is not meaningful in context because you cannot grab a negative # of pops, and a handspan of 0 cm is far outside the handspans that were measured.

LAST YEAR

Page 32: Scatterplots

Tootsie Pop Grab

There is a , ………

Describe the association……this means interpret the correlation “r”

LAST YEAR

Page 33: Scatterplots

Tootsie Pop Grab

There is a moderate positive linear association between handspan and the # of pops you can grab.

Describe the association……this means interpret the correlation “r”

LAST YEAR

Page 34: Scatterplots

Tootsie Pop Grab

__% of the variation ………

Interpret the coefficient of determination “r²”

LAST YEAR

Page 35: Scatterplots

Tootsie Pop Grab

38.6% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.

Interpret the coefficient of determination “r²”

LAST YEAR

Page 36: Scatterplots

Scatterplot vs Residual Plot

The residual plot uses the same x-axis, but the y-axis is the residuals. Each residual shows how far an actual point was above or below the prediction line.

LAST YEAR

Page 37: Scatterplots

Scatterplot vs Residual Plot

Prediction line

LAST YEAR

Page 38: Scatterplots

Tootsie Pop Grab

What was the predicted # of pops for a handspan of 24?

Predicted # of Pops = -12.9362 + 1.57199(Handspan)

LAST YEAR

Page 39: Scatterplots

Tootsie Pop Grab

What was the predicted # of pops for a handspan of 24?

Predicted # of Pops = -12.9362 + 1.57199(24)

24.79

LAST YEAR

Page 40: Scatterplots

Tootsie Pop Grab

What was the ACTUAL # of pops for a handspan of 24?

Check the residual plot for this.

Actual = predicted + residual (the residual is negative when the point is below the prediction line).

24.79 + 4 = 28.79
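The two steps above (predict from the model, then add the residual read off the residual plot) can be sketched in Python. The function name `predicted_pops` is hypothetical; the coefficients, the handspan of 24, and the residual of about 4 come from the slides.

```python
def predicted_pops(handspan_cm):
    """Model from the slide: Predicted # of Pops = -12.9362 + 1.57199(Handspan)."""
    return -12.9362 + 1.57199 * handspan_cm


predicted = predicted_pops(24)

# Residual read (approximately) from the residual plot at a handspan of 24.
residual = 4

# By definition, residual = actual - predicted, so actual = predicted + residual.
actual = predicted + residual

print(round(predicted, 2), round(actual, 2))
```

Because the residual is only estimated from the plot, the recovered value is approximate; the real count of pops would be a whole number near it.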

LAST YEAR

Page 41: Scatterplots

Skittles Bag Grab

Period 2

Page 42: Scatterplots

Skittles Bag Grab

Have you ever wondered how many skittles bags you could grab in one hand?

Period 2

Page 43: Scatterplots

Skittles Bag Grab

First we need to get an accurate measurement of the hand that you will use to grab the skittles bags.

Period 2

Page 44: Scatterplots

Skittles Bag Grab

23 CM

Period 2

Page 45: Scatterplots

Skittles Bag Grab

Period 2

Page 46: Scatterplots

Predicted # of Skittles Bags = -16.639 + 1.74544(Handspan)

Skittles Bag Grab

Period 2

Page 47: Scatterplots

For every………

Interpret the slope “b”

Period 2

Skittles Bag Grab

Page 48: Scatterplots

Skittles Bag Grab

For every additional cm of handspan, our model predicts an average increase of 1.74544 in the # of skittles bags you can grab.

Interpret the slope “b”

Period 2

Page 49: Scatterplots

If your handspan was 0 cm, ………

Interpret the y-intercept “a”

Period 2

Skittles Bag Grab

Page 50: Scatterplots

If your handspan were 0 cm, our model predicts that -16.639 skittles bags can be grabbed.

Interpret the y-intercept “a”

Why is this not meaningful?

This prediction is not meaningful in context because you cannot grab a negative # of skittles bags, and a handspan of 0 cm is far outside the handspans that were measured.

Period 2

Skittles Bag Grab

Page 51: Scatterplots

There is a , ………

Describe the association……this means interpret the correlation “r”

Period 2

Skittles Bag Grab

Page 52: Scatterplots

There is a moderate positive linear association between handspan and the # of skittles bags you can grab.

Describe the association……this means interpret the correlation “r”

Period 2

Skittles Bag Grab

Page 53: Scatterplots

__% of the variation ………

Interpret the coefficient of determination “r²”

Period 2

Skittles Bag Grab

Page 54: Scatterplots

42.7% of the variation in skittles bags grabbed can be explained by the approximate linear relationship with handspan.

Interpret the coefficient of determination “r²”

Period 2

Skittles Bag Grab

Page 55: Scatterplots

Tootsie Pop Grab

Period 3

Page 56: Scatterplots

Tootsie Pop Grab

Have you ever wondered how many tootsie pops you could grab in one hand?

Period 3

Page 57: Scatterplots

Tootsie Pop Grab

First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.

Period 3

Page 58: Scatterplots

Tootsie Pop Grab

23 CM

Period 3

Page 59: Scatterplots

Tootsie Pop Grab

Period 3

Page 60: Scatterplots

Tootsie Pop Grab

Period 3

Page 61: Scatterplots

Tootsie Pop Grab

Predicted # of Pops = -27.7801 + 2.34491(Handspan)

Period 3

Page 62: Scatterplots

Tootsie Pop Grab

For every………

Interpret the slope “b”

Period 3

Page 63: Scatterplots

Tootsie Pop Grab

For every additional cm of handspan, our model predicts an average increase of 2.34491 in the # of pops you can grab.

Interpret the slope “b”

Period 3

Page 64: Scatterplots

Tootsie Pop Grab

If your handspan was 0 cm, ………

Interpret the y-intercept “a”

Period 3

Page 65: Scatterplots

Tootsie Pop Grab

If your handspan were 0 cm, our model predicts that -27.7801 pops can be grabbed.

Interpret the y-intercept “a”

Why is this not meaningful?

This prediction is not meaningful in context because you cannot grab a negative # of pops, and a handspan of 0 cm is far outside the handspans that were measured.

Period 3

Page 66: Scatterplots

Tootsie Pop Grab

There is a , ………

Describe the association……this means interpret the correlation “r”

Period 3

Page 67: Scatterplots

Tootsie Pop Grab

There is a moderate positive linear association between handspan and the # of pops you can grab.

Describe the association……this means interpret the correlation “r”

Period 3

Page 68: Scatterplots

Tootsie Pop Grab

__% of the variation ………

Interpret the coefficient of determination “r²”

Period 3

Page 69: Scatterplots

Tootsie Pop Grab

48.0% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.

Interpret the coefficient of determination “r²”

Period 3

Page 70: Scatterplots

Smarties Grab

Period 4

Page 71: Scatterplots

Have you ever wondered how many smarties you could grab in one hand?

Smarties Grab

Period 4

Page 72: Scatterplots

First we need to get an accurate measurement of the hand that you will use to grab the smarties.

Smarties Grab

Period 4

Page 73: Scatterplots

23 CM

Smarties Grab

Period 4

Page 74: Scatterplots

Smarties Grab

Period 4

Page 75: Scatterplots

Predicted # of Smarties Packages = -10.3911 + 1.8359(Handspan)

Smarties Grab

Period 4

Page 76: Scatterplots

For every………

Interpret the slope “b”

Smarties Grab

Period 4

Page 77: Scatterplots

For every additional cm of handspan, our model predicts an average increase of 1.8359 in the # of smarties you can grab.

Interpret the slope “b”

Smarties Grab

Period 4

Page 78: Scatterplots

If your handspan was 0 cm, ………

Interpret the y-intercept “a”

Smarties Grab

Period 4

Page 79: Scatterplots

If your handspan were 0 cm, our model predicts that -10.3911 smarties can be grabbed.

Interpret the y-intercept “a”

Why is this not meaningful?

This prediction is not meaningful in context because you cannot grab a negative # of smarties, and a handspan of 0 cm is far outside the handspans that were measured.

Smarties Grab

Period 4

Page 80: Scatterplots

There is a , ………

Describe the association……this means interpret the correlation “r”

Smarties Grab

Period 4

Page 81: Scatterplots

There is a moderate positive linear association between handspan and the # of smarties you can grab.

Describe the association……this means interpret the correlation “r”

Smarties Grab

Period 4

Page 82: Scatterplots

__% of the variation ………

Interpret the coefficient of determination “r²”

Smarties Grab

Period 4

Page 83: Scatterplots

35.1% of the variation in smarties grabbed can be explained by the approximate linear relationship with handspan.

Interpret the coefficient of determination “r²”

Smarties Grab

Period 4

Page 84: Scatterplots

Surfing

Page 85: Scatterplots

Below are 22 randomly selected days that Mr. Pines has surfed in the past few years.

Time (Min):   45  60  43  30  62  59  61  44  70  75  85
# of Waves:    2   6   5   2   5   8   5   6  15   9  11

Time (Min):   90  58  47  31  63  64  73  42  65  57  66
# of Waves:   10   6   7   3   2  10  10   7   3  12  12

Is there an association between minutes surfed and # of waves ridden?

Page 86: Scatterplots

Create a Scatterplot of the data.

Minutes will be the Explanatory Variable x and # of Waves will be the response variable y.

Calculator: Minutes in L1 and Waves in L2

Page 87: Scatterplots

Find your linear model.

Calculator: Stat, Calc, 8:LinReg(a + bx), L1, L2, Vars, Y-Vars, 1:Function, Y1

If your r and r² do not show up, you need to go to the catalog and turn Diagnostic On.

Page 88: Scatterplots

Write your linear model properly.

Predicted # of Waves = -1.205 + 0.141481(minutes surfed)

DO NOT use X and Y, ALWAYS use the words in context.
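For readers without the calculator, the least-squares line can also be computed directly from the 22 surfing days listed earlier. This is a minimal Python sketch of the standard least-squares formulas, not the calculator's own routine; the function name `least_squares` and the variable names are illustrative.

```python
# The 22 surfing days from the data slide.
minutes = [45, 60, 43, 30, 62, 59, 61, 44, 70, 75, 85,
           90, 58, 47, 31, 63, 64, 73, 42, 65, 57, 66]
waves = [2, 6, 5, 2, 5, 8, 5, 6, 15, 9, 11,
         10, 6, 7, 3, 2, 10, 10, 7, 3, 12, 12]


def least_squares(x, y):
    """Return (a, b, r): intercept, slope, and correlation of the LSRL."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # y-intercept
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r


a, b, r = least_squares(minutes, waves)
print(f"Predicted # of Waves = {a:.3f} + {b:.6f}(minutes surfed)")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

The printed equation is written with the variable names in context, matching the rule on the slide.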

Page 89: Scatterplots

Use your linear model to make a prediction.

How many waves does your model predict if you surfed for 49 minutes?

Predicted # of Waves = -1.205 + 0.141481(49)

Predicted # of Waves = 5.73

Page 90: Scatterplots

Beware of Extrapolation.

How many waves does your model predict if you surfed for 120 minutes?

Predicted # of Waves = -1.205 + 0.141481(120)

Predicted # of Waves = 15.77

Because 120 minutes is beyond the domain of our data on the x-axis (30 to 90 minutes), our answer cannot be trusted. This is called Extrapolation.
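One way to guard against extrapolation is to compare the input with the range of x-values that was actually observed. The sketch below assumes the 22 surfing times from the data slide; the helper name `is_extrapolation` is hypothetical.

```python
# Observed explanatory values (minutes surfed) from the data slide.
minutes = [45, 60, 43, 30, 62, 59, 61, 44, 70, 75, 85,
           90, 58, 47, 31, 63, 64, 73, 42, 65, 57, 66]


def is_extrapolation(x_value, observed_x):
    """True when x_value lies outside the range of the observed x-values."""
    return x_value < min(observed_x) or x_value > max(observed_x)


inside = is_extrapolation(49, minutes)    # 49 is between 30 and 90
outside = is_extrapolation(120, minutes)  # 120 is beyond the largest x

print(inside, outside)
```

A prediction flagged this way can still be computed, but it should not be trusted.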

Page 91: Scatterplots

Interpret the y-intercept “a”

At 0 minutes surfed, our model predicts -1.205 waves ridden.

ALWAYS use context.

Page 92: Scatterplots

Interpret the slope “b”

For every additional minute surfed, our model predicts an average increase of 0.141481 in waves ridden.

ALWAYS use context.

Page 93: Scatterplots

Interpret the correlation coefficient “r”

There is a moderate positive linear association between minutes surfed and waves ridden.

ALWAYS use context.

Page 94: Scatterplots

Interpret the coefficient of determination “r²”

35.9% of the variation in waves ridden can be explained by the approximate linear relationship with minutes surfed.

ALWAYS use context.

Page 95: Scatterplots

Graph the Scatterplot Again.

Now that you have had your calculator find your linear model, the LSRL should show up on your scatterplot.

Calculator: Zoom 9

Page 96: Scatterplots

Do a Residual Plot

Calculator: In the List menu (2nd Stat), find the name RESID and place it in for Ylist.

Page 97: Scatterplots

What does the Residual Plot tell you?

The points on the residual plot are the residuals. Each one shows how far an actual point is above or below the prediction, and the horizontal axis (residual = 0) represents your LSRL.

Page 98: Scatterplots

What does the Residual Plot tell you?

If the residual plot shows a random scatter like this one, then the linear model is a good fit. If there is a curved pattern, then a nonlinear model may be a better fit (we will do these later in chapter 12).

Page 99: Scatterplots

Understanding Computer Output

This is often given to you on the AP test so you don’t have to waste time putting #’s in your calculator.

Page 100: Scatterplots

Make sure you know which one is “a” and which one is “b”.

R is obtained by taking the square root of R-Sq (and giving it the same sign as the slope).

S is the standard deviation of the residuals (not used much in our class).

Seb is the standard error of the slope (more of this in CH15).

T and P are also used in CH15.
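Recovering R from R-Sq on computer output can be sketched as follows. The square root alone only gives the size of R, so this sketch takes the sign from the slope; the function name `r_from_r_sq` is hypothetical, and the inputs 35.9% and 0.141481 are the r² and slope quoted on the surfing slides.

```python
import math


def r_from_r_sq(r_sq_percent, slope):
    """Recover r from R-Sq (given as a percentage), signed like the slope."""
    magnitude = math.sqrt(r_sq_percent / 100)
    return math.copysign(magnitude, slope)


# Surfing example: R-Sq = 35.9%, slope is positive.
r_surfing = r_from_r_sq(35.9, 0.141481)

# A made-up output with a negative slope, to show the sign being carried over.
r_negative = r_from_r_sq(25.0, -2.0)

print(round(r_surfing, 3), r_negative)
```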

Page 101: Scatterplots

Too many people at my party

Is there an association between the # of people at a party and the # of fights that occur?

Page 102: Scatterplots

Influential Points and Outliers

Points that are extreme values in the x-direction may be influential points.

An influential point is a point whose removal would strongly change the regression line.

Page 103: Scatterplots

Extreme points

in x-direction

in y-direction

Page 104: Scatterplots

Try removing this point

Notice the change in the regression line and the value of the slope.

That point is an influential point

Page 105: Scatterplots

Try removing this point

Notice how the slope and regression line do not change much.

That point is NOT an influential point. It is an Outlier.

Page 106: Scatterplots

Influential Points and Outliers

Points that are extreme values in the x-direction may be influential points.

Points that are extreme in the y-direction are called Outliers. An outlier will have a large residual value.

[Figure labels: Outlier, Influential Point]
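The "try removing this point" idea can be imitated in Python by fitting the slope with and without one extra point. The data here are made up for illustration (they are not the party data from the slides), and the function name `slope` is hypothetical.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical base data lying exactly on a line with slope 1.
base = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# Extreme in the y-direction (large residual), but in the middle of the x-values.
with_outlier = base + [(3, 8)]

# Extreme in the x-direction and off the trend of the other points.
with_influential = base + [(15, 5)]

print(slope(base), slope(with_outlier), slope(with_influential))
```

In this made-up data the y-direction outlier leaves the slope essentially unchanged, while the point that is extreme in the x-direction pulls the slope well below 1.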

Page 107: Scatterplots

More Influential Points

Page 108: Scatterplots

More Influential Points

Page 109: Scatterplots

Using the Residual Plot

This is the scatterplot for # of people at a party and # of fights that occur.

Page 110: Scatterplots

Residual plot for people at party and fights

What is the predicted # of fights for having 8 people at a party?

Fights = 2.94738 + .1222(people)

Page 111: Scatterplots

Fights = 2.94738 + .1222(people)

Fights = 2.94738 + .1222(8)

Predicted # of fights is about 3.92

Page 112: Scatterplots

Residual plot for people at party and fights

What was the actual # of fights that occurred for having 8 people at a party?

Fights = 2.94738 + .1222(people)

Page 113: Scatterplots

The residual seems to be about 2 below the prediction line. So 3.92 – 2 = 1.92, or about 2 actual fights.

Page 114: Scatterplots

You can see that the original point (8,2) matches our answer from the previous slide.

Page 115: Scatterplots

Correlation

Page 116: Scatterplots

Correlation

Page 117: Scatterplots

Correlation

Page 118: Scatterplots

Correlation

r = ± .70 to ± .99 Strong Correlation

r = ± .40 to ±.69 Moderate Correlation

r = ± .01 to ± .39 Weak Correlation
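The guideline above can be written as a small helper. This is a sketch under stated assumptions: the slide's ranges leave gaps (for example between .69 and .70), so the cut points 0.40 and 0.70 are used as boundaries, and r = ±1 is labeled "strong" and r = 0 "weak". The function name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Label a correlation using the class guideline (cut points assumed)."""
    if not -1 <= r <= 1:
        raise ValueError("r cannot be greater than 1 or less than -1")
    size = abs(r)  # strength depends on the size of r, not its sign
    if size >= 0.70:
        return "strong"
    if size >= 0.40:
        return "moderate"
    return "weak"


print(describe_strength(0.816), describe_strength(-0.599), describe_strength(0.2))
```

The sign of r is ignored here because it describes direction (positive or negative), not strength.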

Page 119: Scatterplots

Scatterplot & Residual Plot

Sometimes you can spot a curved residual plot in the scatterplot.

Page 120: Scatterplots

Slope and Correlation

The slope and the correlation should be heading in the same direction.

Page 121: Scatterplots

Ministers and Rum

Page 122: Scatterplots

Ministers and Rum

Explanatory Variable (X): # of Methodist Ministers

Response Variable (Y): Barrels of Cuban Rum

Page 123: Scatterplots

Ministers and Rum

Page 124: Scatterplots

Ministers and Rum

1. Write the linear equation.

Predicted # of Barrels of Rum = 33.18073414 + 132.1220623(Ministers)

2. In your own words tell me what the meaning of the y-intercept is for this situation.

If there were no ministers, our model predicts about 33 barrels of rum.

3. Make a prediction for the number of barrels of rum if there were 150 Methodist ministers.

Plugging 150 into the linear equation predicts about 19,851.5 barrels.

Page 125: Scatterplots

Ministers and Rum

4. Make a prediction for the number of barrels of rum if there were 400 Methodist ministers. What would be your concerns with making this type of prediction?

Plugging 400 into the linear model predicts about 52,882 barrels. This is a concern because we are predicting beyond the domain of the x-axis. This is called EXTRAPOLATION.

5. Describe the association between Ministers and Rum.

There is a strong positive linear association between the # of Methodist ministers and the # of barrels of rum.

6. The correlation is near perfect. What conclusions can be made here?

We cannot make cause-and-effect conclusions. We can only SUGGEST an association.

Page 126: Scatterplots

Ministers and Rum

7. Since it is not likely that the ministers were drinking the rum, what might be a lurking variable for this situation?

Population increase from 1860 to 1940 brings a demand for more ministers and more rum.

Page 127: Scatterplots

Ministers and Rum

8. What year created the largest residual in this situation?

1920

Page 128: Scatterplots

Correlation Does Not Imply Causation

Page 129: Scatterplots

Hand Span vs Foot Length

[Scatterplot: Hand Span (cm) on the x-axis, Foot Length (in) on the y-axis]

Page 130: Scatterplots

Hand Span vs Foot Length

Predicted Foot Length = 1.17366746 + 0.4165143(Hand Span)

Interpret a, b, r, r²

At a hand span of 0 cm, our model predicts a foot length of about 1.17 inches.

For every additional cm in hand span, our model predicts an average increase of about 0.417 inches in foot length.

There is a moderate positive linear relationship between hand span and foot length.

47.5% of the variation in foot length can be explained by the approximate linear relationship with hand span.

r² = 0.4750

Page 131: Scatterplots

Correlation Does Not Imply Causation

Page 132: Scatterplots

Correlation Does Not Imply Causation

Page 133: Scatterplots

Correlation Does Not Imply Causation

Page 134: Scatterplots

Correlation Does Not Imply Causation

Page 135: Scatterplots

Correlation Does Not Imply Causation

Page 136: Scatterplots

Correlation Does Not Imply Causation

Page 137: Scatterplots

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

There is a strong positive linear association between the number of texts sent and received.

Page 138: Scatterplots

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 2

Page 139: Scatterplots

Is this linear model a good fit?

Period 2

Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.

Don’t call them DOTS or IT or THEY

Page 140: Scatterplots

Insert residual plot

What is the residual for the student who sent 40 text messages?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

-2

Page 141: Scatterplots

Insert residual plot

What was the actual number of text messages received for the student with 40 text messages sent?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

37

Page 142: Scatterplots

Insert residual plot

What was the actual number of text messages received for the student with 0 text messages sent?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

5

Page 143: Scatterplots

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 3

Predicted Texts Received = -1.2379 + 1.08031(texts sent)

There is a strong positive linear association between the number of texts sent and received.

Page 144: Scatterplots

Insert residual plot

Period 3

Is this linear model a good fit?

Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.

Don’t call them DOTS or IT or THEY

Page 145: Scatterplots

Insert residual plot

What is the residual for the student who sent 35 text messages?

Period 3

Predicted Texts Received = -1.2379 + 1.08031(texts sent)

-4

Page 146: Scatterplots

Insert residual plot

What was the actual number of text messages received for the student with 35 text messages sent?

Period 3

Predicted Texts Received = -1.2379 + 1.08031(texts sent)

32

Page 147: Scatterplots

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 4

Predicted Texts Received = 1.79437 + 0.848007(texts sent)

There is a strong positive linear association between the number of texts sent and received.

Page 148: Scatterplots

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 4

Page 149: Scatterplots

Insert residual plot

Is this linear model a good fit?

Period 4

Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.

Don’t call them DOTS or IT or THEY

Page 150: Scatterplots

Insert residual plot

What is the residual for the student who sent 15 text messages?

Period 4

Predicted Texts Received = 1.79437 + 0.848007(texts sent)

-8

Page 151: Scatterplots

Insert residual plot

What was the actual number of text messages received for the student with 15 text messages sent?

Period 4

Predicted Texts Received = 1.79437 + 0.848007(texts sent)

7

Page 152: Scatterplots

Things you need to learn to do for CH3

•Do a scatterplot on your calculator

•Do a residual plot on your calculator

•Find the linear equation on your calculator

•Write your linear equation in context

•Make a prediction using your equation

•Interpret the slope (b) in context

•Interpret the y-intercept (a) in context

•Interpret “r” in context

•Interpret “r²” in context

•Use the residual plot to determine if your model is a good fit

Page 153: Scatterplots

•Correlation does not imply causation. A scatterplot with a strong correlation does not necessarily mean that x causes y. Only a well-designed experiment can give cause-and-effect conclusions.

•Predictions made beyond the domain of the x-axis cannot be trusted. This is called Extrapolation.

•On a scatterplot, extreme values in the y-direction are called “outliers”, and extreme values in the x-direction may be “influential points”.

•r is the correlation coefficient. It measures the strength and direction of the linear association between x and y.

•r² is the coefficient of determination, which is the % of variation in y that is explained by the approximate linear association with x.

Page 154: Scatterplots

•The sum and the mean of the residuals is always zero.

•The standard deviation of the residuals gives a measure of how the points in the scatterplot are spread around the regression line.

•The point (x̄, ȳ), the mean of x and the mean of y, is always on the regression line.

•The correlation “r” is not changed by adding the same number to every value of one of the variables, by multiplying every value of one of the variables by the same positive number, or by interchanging the x and y variables.

•The correlation “r” cannot be greater than 1 or less than -1.
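Several of the facts in this list can be checked numerically. The sketch below uses a small made-up data set (the x and y values are hypothetical, chosen only for illustration) and a hypothetical helper `fit_line` built from the standard least-squares formulas.

```python
def fit_line(x, y):
    """Return (a, b, r): intercept, slope, and correlation of the LSRL."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

a, b, r = fit_line(x, y)

# Residual = actual - predicted; the residuals should sum to (about) zero.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

# The point of means should sit on the line, so this gap should be (about) zero.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
mean_point_gap = y_bar - (a + b * x_bar)

# r should not change when x is shifted, rescaled by a positive number,
# or when x and y are interchanged.
r_shifted = fit_line([xi + 10 for xi in x], y)[2]
r_scaled = fit_line([3 * xi for xi in x], y)[2]
r_swapped = fit_line(y, x)[2]

print(residual_sum, mean_point_gap, r, r_shifted, r_scaled, r_swapped)
```

Because of floating-point rounding, the sums may print as tiny numbers rather than exactly zero.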

Page 155: Scatterplots

•Correlation is strongly affected by outliers.

•The slope and the correlation have the same sign.

•Influential points are points that sharply affect the regression line. An influential point may have a small residual but have a large effect on the regression line.

•A residual plot that shows a curved pattern shows that your linear model may not be a good fit. A residual plot that is randomly scattered shows that your model may be a good fit.

