Scatterplots

Post on 23-Feb-2016


Ch3 Bivariate Data

Scatterplots and Regression

Scatterplots

Study Time and GPA

Do a Residual Plot

Calculator: In the List menu (2nd Stat), find the name RESID and enter it for Ylist.

Residual Plot

A randomly scattered residual plot shows that a linear model is appropriate.

Write the linear equation:

Predicted GPA = 1.8069326 + .4247748(Study Time)

Interpret the slope (b):

For every additional hour of study, our model predicts an average increase of .4247748319 in GPA.

Interpret the y-intercept (a):

At 0 hours of study, our model predicts a GPA of 1.8069326.

Interpret the correlation (r):

There is a strong positive linear association between hours of study and GPA.

Interpret the coefficient of determination (r²):

66.6% of the variation in GPA can be explained by the approximate linear relationship with hours of study.
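As a rough illustration of how the Study Time and GPA model is used, here is a minimal Python sketch. The coefficients and the 66.6% figure come from the slides; the function name `predict_gpa` and the 3-hour input are assumptions made only for this example.

```python
import math

# Coefficients copied from the slide's linear equation.
INTERCEPT = 1.8069326  # "a": predicted GPA at 0 hours of study
SLOPE = 0.4247748      # "b": predicted change in GPA per additional hour


def predict_gpa(study_hours):
    """Predicted GPA from the model: GPA = a + b(Study Time)."""
    return INTERCEPT + SLOPE * study_hours


# Hypothetical input: 3 hours of study (not a value from the slides).
print(round(predict_gpa(3), 2))

# |r| is the square root of r^2, and r takes the same sign as the slope.
r_squared = 0.666
r = math.copysign(math.sqrt(r_squared), SLOPE)
print(round(r, 3))
```

The second part shows why r² = 66.6% goes with a "strong positive" description: the square root is above .70 and the slope is positive.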

Tootsie Pop Grab (Last Year)

Have you ever wondered how many Tootsie Pops you could grab in one hand?

First we need to get an accurate measurement of the hand that you will use to grab the Tootsie Pops.

23 cm

Are there any outliers or influential points?

If this point were removed, the slope of the line would increase and the correlation would become stronger.

Predicted # of Pops = -12.9362 + 1.57199(Handspan)

Interpret the slope “b”

For every additional cm of handspan, our model predicts an average increase of 1.57199322 in the # of pops you can grab.

Interpret the y-intercept “a”

If your handspan was 0 cm, our model predicts -12.9361942 pops that can be grabbed.

Why is this not meaningful?

This prediction has no meaning in context because you cannot have a negative # of pops grabbed.

Describe the association……this means interpret the correlation “r”

There is a moderate positive linear association between handspan and the # of pops you can grab.

Interpret the coefficient of determination “r²”

38.6% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.

Scatterplot vs Residual Plot

The residual plot uses the same x-axis, but the y-axis shows the residuals. Each residual is the actual value minus the predicted value, so the residual plot shows whether each actual point was above or below the prediction line.

The horizontal line at 0 on the residual plot is the prediction line.

What was the predicted # of pops for a handspan of 24?

Predicted # of Pops = -12.9362 + 1.57199(Handspan)

Predicted # of Pops = -12.9362 + 1.57199(24) = 24.79

What was the ACTUAL # of pops for a handspan of 24?

Check the residual plot for this.

Actual = predicted + residual (the residual is negative when the point is below the line).

24.79 + 4 = 28.79
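The Tootsie Pop prediction and the "actual = predicted + residual" step can be sketched in a few lines of Python. The coefficients and the residual of about +4 are taken from the slides; the function names are assumptions made for illustration.

```python
def predicted_pops(handspan_cm):
    """Predicted # of pops from last year's model."""
    return -12.9362 + 1.57199 * handspan_cm


def actual_from_residual(predicted, residual):
    """Since residual = actual - predicted, actual = predicted + residual."""
    return predicted + residual


pred = predicted_pops(24)
# Residual of about +4, as read off the residual plot in the slides.
actual = actual_from_residual(pred, 4)

print(round(pred, 2), round(actual, 2))
```

A positive residual means the actual point sits above the prediction line, so it is added; a negative residual would be subtracted in effect.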

Skittles Bag Grab (Period 2)

Have you ever wondered how many Skittles bags you could grab in one hand?

First we need to get an accurate measurement of the hand that you will use to grab the Skittles bags.

23 cm

Predicted # of Skittles Bags = -16.639 + 1.74544(Handspan)

Interpret the slope “b”

For every additional cm of handspan, our model predicts an average increase of 1.74544 in the # of Skittles bags you can grab.

Interpret the y-intercept “a”

If your handspan was 0 cm, our model predicts -16.639 Skittles bags that can be grabbed.

Why is this not meaningful?

This prediction has no meaning in context because you cannot have a negative # of Skittles bags grabbed.

Describe the association……this means interpret the correlation “r”

There is a moderate positive linear association between handspan and the # of Skittles bags you can grab.

Interpret the coefficient of determination “r²”

42.7% of the variation in Skittles bags grabbed can be explained by the approximate linear relationship with handspan.

Tootsie Pop Grab (Period 3)

Have you ever wondered how many Tootsie Pops you could grab in one hand?

First we need to get an accurate measurement of the hand that you will use to grab the Tootsie Pops.

23 cm

Predicted # of Pops = -27.7801 + 2.34491(Handspan)

Interpret the slope “b”

For every additional cm of handspan, our model predicts an average increase of 2.34491 in the # of pops you can grab.

Interpret the y-intercept “a”

If your handspan was 0 cm, our model predicts -27.7801 pops that can be grabbed.

Why is this not meaningful?

This prediction has no meaning in context because you cannot have a negative # of pops grabbed.

Describe the association……this means interpret the correlation “r”

There is a moderate positive linear association between handspan and the # of pops you can grab.

Interpret the coefficient of determination “r²”

48.0% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.

Smarties Grab (Period 4)

Have you ever wondered how many smarties you could grab in one hand?

First we need to get an accurate measurement of the hand that you will use to grab the smarties.

23 cm

Predicted # of Smarties Packages = -10.3911 + 1.8359(Handspan)

Interpret the slope “b”

For every additional cm of handspan, our model predicts an average increase of 1.8359 in the # of smarties you can grab.

Interpret the y-intercept “a”

If your handspan was 0 cm, our model predicts -10.3911 smarties that can be grabbed.

Why is this not meaningful?

This prediction has no meaning in context because you cannot have a negative # of smarties grabbed.

Describe the association……this means interpret the correlation “r”

There is a moderate positive linear association between handspan and the # of smarties you can grab.

Interpret the coefficient of determination “r²”

35.1% of the variation in smarties grabbed can be explained by the approximate linear relationship with handspan.

Surfing

Below are 22 randomly selected days that Mr. Pines has surfed in the past few years.

Time (min)   45  60  43  30  62  59  61  44  70  75  85
# of Waves    2   6   5   2   5   8   5   6  15   9  11

Time (min)   90  58  47  31  63  64  73  42  65  57  66
# of Waves   10   6   7   3   2  10  10   7   3  12  12

Is there an association between minutes surfed and # of waves ridden?

Create a Scatterplot of the data.

Minutes will be the Explanatory Variable x and # of Waves will be the response variable y.

Calculator: Minutes in L1 and Waves in L2

Find your linear model.

Calculator: Stat, Calc, 8:LinReg(a + bx), L1, L2, Vars, Y-Vars, 1:Function, Y1

If your r and r2 do not show up you need to go to catalog and turn Diagnostic On
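For readers without the calculator at hand, the same least-squares fit can be sketched in plain Python. The two lists are the 22 (minutes, waves) pairs from the surfing table; the function name `least_squares` is an assumption for this example, and the formulas are the standard ones for the LSRL (b = Sxy/Sxx, a = ȳ - b·x̄).

```python
# The 22 surfing sessions from the table: minutes surfed and waves ridden.
minutes = [45, 60, 43, 30, 62, 59, 61, 44, 70, 75, 85,
           90, 58, 47, 31, 63, 64, 73, 42, 65, 57, 66]
waves = [2, 6, 5, 2, 5, 8, 5, 6, 15, 9, 11,
         10, 6, 7, 3, 2, 10, 10, 7, 3, 12, 12]


def least_squares(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # y-intercept
    r = sxy / (sxx * syy) ** 0.5   # correlation coefficient
    return a, b, r


a, b, r = least_squares(minutes, waves)
print(f"Predicted # of Waves = {a:.3f} + {b:.6f}(minutes surfed)")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Run on this table, the fit gives an intercept of about -1.205, a slope of about .141481, and an r² of about 35.9%.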

Write your linear model properly.

Predicted # of Waves = -1.205 + .141481(minutes surfed)

DO NOT use X and Y, ALWAYS use the words in context.

Use your linear model to make a prediction.

How many waves does your model predict if you surfed for 49 minutes?

Predicted # of Waves = -1.205 + .141481(49)

Predicted # of Waves = 5.73

Beware of Extrapolation.

How many waves does your model predict if you surfed for 120 minutes?

Predicted # of Waves = -1.205 + .141481(120)

Predicted # of Waves = 15.77

Because 120 minutes is beyond the range of x-values in our data, this prediction cannot be trusted. This is called Extrapolation.
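One way to guard against extrapolation is to check the input against the smallest and largest x-values in the data before trusting a prediction. The sketch below is an illustration only: the bounds 30 and 90 are the shortest and longest sessions in the surfing table, the slope .141481 is the value given in the slope interpretation, and the function name and return shape are assumptions.

```python
# Shortest and longest sessions in the surfing data table.
MIN_MINUTES, MAX_MINUTES = 30, 90


def predict_waves(minutes, a=-1.205, b=0.141481):
    """Return (prediction, extrapolating) for the surfing model."""
    extrapolating = not (MIN_MINUTES <= minutes <= MAX_MINUTES)
    return a + b * minutes, extrapolating


for m in (49, 120):
    waves, extrapolating = predict_waves(m)
    note = "extrapolation, do not trust" if extrapolating else "within the data"
    print(f"{m} minutes: {waves:.2f} waves ({note})")
```

The model still produces a number for 120 minutes; the flag is what signals that the number should not be relied on.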

Interpret the y-intercept “a”

When surfing for 0 minutes, our model predicts -1.205 waves ridden.

ALWAYS use context.

Interpret the slope “b”

For every minute surfed, our model predicts an average increase of .141481 in waves ridden.

ALWAYS use context.

Interpret the correlation coefficient “r”

There is a moderate positive linear association between minutes surfed and waves ridden.

ALWAYS use context.

Interpret the coefficient of determination “r2”

35.9% of the variation in waves ridden can be explained by the approximate linear relationship with minutes surfed.

ALWAYS use context.

Graph the Scatterplot Again.

Now that you have had your calculator find your linear model, the LSRL should show up on your scatterplot.

Calculator: Zoom 9

Do a Residual Plot

Calculator: In the List menu (2nd Stat), find the name RESID and enter it for Ylist.

What does the Residual Plot tell you?

The points on the residual plot are called residuals. Each one shows how far an actual point is above or below the LSRL, and the horizontal axis represents your LSRL.

If the residual plot shows a random scatter like this one, then the linear model is a good fit. If there is a curved pattern, then a nonlinear model may be a better fit (we will do these later in Chapter 12).

Understanding Computer Output

This is often given to you on the AP test so you don’t have to waste time putting #’s in your calculator.

Make sure you know which one is “a” and which one is “b”.

R is obtained by taking the square root of R-Sq and giving it the same sign as the slope.

S is the standard deviation of the residuals (not used much in our class).

Seb is the standard error (estimated standard deviation) of the slope (more of this in CH15).

T and P are also used in CH15.

Too many people at my party

Is there an association between the # of people at a party and the # of fights that occur?

Influential Points and Outliers

Points that are extreme values in the x-direction may be influential points.

An influential point is a point that would strongly change the regression line if it were removed.

Extreme points

in x-direction

in y-direction

Try removing this point

Notice the change in the regression line and the value of the slope.

That point is an influential point

Try removing this point

Notice how the slope and regression line do not change much.

That point is NOT an influential point; it is an Outlier.
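The "try removing this point" idea can be imitated numerically. The sketch below uses made-up data (ten points lying exactly on y = x, plus one added point in each scenario); none of these values come from the slides, and the function name `slope` is an assumption. It compares the slope with and without each added point.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up data lying exactly on the line y = x (slope 1).
base = [(x, x) for x in range(1, 11)]

# Extreme in the y-direction only: its x sits at the mean of the other x-values.
with_outlier = base + [(5.5, 12)]

# Extreme in the x-direction and far from the trend of the other points.
with_x_extreme = base + [(20, 5)]

slope_base = slope(base)
slope_outlier = slope(with_outlier)
slope_x_extreme = slope(with_x_extreme)

print(slope_base, slope_outlier, slope_x_extreme)
```

In this made-up example the y-direction outlier leaves the slope essentially unchanged even though its residual is large, while the x-direction point pulls the slope far below 1, which is the behavior the slides describe for an influential point.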

Influential Points and Outliers

Points that are extreme values in the x-direction may be influential points.

Points that are extreme in the y-direction are called Outliers. An outlier will have a large residual value.

More Influential Points


Using the Residual Plot

This is the scatterplot for # of people at a party and # of fights that occur.

Residual plot for people at party and fights

What is the predicted # of fights for having 8 people at a party?

Fights = 2.94738 + .1222(people)

Fights = 2.94738 + .1222(8)

Predicted # of fights is about 3.92

Residual plot for people at party and fights

What was the actual # of fights that occurred for having 8 people at a party?

Fights = 2.94738 + .1222(people)

The residual seems to be about 2 below the prediction line, so 3.92 - 2 = 1.92, or about 2 actual fights.

You can see that the original point (8,2) matches our answer from the previous slide.
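The same check can be run in the other direction: start from the original point (8, 2) and compute its residual. This is a small Python sketch using the model from the slides; the function name is an assumption.

```python
def predicted_fights(people):
    """Predicted # of fights from the party model."""
    return 2.94738 + 0.1222 * people


# The original data point (8, 2) mentioned in the slides.
people, actual = 8, 2

predicted = predicted_fights(people)
residual = actual - predicted  # negative: the point lies below the line

print(round(predicted, 2), round(residual, 2))
```

The residual comes out close to -2, which agrees with reading "about 2 below the prediction line" off the residual plot.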

Correlation


r = ± .70 to ± .99 Strong Correlation

r = ± .40 to ±.69 Moderate Correlation

r = ± .01 to ± .39 Weak Correlation
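The strength scale above can be turned into a small helper. This is a sketch under stated assumptions: the function name is invented for the example, values of exactly ±1 are grouped with "strong", values between .39 and .40 are grouped with "weak", and r = 0 is reported as no linear association, since the scale itself does not cover those cases.

```python
def describe_correlation(r):
    """Describe r using the class scale: strong, moderate, or weak."""
    if not -1 <= r <= 1:
        raise ValueError("r cannot be greater than 1 or less than -1")
    if r == 0:
        return "no linear association"
    size = abs(r)
    if size >= 0.70:
        strength = "strong"
    elif size >= 0.40:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.816))  # e.g. Study Time and GPA, r^2 = 66.6%
print(describe_correlation(0.599))  # e.g. surfing, r^2 = 35.9%
```

The label still needs to be put in context, for example "a moderate positive linear association between minutes surfed and waves ridden".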

Scatterplot & Residual Plot

Sometimes you can spot a curved residual plot in the scatterplot.

Slope and Correlation

The slope and the correlation always have the same sign: both positive or both negative.

Ministers and Rum

Explanatory Variable (X): # of Methodist Ministers

Response Variable (Y): Barrels of Cuban Rum

1. Write the linear equation.

Predicted # of Barrels of Rum = 33.18073414 + 132.1220623(Ministers)

2. In your own words tell me what the meaning of the y-intercept is for this situation.

If there were no ministers, we could expect about 33 barrels of rum.

3. Make a prediction for the number of barrels of rum if there were 150 Methodist ministers.

Plugging 150 into the linear equation predicts 19,851.5 barrels.

4. Make a prediction for the number of barrels of rum if there were 400 Methodist ministers. What would be your concerns with making this type of prediction?

Plugging 400 into the linear model predicts 52,882 barrels. This is a concern because we are predicting beyond the domain of the x-axis. This is called EXTRAPOLATION.

5. Describe the association between Ministers and Rum.

There is a strong positive linear association between the # of Methodist ministers and the # of barrels of rum.

6. The correlation is near perfect. What conclusions can be made here?

We cannot make conclusions of cause and effect. We can only SUGGEST an association.

7. Since it is not likely that the ministers were drinking the rum, what might be a lurking variable for this situation?

Population increase from 1860 to 1940 brings a demand for more ministers and more rum.

8. What year created the largest residual in this situation?

1920

Correlation Does Not Imply Causation

Hand Span vs Foot Length

[Scatterplot: Hand Span (cm) on the x-axis, Foot Length (in) on the y-axis]

Predicted Foot Length = 1.17366746 + .4165143(Hand Span)

Interpret a, b, r, r²

At a hand span of 0 cm, our model predicts a foot length of about 1.17 inches.

For every additional cm in hand span, our model predicts an average increase of about .417 inches in foot length.

There is a moderate positive linear relationship between hand span and foot length.

47.5% of the variation in foot length can be explained by the linear relationship with hand span.

R² = .4750

Correlation Does Not Imply Causation


Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

There is a strong positive linear association between the number of texts sent and received.


Is this linear model a good fit?

Period 2

Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.

Don’t call them DOTS or IT or THEY

Insert residual plot

What is the residual for the student who sent 40 text messages?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

-2

Insert residual plot

What was the actual number of text messages received for the student with 40 text messages sent?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

37

Insert residual plot

What was the actual number of text messages received for the student with 0 text messages sent?

Period 2

Predicted Texts Received = 2.63407 + .905352(texts sent)

5

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 3

Predicted Texts Received = -1.2379 + 1.08031(texts sent)

There is a strong positive linear association between the number of texts sent and received.

Insert residual plot

Period 3

Is this linear model a good fit?

Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.

Don’t call them DOTS or IT or THEY

Insert residual plot

What is the residual for the student who sent 35 text messages?

Period 3

Predicted Texts Received = -1.2379 + 1.08031(texts sent)

-4

Insert residual plot

What was the actual number of text messages received for the student with 35 text messages sent?

Period 3

Predicted Texts Received = -1.2379 + 1.08031(texts sent)

32

Text Messaging

The text messages (sent and received) of 18 students over the past 24 hours were recorded.

Is there a linear relationship between sending and receiving messages?

Period 4

Predicted Texts Received = 1.79437 + 0.848007(texts sent)

There is a strong positive linear association between the number of texts sent and received.


Insert residual plot

Is this linear model a good fit?

Period 4

Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.

Don’t call them DOTS or IT or THEY

Insert residual plot

What is the residual for the student who sent 15 text messages?

Period 4

Predicted Texts Received = 1.79437 + 0.848007(texts sent)

-8

Insert residual plot

What was the actual number of text messages received for the student with 15 text messages sent?

Period 4

Predicted Texts Received = 1.79437 + 0.848007(texts sent)

7

Things you need to learn to do for CH3

•Do a scatterplot on your calculator

•Do a residual plot on your calculator

•Find the linear equation on your calculator

•Write your linear equation in context

•Make a prediction using your equation

•Interpret the slope (b) in context

•Interpret the y-intercept (a) in context

•Interpret “r” in context

•Interpret “r²” in context

•Use the residual plot to determine if your model is a good fit.

•Correlation does not imply Causation…..this means that a scatterplot with a strong correlation does not necessarily mean that x leads to y. Only a well designed experiment can give cause and effect conclusions.

•Predictions beyond the domain of the x-axis cannot be trusted. This is called Extrapolation.

•On a scatterplot, extreme values in the y-direction are called “outliers”, and extreme values in the x-direction may be “influential points”.

•R is the correlation coefficient. It measures the strength and direction of the linear association between x and y.

•R² is the coefficient of determination, which is the % of variation in y that is explained by the approximate linear association with x.

•The sum and the mean of the residuals are always zero.

•The standard deviation of the residuals gives a measure of how the points in the scatterplot are spread around the regression line.

•The point (x̄, ȳ) is always on the regression line.

•The correlation “r” is not changed by adding the same number to every value of one of the variables, by multiplying every value of one of the variables by the same positive number, or by interchanging the x and y variables.

•The correlation “r” cannot be greater than 1 or less than -1.

•Correlation is strongly affected by outliers.

•The slope and the correlation have the same sign.

•Influential points are points that sharply affect the regression line. An influential point may have a small residual but have a large effect on the regression line.

•A residual plot that shows a curved pattern shows that your linear model may not be a good fit. A residual plot that is randomly scattered shows that your model may be a good fit.
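Several of the facts in this list can be checked numerically on the surfing data. The sketch below is an illustration under stated assumptions: the helper names are invented for the example, and the shift of 10 minutes and the conversion from minutes to hours are arbitrary choices used only to test that r does not change.

```python
# The 22 surfing sessions from the Surfing example.
minutes = [45, 60, 43, 30, 62, 59, 61, 44, 70, 75, 85,
           90, 58, 47, 31, 63, 64, 73, 42, 65, 57, 66]
waves = [2, 6, 5, 2, 5, 8, 5, 6, 15, 9, 11,
         10, 6, 7, 3, 2, 10, 10, 7, 3, 12, 12]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def correlation(x, y):
    """Correlation coefficient r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


a, b = fit_line(minutes, waves)
residuals = [yi - (a + b * xi) for xi, yi in zip(minutes, waves)]

# The residuals of the least-squares line sum to zero (up to rounding).
residual_sum = sum(residuals)

# The point (x-bar, y-bar) is on the line: this gap should be zero.
gap_at_means = (a + b * mean(minutes)) - mean(waves)

# r is unchanged by adding a constant, by a positive rescaling, or by swapping x and y.
r = correlation(minutes, waves)
r_shifted = correlation([m + 10 for m in minutes], waves)
r_rescaled = correlation([m / 60 for m in minutes], waves)
r_swapped = correlation(waves, minutes)

print(residual_sum, gap_at_means)
print(r, r_shifted, r_rescaled, r_swapped)
```

Any differences printed here should be on the order of floating-point rounding error. Note that swapping x and y leaves r unchanged but would give a different regression line.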