Section 2.2 Correlation A numerical measure to supplement the graph. Will give us an indication of...

Date post: 16-Jan-2016
Upload: augusta-bryant
Transcript
Page 1: Section 2.2 Correlation A numerical measure to supplement the graph. Will give us an indication of “how closely” the data points fit a particular line.

Section 2.2 Correlation

A numerical measure to supplement the graph.

Will give us an indication of “how closely” the data points fit a particular line – the least squares regression line.

Will give us an indication of the type of association – positive or negative.


Warning: Notice the Scale!


Notes:
1. The data in the summation are standardized like z-scores, so r is not affected by a change of units.
2. Divide the scatterplot into quadrants based on the centroid:
   - Data in the 1st and 3rd quadrants contribute positive values to r.
   - Data in the 2nd and 4th quadrants contribute negative values to r.

Page 124
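The formula for r appears only as an image on the original slide. A minimal Python sketch, assuming the usual textbook form r = (1/(n-1)) Σ z_x·z_y with sample standard deviations, checks both notes on the Absences and Final Grade data used later in this section. The helper names z_products and correlation are hypothetical.

```python
from statistics import mean, stdev

absences = [6, 2, 15, 9, 12, 5, 8]      # x (list L1)
grades = [82, 86, 43, 74, 58, 90, 78]   # y (list L2)


def z_products(xs, ys):
    """Product of the x and y z-scores for each data point."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]


def correlation(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y (assumed textbook form)."""
    return sum(z_products(xs, ys)) / (len(xs) - 1)


products = z_products(absences, grades)
# Positive products come from quadrants 1 and 3, negative ones from quadrants 2 and 4
positive = sum(1 for p in products if p > 0)
r = correlation(absences, grades)

# Note 1: a change of units (grades as proportions instead of percents) leaves r unchanged
r_proportion = correlation(absences, [g / 100 for g in grades])

print(f"r = {r:.4f}, points in quadrants 1/3: {positive} of {len(products)}")
print(f"r with grades as proportions = {r_proportion:.4f}")
```

Six of the seven points fall in quadrants 2 and 4, so their negative contributions dominate and r comes out strongly negative.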


How Data Relative to Centroid Affects r

Figure: scatterplot divided at the centroid into Quadrant 1, Quadrant 2, Quadrant 3 and Quadrant 4


1. Turn Diagnostics On: 2nd Catalog, scroll down to DiagnosticOn and press Enter (you do not have to repeat this step every time!)
2. Compute r (and a few other things!): Stat|Calc|LinReg(a+bx), press Enter, and then give your lists: L1, L2
3. Your output should be: a=102.5, b=-3.62, r^2=0.8915, r=-0.9442

Student                   A    B    C    D    E    F    G
Number of Absences (L1)   6    2   15    9   12    5    8
Final Grade (L2)         82   86   43   74   58   90   78

Figure: Final Class Grade versus Number of Absences (scatterplot; x-axis: Number of Absences, y-axis: Final Class Grade; legend: Data, Centroid)

What are the meanings of these numbers??

Let’s start with r….

Let’s use our TI’s to find the correlation for our data set!
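For readers without a TI calculator, the same four numbers can be reproduced with a short Python sketch. The function name lin_reg is hypothetical; it computes the least-squares line from the usual sums of squares and is not the calculator's internal routine.

```python
from statistics import mean


def lin_reg(xs, ys):
    """Least-squares line y = a + b*x, returning (a, b, r^2, r).

    Meant to mirror the four numbers LinReg(a+bx) reports with DiagnosticOn.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r ** 2, r


absences = [6, 2, 15, 9, 12, 5, 8]      # L1
grades = [82, 86, 43, 74, 58, 90, 78]   # L2
a, b, r2, r = lin_reg(absences, grades)
print(f"a={a:.4g}, b={b:.4g}, r^2={r2:.4f}, r={r:.4f}")  # a=102.5, b=-3.622, r^2=0.8915, r=-0.9442
```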


• Symmetric in X and Y (makes no difference which variable is the explanatory and which is the response)
• Both variables must be quantitative!
• -1 <= r <= 1 ALWAYS
• The closer in magnitude r is to 1, the stronger the linear relationship between X and Y
• The sign of r indicates whether there is a positive or negative relationship between X and Y
• Just like the mean and standard deviation, r is strongly affected by outliers
• See page 125 for more!

Properties of the Correlation Coefficient (r)
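A quick Python check of two of these properties on the Absences and Final Grade data: swapping X and Y leaves r unchanged, and a single unusual point can change r drastically. The extra student (20 absences, grade 95) is hypothetical, added only for illustration.

```python
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation r from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

r_xy = correlation(absences, grades)
r_yx = correlation(grades, absences)   # swapping the roles of X and Y gives the same r

# Hypothetical outlier: a student with 20 absences and a final grade of 95
r_outlier = correlation(absences + [20], grades + [95])
print(round(r_xy, 4), round(r_yx, 4), round(r_outlier, 4))
```

With the hypothetical point included, r moves from about -0.94 to roughly -0.26.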


Getting a Feel for r

Let’s Play the Guessing Correlations Game!

http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GuessCGI.html

I will put this link on your assignments page!


Figure on page 126


Section 2.3 Least-Squares Regression

We will first learn how to find the least-squares regression line and then understand how to interpret it.

Please enter the data in Example 2.9 on page 152 into L3 and L4 on your TI calculator: L3 is NEA Increase (cal) and L4 is Fat Gain (kg).


Scatterplot of Example 2.9 Data (page 133): r = ?


Using LinReg(a+bx) L3,L4 we get the coefficients for the least-squares regression line:

a=3.505, b=-0.00344, r^2=0.6061, and r=-0.7786. So we have the line:

L4 variable = a + b*(L3 variable)
fat gain = 3.505 - 0.00344*(NEA increase)

To use the line to predict fat gain for an NEA increase of 400 calories (Example 2.10, page 134), plug the value 400 into NEA increase. How does this look graphically???
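The plug-in step is simple arithmetic; a minimal Python sketch using the coefficients reported by LinReg(a+bx) above (the function name predict_fat_gain is hypothetical):

```python
def predict_fat_gain(nea_increase_cal):
    """Fat gain (kg) = 3.505 - 0.00344 * NEA increase (cal), from Example 2.9."""
    a, b = 3.505, -0.00344
    return a + b * nea_increase_cal


print(f"{predict_fat_gain(400):.2f} kg")  # 3.505 - 1.376 = 2.129, printed as 2.13 kg
```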


fat gain = 3.505 - 0.00344*(NEA increase)
Slope = b = -0.00344
Y-intercept = a = 3.505

Figure: least-squares regression line, with the y-intercept a labeled


Let’s Get the Equation of the least-squares regression line for our Absence and Final Grade Data (It should still be in L1 and L2):

LinReg(a+bx) L1,L2 gives: y=a+bx where:

a=102.49, b=-3.622, r^2=0.8915 and r=-0.9442

So the equation of the least-squares regression line is:

Final Grade = 102.49 – 3.622*(Number of Absences)

Use this model to predict the Final Grade for a student who has 10 absences.
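Plugging 10 into the equation by hand or in code gives the same answer. A minimal sketch with the rounded coefficients above (predict_grade is a hypothetical helper name):

```python
def predict_grade(absences):
    """Final Grade = 102.49 - 3.622 * (Number of Absences), rounded coefficients."""
    return 102.49 - 3.622 * absences


print(round(predict_grade(10), 2))  # 66.27
```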

Let’s look at this graphically….

Another Example


Figure: Final Class Grade versus Number of Absences with the least-squares line y = -3.6219x + 102.49, R^2 = 0.8915 (x-axis: Number of Absences, y-axis: Final Class Grade; legend: Data, Centroid, Linear (Data))

Notice:
• The least-squares regression line goes through the centroid.
• We can graphically represent the prediction of the Final Class Grade for a given Number of Absences.
• What is the meaning of b and r^2????
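Why does the line go through the centroid? The least-squares intercept is a = ȳ - b·x̄ (the standard formula, assumed here since the slides do not show it), so evaluating the line at x̄ returns exactly ȳ. A short Python check on the Absences data:

```python
from statistics import mean

absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

x_bar, y_bar = mean(absences), mean(grades)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, grades))
sxx = sum((x - x_bar) ** 2 for x in absences)
b = sxy / sxx
a = y_bar - b * x_bar      # this choice of a forces the line through (x_bar, y_bar)

print(f"centroid = ({x_bar:.3f}, {y_bar})")
print(f"line evaluated at x_bar = {a + b * x_bar:.3f}")
```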


Caution! Using the Regression Line to Make Predictions for Certain Values of x
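The slide's figure is not reproduced here. Assuming the caution refers to extrapolation, that is, predicting for x values outside the range of the data (2 to 15 absences in our example), a small Python sketch shows how such a prediction can become meaningless. predict_grade and is_extrapolation are hypothetical helper names.

```python
absences = [6, 2, 15, 9, 12, 5, 8]   # observed x values (L1)


def predict_grade(x):
    """Final Grade = 102.49 - 3.622 * (Number of Absences)."""
    return 102.49 - 3.622 * x


def is_extrapolation(x, observed=absences):
    """True when x lies outside the range of x values used to fit the line."""
    return not (min(observed) <= x <= max(observed))


for x in (10, 30):
    note = " (extrapolation!)" if is_extrapolation(x) else ""
    print(f"{x} absences -> predicted grade {predict_grade(x):.2f}{note}")
```

At 30 absences the line predicts a grade of about -6.17, which is impossible: the straight-line pattern has no support that far from the data.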


Interpretation of the Least-Squares Regression Line

Page 136

Error = Residual = Observed - Predicted


Makes sense because the line always passes through the centroid!


Interpretation of b, the Slope of the Regression Line (page 138)

• A change of one unit in x corresponds to a change of b units in y.

• A change of one standard deviation in x corresponds to a change of r standard deviations in y.

• Let’s find b via the formula on page 137 for our example data:

• How do we interpret b?

• What are the units for b?
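The formula on page 137 is not reproduced in these notes. Assuming it is the standard b = r·s_y/s_x, which matches the second bullet (one standard deviation in x corresponds to r standard deviations in y), a Python sketch recovers the calculator's slope:

```python
from statistics import stdev

absences = [6, 2, 15, 9, 12, 5, 8]      # x (L1)
grades = [82, 86, 43, 74, 58, 90, 78]   # y (L2)
r = -0.9442                             # from LinReg(a+bx) L1,L2

s_x, s_y = stdev(absences), stdev(grades)   # sample standard deviations
b = r * s_y / s_x                           # assumed form of the page 137 formula
print(f"s_x = {s_x:.4f}, s_y = {s_y:.4f}, b = {b:.3f} grade points per absence")
```

b comes out to about -3.622: each additional absence corresponds to about 3.6 fewer points on the final grade, on average, and the units of b are grade points per absence.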


Two Sources of Variability in y:
- Relationship between x and y via the regression line (r^2 tells what %)
- Variability for a fixed value of x

Page 141, 142

Interpretation of r^2 (p141, 142)


Let’s use some list operations to verify r^2 for our example data set of Absences and Final Grade:
- Regression line: Final Grade = 102.49 - 3.622*(Number of Absences)
- Observed values of y are in L2
- To get predicted values of y for each value of x (in L1): 102.49-3.622*L1 → L5 (→ is the STO key)
- To get the residuals (i.e., Observed - Predicted): L2-L5 → L6 (What is the meaning of the data in L6???)

Interpretation of r^2 (p141, 142)

r^2 = (Variance of Predicted Values)/(Variance of Observed Values)
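The same list operations can be sketched in Python, with lists standing in for L5 (predicted values, from the rounded regression line) and L6 (residuals, computed as observed minus predicted):

```python
absences = [6, 2, 15, 9, 12, 5, 8]      # L1
grades = [82, 86, 43, 74, 58, 90, 78]   # L2: observed values of y

# L5: predicted values from the rounded regression line
predicted = [102.49 - 3.622 * x for x in absences]
# L6: residuals, observed - predicted
residuals = [y - y_hat for y, y_hat in zip(grades, predicted)]

for x, y, y_hat, e in zip(absences, grades, predicted, residuals):
    print(f"absences={x:2d}  observed={y}  predicted={y_hat:6.2f}  residual={e:6.2f}")
```

A positive residual means the student did better than the line predicts; a negative one, worse.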


Interpretation of r^2 (page 142)

r^2 = (Variance of Predicted Values)/(Variance of Observed Values)

= (standard dev. L5)^2/(standard dev. L2)^2 = (15.8472)^2/(16.7829)^2 = (15.8472/16.7829)^2 = 0.8916 (note we have some round-off error in the 4th decimal place)

So the regression line explains about 89% of the variability in the values of y (a very strong result!)
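A Python version of this check. Because the n - 1 factors cancel in the ratio, sample and population standard deviations give the same answer; the sketch uses sample standard deviations and the rounded coefficients, so it reproduces the slide's 0.8916 rather than the exact 0.8915.

```python
from statistics import stdev

absences = [6, 2, 15, 9, 12, 5, 8]      # L1
grades = [82, 86, 43, 74, 58, 90, 78]   # L2: observed values
predicted = [102.49 - 3.622 * x for x in absences]   # L5: predicted values

ratio = (stdev(predicted) / stdev(grades)) ** 2      # variance of predicted / variance of observed
print(f"{stdev(predicted):.4f} / {stdev(grades):.4f} -> r^2 = {ratio:.4f}")
```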


Here r^2=0.606 so the regression model explains about 61% of the variability in y, i.e. about 61% of the vertical scatter in y.

Two Sources of Variability in y:
- Relationship between x and y via the regression line (r^2 tells what %)
- Variability for a fixed value of x

Page 141, 142


Section 2.4 - Cautions about Correlation and Regression

Error = Residual = Observed - Predicted

Example 2.15 (scatterplot with regression line page 152)

An Interesting Fact: The sum of the residuals about the least-squares regression line is always zero.
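Why is the sum always zero? Summing Observed - Predicted gives Σ(y - a - bx) = n(ȳ - a - b·x̄), which is zero because the line passes through the centroid. A Python sketch confirms this with the exact least-squares coefficients, and shows that the rounded coefficients 102.49 and -3.622 leave a small nonzero sum:

```python
from statistics import mean

absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

# Exact (unrounded) least-squares coefficients
x_bar, y_bar = mean(absences), mean(grades)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, grades))
sxx = sum((x - x_bar) ** 2 for x in absences)
b = sxy / sxx
a = y_bar - b * x_bar

exact_sum = sum(y - (a + b * x) for x, y in zip(absences, grades))
rounded_sum = sum(y - (102.49 - 3.622 * x) for x, y in zip(absences, grades))
print(f"exact: {exact_sum:.1e}, rounded coefficients: {rounded_sum:.3f}")  # rounded: 0.024
```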


A residual plot (page 153) gives us a visual representation of the variability left over in the response variable after taking the regression into account. It helps us to assess how well the line describes the data.

If the regression line captures the overall pattern of the data, there should be no pattern in the residuals.
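To see what a pattern in the residuals looks like, a minimal Python sketch fits a straight line to hypothetical curved data (y = x², invented for illustration, not from the textbook):

```python
from statistics import mean

# Hypothetical curved data that a straight line cannot capture
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # observed - predicted

print(f"line: y = {a:g} + {b:g}x")
print("residuals:", residuals)
```

The residuals run +, -, -, -, + in a U shape, a clear pattern signaling that the line misses the curvature, even though they still sum to zero.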


Figure: (a) positive residual; (b) negative residual

The residual plot will


A Residual Plot

Note: No discernible pattern to the residuals


Figure: Measurements of Pipe Defects, two panels (x-axis: Laboratory Measurements, y-axis: Field Measurements). Panel 1: Data, Centroid and fitted line y = 0.7267x + 4.9433, R^2 = 0.8921. Panel 2: Data and Centroid.

Example 2.4 (page 108) Revisited

Both the scatterplot and the residual plot show more variability in field measurements as the true (laboratory-measured) defect size increases, despite a strong correlation (r = 0.9445) and a large percentage of the variability in y explained by the regression (r^2 = 0.8921).


Beware of Outliers and Influential Points


Example 2.16 (pages 154 to 157)

Data Point    r with data point    r without data point    Effect of data point
Subject 15    0.4819               0.5684                  Weakens regression
Subject 18    0.4819               0.3837                  Strengthens regression


Beware the Lurking Variable (page 158) and Remember: Correlation does not imply Causation! (page 160)

Lurking variables can create “nonsense correlations” or possibly hide true relationships between x and y.


A “nonsense” correlation

Lurking variable? Both variables increased during the time period plotted. Thus the common year is a lurking variable.

Example 2.2 page 159


Example 2.21 page 160


Figure: four scatterplots of y versus x, each with a fitted line:
Data Set A: y = 0.5001x + 3.0001, R^2 = 0.6665
Data Set B: y = 0.5x + 3.0009, R^2 = 0.6662
Data Set C: y = 0.4997x + 3.0025, R^2 = 0.6663
Data Set D: y = 0.4999x + 3.0017, R^2 = 0.6667

Problem 2.80 (page 169): Any observations?
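One observation can be checked numerically. The fitted lines above match Anscombe's well-known 1973 quartet, so, assuming Data Sets A and D are Anscombe's sets I and IV (the values below are Anscombe's published data, not taken from the textbook), a Python sketch shows nearly identical lines and r^2 for very different scatterplots. The helper name fit is hypothetical.

```python
from statistics import mean

# Assumed to be Data Sets A and D: Anscombe's (1973) sets I and IV
x_a = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_a = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x_d = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y_d = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def fit(xs, ys):
    """Return (a, b, r^2) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy ** 2 / (sxx * syy)


for name, xs, ys in (("A", x_a, y_a), ("D", x_d, y_d)):
    a, b, r2 = fit(xs, ys)
    print(f"Data Set {name}: y = {b:.4f}x + {a:.4f}, R^2 = {r2:.4f}")
```

In set D every x equals 8 except one point at x = 19, which alone determines the slope; only a plot reveals this.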


Figure for Problem 2.81


Figure for Problem 2.83


Section 2.5 The Question of Causation

Legend: a dashed double-arrow line is an observed association; a solid arrow from x to y shows “x causes y”.
