
Correlation and Regression - math.ttu.edulellings/2300/Notes/Chapter4.pdf · Labeling of x and y is...

9/15/15

Correlation and Regression
STA 2300
Chapter 4

Response Variable
- Also called the dependent variable
- Measures an outcome of a study
- Is often a (or the) primary focus of the study
- Individual values are called responses


Explanatory Variable
- Also called the predictor or the independent variable
- Influences the response variable directly or indirectly
- Helps explain variation in responses
- Can be used to predict responses

Graphical Approach
[Figure: Inputs (explanatory variable) → Process → Outputs (response variable)]


Examples
- Quarterback’s salary for next season and number of touchdowns thrown
  - Explanatory:            Response:
- Weight loss and amount of exercise
  - Explanatory:            Response:
- Years in job and income
  - Explanatory:            Response:

Scatterplots
Represent the association between two quantitative variables measured on the same individuals.

Person    Monthly Income    Monthly Rent
Sally     $4,000            $925
Austin    $1,200            $425
Joe       $600              $635
Ginny     $2,000            $600
Ruby      $0                $525

[Scatterplot: Rent ($0 to $1,000) versus Income ($0 to $4,000)]


What to Look For in a Scatterplot
- Form of the overall pattern
- Direction of the overall pattern
- Strength of the relationship

Form of the Overall Pattern
- Linear
- Curvature
- Deviations from that pattern

[Scatterplot: Rent versus Income]
Use the rectangle method to assess whether the pattern is linear.


Direction of the Overall Pattern
- Positive: as one variable increases, so does the other
- Negative: as one variable increases, the other decreases
- No relationship: neither of the above

[Figure: three example scatterplots. Positive: Rent versus Income. Negative: Taxes versus Distance from NYC. No Relationship: DJIA versus Outside Temperature.]

Strength of the Relationship
- For linear patterns, use the rectangle to observe the length-to-width ratio.
- A large length-to-width ratio indicates a strong relationship; a small ratio indicates a weak one.

[Figure: three scatterplots of Rent versus Income, labeled Weak, Moderate, and Strong]


Adding Explanatory Variables

Female
Mass (kg)   Rate (cal)
36.1        995
54.6        1425
48.5        1396
42          1418
50.6        1502
42          1256
40.3        1189
33.1        913
42.4        1124
34.5        1052
51.1        1347
41.2        1204

Male
Mass (kg)   Rate (cal)
62          1792
62.9        1666
47.4        1362
48.7        1614
51.9        1460
51.9        1867
46.9        1439

[Figure: scatterplots of Rate (800 to 2000 cal) versus Mass (30 to 70 kg), first for the female data alone, then with the male data added]
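One way to see what the second explanatory variable (sex) adds is to measure the linear association separately within each group. The following is a minimal Python sketch. It assumes the Mass and Rate values transcribed in the tables above. The helper name `correlation` is our own, and it uses the standardized-product formula for r from the "Calculation of r" slide.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Mass (kg) and Rate (cal) from the Female and Male tables
female_mass = [36.1, 54.6, 48.5, 42, 50.6, 42, 40.3, 33.1, 42.4, 34.5, 51.1, 41.2]
female_rate = [995, 1425, 1396, 1418, 1502, 1256, 1189, 913, 1124, 1052, 1347, 1204]
male_mass = [62, 62.9, 47.4, 48.7, 51.9, 51.9, 46.9]
male_rate = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

r_female = correlation(female_mass, female_rate)
r_male = correlation(male_mass, male_rate)
print(f"female: r = {r_female:.3f}")
print(f"male:   r = {r_male:.3f}")
```

With these values the female r comes out near 0.88 and the male r near 0.59. The strength of the linear association therefore differs between the groups, which is one reason to distinguish them on the plot.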


Correlation: r
- Measures the strength and direction of the linear relationship between two quantitative variables.
- Does NOT describe curved relationships, no matter how strong they are.
- Always a number between -1 and 1.
  - Strong linear relationship: r is close to -1 or 1.
  - Weak linear relationship: r is close to 0.
- An r close to 0 does NOT mean there is no relationship at all; there may be some other, nonlinear relationship.

Correlation: r (continued)
- Requires that both variables be quantitative.
- Has no units.
- The direction of the linear relationship is indicated by the sign of r:
  - Positive relationship: positive r
  - Negative relationship: negative r
- Strongly affected by outliers.
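Two of the properties above can be checked numerically: r misses a perfectly curved pattern, and a single outlier can change r a great deal. A minimal Python sketch follows. The size/price values come from the worked example in these notes. The extra point (30 GB priced at $100) is a hypothetical outlier added only for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# 1) A perfect curved relationship: y = x**2 on x values symmetric about 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curve = correlation(xs, ys)

# 2) Size (GB) and price ($) data, with and without one hypothetical outlier
size_gb = [8, 6, 18, 30, 20, 10, 3]
price = [310, 290, 500, 800, 470, 330, 150]
r_clean = correlation(size_gb, price)
r_outlier = correlation(size_gb + [30], price + [100])

print(f"curved data: r = {r_curve:.3f}")
print(f"without outlier: r = {r_clean:.2f}; with outlier: r = {r_outlier:.2f}")
```

For the curved data, r is essentially 0 even though y is an exact function of x. Adding the single outlier drops r from about 0.98 to about 0.44.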


[Figure: example scatterplots ranging from Weak to Strong linear relationships]

Calculation of r

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Note: This requires the mean and standard deviation of both variables; n represents the number of individuals.
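The formula translates almost term for term into code. This is a minimal Python sketch. It assumes s_x and s_y are sample standard deviations (divisor n - 1), consistent with the 1/(n - 1) factor. The function name `correlation` is our own.

```python
import math

def correlation(xs, ys):
    """Correlation r = 1/(n - 1) * sum of (x_i - x_bar)/s_x * (y_i - y_bar)/s_y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1); r is undefined if either is 0
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of standardized values
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Points lying exactly on a rising line and exactly on a falling line
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1, up to floating-point rounding
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1, up to floating-point rounding
```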


Calculation of r: Example

Size (GB)   (x - x̄)/s_x   Price ($)   (y - ȳ)/s_y   Product
8           -0.6          310         -0.5          0.30
6           -0.8          290         -0.6          0.48
18           0.5          500          0.4          0.20
30           1.7          800          1.9          3.23
20           0.7          470          0.3          0.21
10          -0.4          330         -0.4          0.16
3           -1.1          150         -1.2          1.32

Size: mean 13.6, standard deviation 9.5
Price: mean 407.1, standard deviation 209.0

Sum of products = 5.90
r = 5.90/(7 - 1) = 0.98
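The whole table can be reproduced in a few lines of Python. This sketch assumes sample standard deviations (n - 1 divisor), which matches the 9.5 and 209.0 shown. It multiplies unrounded standardized values, so individual products and their sum (about 5.88) differ slightly from the table's arithmetic on rounded values. The resulting r still rounds to 0.98.

```python
from statistics import mean, stdev

size_gb = [8, 6, 18, 30, 20, 10, 3]
price = [310, 290, 500, 800, 470, 330, 150]

x_bar, s_x = mean(size_gb), stdev(size_gb)
y_bar, s_y = mean(price), stdev(price)

total = 0.0
for x, y in zip(size_gb, price):
    z_x = (x - x_bar) / s_x  # standardized size
    z_y = (y - y_bar) / s_y  # standardized price
    total += z_x * z_y
    print(f"{x:>3} {z_x:5.1f} {y:>4} {z_y:5.1f} {z_x * z_y:5.2f}")

r = total / (len(size_gb) - 1)
print(f"size: mean {x_bar:.1f}, sd {s_x:.1f}; price: mean {y_bar:.1f}, sd {s_y:.1f}")
print(f"sum of products = {total:.2f}, r = {r:.2f}")
```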

More on r
- Correlation makes no distinction between explanatory and response variables.
  - The labeling of x and y in the calculation is arbitrary.
- r does not change when the units of measurement of x and/or y are changed,
  since the standardized values of the observations are used in the calculation.
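Both properties are easy to confirm on the size/price example. The Python sketch below swaps the roles of x and y, then rescales both variables. The unit conversions (GB to MB using 1 GB = 1024 MB, dollars to cents) are assumptions chosen only for illustration.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (len(xs) - 1)

size_gb = [8, 6, 18, 30, 20, 10, 3]
price_dollars = [310, 290, 500, 800, 470, 330, 150]

r = correlation(size_gb, price_dollars)
# Exchanging the labels x and y
r_swapped = correlation(price_dollars, size_gb)
# Changing units: GB -> MB (assuming 1 GB = 1024 MB) and dollars -> cents
size_mb = [1024 * gb for gb in size_gb]
price_cents = [100 * p for p in price_dollars]
r_new_units = correlation(size_mb, price_cents)

print(math.isclose(r, r_swapped), math.isclose(r, r_new_units))
```

Rescaling by a positive factor, with or without a shift (as in Celsius to Fahrenheit), leaves r unchanged. Multiplying one variable by a negative number would flip the sign of r.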


Regression Line
- A straight line
- Form: $y = b_0 + b_1 x$
- Describes how the response y is affected by changes in the explanatory variable, x
- Often used to predict values of y from values of x

Fitting the Regression Line
- Using the scatterplot, a line of best fit may be “eyeballed.”
- The line provides a description of the association between two variables.

[Figure: the monthly income and rent data (Sally, Austin, Joe, Ginny, Ruby) plotted as Rent versus Income, with a regression line drawn through the points]


“Eyeballing” the Line
[Figure: Rent versus Income scatterplot with an eyeballed line, annotated with y-intercept 450, change in y 200, and change in x 2000]

“Eyeball” Estimate of the Equation
- y-intercept: 450
- Slope: 200/2000 = 0.1
- Equation: y = 450 + 0.1x
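The eyeball estimate is plain arithmetic. The slope is the change in y divided by the change in x, and a prediction comes from plugging x into the line. The small Python sketch below uses the readings 450, 200 and 2000 taken off the plot. The function name `predict_rent` is hypothetical.

```python
# Readings taken off the scatterplot for the hand-drawn ("eyeballed") line
y_intercept = 450
change_in_y = 200
change_in_x = 2000

slope = change_in_y / change_in_x  # 200 / 2000 = 0.1

def predict_rent(income):
    """Predicted monthly rent from the eyeballed line y = 450 + 0.1x (hypothetical helper)."""
    return y_intercept + slope * income

# Compare predictions with the observed rents from the income/rent table
observed = {"Sally": (4000, 925), "Austin": (1200, 425), "Joe": (600, 635),
            "Ginny": (2000, 600), "Ruby": (0, 525)}
for name, (income, rent) in observed.items():
    print(f"{name:6} income ${income:>5}  observed ${rent}  predicted ${predict_rent(income):.0f}")
```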


Least-Squares Regression Line
- A mathematical method to determine the line of best fit.
- Minimizes the sum of the squares of the vertical distances between the points and the line.

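[Figure: scatterplot of Y versus X with the least-squares line; the vertical distance from a point to the line is labeled "residual"]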
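As a concrete check of the "minimizes the sum of squares" idea, the sketch below fits the least-squares line to the monthly income and rent data from these notes. It then compares the line's sum of squared vertical distances (SSE) with that of the eyeballed line y = 450 + 0.1x. The fit uses the standard closed-form solution b1 = S_xy/S_xx, which is equivalent to b1 = r·s_y/s_x. The function names are our own.

```python
# Monthly income and rent from the income/rent table
income = [4000, 1200, 600, 2000, 0]
rent = [925, 425, 635, 600, 525]

def sse(b0, b1, xs, ys):
    """Sum of squared vertical distances between the points and the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least-squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(income, rent)
sse_ls = sse(b0, b1, income, rent)
sse_eyeball = sse(450, 0.1, income, rent)
print(f"least squares: y = {b0:.1f} + {b1:.4f}x, SSE = {sse_ls:.0f}")
print(f"eyeballed:     y = 450 + 0.1x, SSE = {sse_eyeball:.0f}")
```

On these five points the least-squares line comes out near y = 469.9 + 0.0975x, close to the eyeballed line. Its SSE is smaller, as it must be compared with any other line.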

Residuals
- A residual is the difference between an observed value of the response variable and the value predicted by the regression line:
  $$ e_i = y_i - \hat{y}_i $$
- For the least-squares line, the sum of the residuals is 0:
  $$ \sum_i e_i = 0 $$
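The zero-sum property belongs to the least-squares line (which includes an intercept). It does not hold for an arbitrary line. The minimal Python check below uses the income and rent data from these notes and compares least-squares residuals with those of the eyeballed line y = 450 + 0.1x. The slope is computed with the closed form S_xy/S_xx, which is equivalent to b1 = r·s_y/s_x.

```python
income = [4000, 1200, 600, 2000, 0]
rent = [925, 425, 635, 600, 525]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(rent) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(income, rent))
      / sum((x - x_bar) ** 2 for x in income))
b0 = y_bar - b1 * x_bar

# Residual e_i = observed y_i minus predicted y-hat_i
ls_residuals = [y - (b0 + b1 * x) for x, y in zip(income, rent)]
eyeball_residuals = [y - (450 + 0.1 * x) for x, y in zip(income, rent)]

print(f"least-squares residuals sum to {sum(ls_residuals):.6f}")
print(f"eyeballed-line residuals sum to {sum(eyeball_residuals):.6f}")
```

The least-squares residuals sum to 0 up to floating-point rounding. The eyeballed line's residuals sum to 80.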


Equation of the Least-Squares Regression Line
- Equation notation: $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted response for any $x$.
- Calculation of the slope: $b_1 = r \frac{s_y}{s_x}$
- Calculation of the intercept: $b_0 = \bar{y} - b_1 \bar{x}$

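Example
Given $\bar{x} = 1.75\%$, $\bar{y} = 9.07\%$, $s_x = 5.36\%$, $s_y = 15.35\%$, and $r = 0.596$:

Slope:
$$ b_1 = r\frac{s_y}{s_x} = 0.596\left(\frac{15.35}{5.36}\right) = 1.707 $$

Intercept:
$$ b_0 = \bar{y} - b_1\bar{x} = 9.07 - 1.707(1.75) = 6.083 $$

Equation: $\hat{y} = 6.083 + 1.707x$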
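The slope and intercept formulas can be wrapped in a small function that works directly from summary statistics. This is a sketch in Python. `ls_line_from_summary` is a hypothetical name, and the inputs are the example's values (all in percent).

```python
def ls_line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Least-squares line from summary statistics: b1 = r * s_y / s_x, b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Summary statistics from the example
b0, b1 = ls_line_from_summary(x_bar=1.75, y_bar=9.07, s_x=5.36, s_y=15.35, r=0.596)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")
```

Running it prints y-hat = 6.083 + 1.707x, matching the hand calculation.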


Important Facts
- In regression, the distinction between explanatory and response variables is essential.
- The correlation and the slope of the line always have the same sign.
- The LS regression line always passes through the point $(\bar{x}, \bar{y})$.
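The last two facts can be checked numerically. The Python sketch below fits the least-squares line with the closed form S_xy/S_xx, so the slope is not computed from r and the sign comparison is a genuine check. It runs on the size/price data from these notes and on a small hypothetical decreasing dataset made up for illustration.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (len(xs) - 1)

def least_squares(xs, ys):
    """Closed-form fit: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

datasets = {
    "size/price": ([8, 6, 18, 30, 20, 10, 3], [310, 290, 500, 800, 470, 330, 150]),
    "hypothetical decreasing": ([1, 2, 3, 4, 5], [9, 7, 6, 4, 1]),
}
results = {}
for name, (xs, ys) in datasets.items():
    r = correlation(xs, ys)
    b0, b1 = least_squares(xs, ys)
    same_sign = (r > 0) == (b1 > 0)
    through_means = math.isclose(b0 + b1 * mean(xs), mean(ys))  # line hits (x_bar, y_bar)
    results[name] = (r, b1, same_sign, through_means)
    print(f"{name}: r = {r:.3f}, b1 = {b1:.3f}, same sign: {same_sign}, "
          f"passes through means: {through_means}")
```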

Coefficient of Determination: $r^2$
- The fraction of the variation in the values of y that is explained by the LS regression of y on x.

Example: $r^2 = (0.596)^2 = 0.355$, so 35.5% of the variation in y can be explained by the LS regression on x.
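The "fraction of variation explained" reading can be made concrete. For the least-squares line, r² equals 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of y about its mean. This identity is standard but is not stated on the slide. The Python sketch reproduces the slide's arithmetic and then checks the identity on the size/price data from these notes.

```python
from statistics import mean, stdev

# Example from the slide: r = 0.596
print(f"r^2 = {0.596 ** 2:.3f}")  # fraction of the variation in y explained

# Check r^2 = 1 - SSE/SST on the size/price data
xs = [8, 6, 18, 30, 20, 10, 3]
ys = [310, 290, 500, 800, 470, 330, 150]
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)
r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)
b1 = r * s_y / s_x      # least-squares slope
b0 = y_bar - b1 * x_bar  # least-squares intercept

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation in y
print(f"r^2 = {r ** 2:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")
```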


Cautions
- Correlation and regression only describe linear relationships.
- r and the regression line are NOT resistant to outliers.
- Do not use a regression line to predict far outside the range of the observed explanatory variable, x.
  - This is called extrapolation.
- Be aware of lurking variables.
  - These are variables not included in the study that may nevertheless influence the interpretation of the relationship.

Lastly, Correlation Does NOT Imply Causation

