
Chapter 5

Date post: 31-Dec-2015
Upload: sloane-mills

BPS - 3rd Ed. Chapter 5 1

Chapter 5

Regression

Objectives of Regression

To describe the change in Y per unit X

To predict the average level of Y at a given level of X


“Returning Birds” Example

Plot data first to see if relation can be described by straight line (important!)

Illustrative data from Exercise 4.4

Y = adult birds joining colony

X = percent of birds returning, prior year

If the data can be described by a straight line, describe the relationship with an equation:

Y = (intercept) + (slope)(X)

May also be written:

Y = (slope)(X) + (intercept)

Intercept: where the line crosses the Y axis

Slope: the “angle” (steepness) of the line

Linear Regression

Algebraic line: every point falls on the line:

exact y = intercept + (slope)(X)

Statistical line: the scatter cloud suggests a linear trend:

“predicted y” = intercept + (slope)(X)

Regression Equation

ŷ = a + bx, where

– ŷ (“y-hat”) is the predicted value of Y

– a is the intercept

– b is the slope

– x is a value for X

Determine a & b for the “best fitting line”

The TI calculators reverse a & b!


What Line Fits Best?

If we try to draw the line by eye, different people will draw different lines

We need a method to draw the “best line”

This method is called “least squares”

The “least squares” regression line

Each point has:

Residual = observed y – predicted y

= vertical distance of the point from the prediction line

The least squares line minimizes the sum of the squared residuals
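The least squares criterion can be checked numerically. The sketch below uses a few made-up points (not data from the slides) and hypothetical helper names. It computes the line with a standard closed form for the slope (sum of cross-deviations divided by sum of squared x-deviations), then confirms that nudging the intercept or slope only makes the sum of squared residuals larger.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b), the intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up illustrative points, NOT data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)

# Shift the intercept and/or slope a little; every such line should fit worse.
nudged = [
    sum_squared_residuals(xs, ys, a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
print("least squares SSR:", best)
print("smallest SSR among nudged lines:", min(nudged))
```

Drawing a line "by eye" amounts to picking one of the nudged lines; the least squares line is the one choice everyone can agree on because it has the smallest total.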

Calculating Least Squares Regression Coefficients

Formula (next slide)

Technology

– TI-30XIIS

– Two variable Applet

– Other

Formulas

$b = r \dfrac{s_y}{s_x}$

$a = \bar{y} - b\bar{x}$

b = slope coefficient, a = intercept coefficient

where $s_x$ and $s_y$ are the standard deviations of the two variables, and r is their correlation
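As a minimal sketch, the two formulas translate directly into a small function. The function name and the example inputs are hypothetical; x_bar and y_bar stand for the means of X and Y.

```python
def coefficients_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (a, b) from summary statistics.

    b = r * s_y / s_x      (slope)
    a = y_bar - b * x_bar  (intercept)
    """
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Hypothetical summary values chosen so the arithmetic is easy to follow:
# b = 0.5 * 4 / 2 = 1.0 and a = 50 - 1.0 * 10 = 40.0
a, b = coefficients_from_summary(x_bar=10.0, y_bar=50.0, s_x=2.0, s_y=4.0, r=0.5)
print(a, b)  # 40.0 1.0
```

Computing the slope first matters because the intercept formula depends on it.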


Technology: Calculator

BEWARE!

TI calculators label the slope and intercept backwards!

Regression Line

For the “bird data”: a = 31.9343, b = −0.3040

The linear regression equation is: ŷ = 31.9343 − 0.3040x

The slope (−0.3040) represents the average change in Y per unit X

Use of Regression for Prediction

Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?

Answer: ŷ = a + bx = 31.9343 − (0.3040)(60) = 13.69

Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.
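The prediction step is a single evaluation of ŷ = a + bx. The sketch below uses the bird-data coefficients and takes the slope as negative (−0.3040), which is the sign the slope interpretation gives and the only one that reproduces 13.69. The function and constant names are hypothetical.

```python
def predict(a, b, x):
    """Predicted value y-hat = a + b*x for a fitted line."""
    return a + b * x


A_BIRDS = 31.9343   # intercept from the bird data
B_BIRDS = -0.3040   # slope from the bird data (negative)

y_hat = predict(A_BIRDS, B_BIRDS, 60)
print(round(y_hat, 2))  # 13.69
```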

Prediction via Regression Line

[Figure: number of new birds versus percent returning]

When X = 60, the regression model predicts Y = 13.69

Case Study

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Regression Calculation Case Study

Country           Per Capita GDP (x)   Life Expectancy (y)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37

Case Study (Life Expectancy)

Life Expectancy and GDP (Europe)

[Scatterplot: life expectancy (yrs) versus per capita GDP]

Regression Calculation by Hand (Life Expectancy Study)

$\bar{x} = 21.52 \qquad \bar{y} = 77.754 \qquad s_x = 1.532 \qquad s_y = 0.795 \qquad r = 0.809$

Calculations:

$b = r \dfrac{s_y}{s_x} = (0.809)\dfrac{0.795}{1.532} = 0.420$

$a = \bar{y} - b\bar{x} = 77.754 - (0.420)(21.52) = 68.716$

ŷ = 68.716 + 0.420x
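The hand calculation can be reproduced from the ten countries in the case study table. This is a sketch using only Python's standard library. The variable names are hypothetical, and the correlation is computed as the average product of standardized values with an n − 1 divisor, which is assumed to match the textbook definition.

```python
import statistics

# Per capita GDP (x) and life expectancy (y) from the case study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar = statistics.mean(gdp)
y_bar = statistics.mean(life)
s_x = statistics.stdev(gdp)    # sample standard deviation (n - 1 divisor)
s_y = statistics.stdev(life)

# Correlation: sum of products of standardized values, divided by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(gdp, life)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

print(f"x_bar={x_bar:.2f}  y_bar={y_bar:.3f}  s_x={s_x:.3f}  s_y={s_y:.3f}  r={r:.3f}")
print(f"b={b:.3f}  a={a:.3f}")
```

Carrying full precision through the calculation gives an intercept of about 68.715; the 68.716 above comes from plugging in the rounded slope 0.420.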

BPS/3e Two Variable Applet

Applet: Data Entry

Applet: Calculations

Applet: Scatterplot

Applet: least squares line

Interpretation: Life Expectancy Case Study

Model: ŷ = 68.716 + (0.420)X

Slope: for each one-unit increase in per capita GDP, predicted life expectancy increases by 0.420 years

Prediction example: What is the predicted life expectancy in a country with a per capita GDP of 20.0?

Answer: ŷ = 68.716 + (0.420)(20.0) = 77.12

Coefficient of Determination (R²) (Fact 4 on p. 111)

The coefficient of determination (R²) quantifies the fraction of the variation in Y “mathematically explained” by X

Examples:

r = 1: R² = 1: regression line explains all (100%) of the variation in Y

r = .7: R² = .49: regression line explains almost half (49%) of the variation in Y
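To make "fraction of variation explained" concrete, the sketch below computes R² two ways for the life expectancy case study data: as the square of the correlation r, and as 1 minus the ratio of residual variation to total variation in Y. For a least squares line with one predictor the two agree. The variable names are hypothetical.

```python
# Per capita GDP (x) and life expectancy (y) from the case study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar = sum(gdp) / n
y_bar = sum(life) / n

sxx = sum((x - x_bar) ** 2 for x in gdp)
syy = sum((y - y_bar) ** 2 for y in life)   # total variation in Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, life))

r = sxy / (sxx * syy) ** 0.5                # correlation
b = sxy / sxx                               # least squares slope
a = y_bar - b * x_bar                       # least squares intercept

# Variation left over after fitting the line (sum of squared residuals).
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(gdp, life))

r_squared_from_r = r ** 2
r_squared_from_fit = 1 - ss_resid / syy

print(round(r_squared_from_r, 3), round(r_squared_from_fit, 3))
```

For these data the result is roughly 65%, consistent with squaring the r = 0.809 used in the hand calculation.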


We are NOT going to cover the analysis of residual plots (pp. 113-116)

Outliers and Influential Points

An outlier is an observation that lies far from the regression line

Outliers in the y direction have large residuals

Outliers in the x direction are often influential

– removing an influential point would markedly change the regression and correlation values
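The effect of an influential point can be seen with a toy example. The five points below are made up for illustration (they are not from the slides): four lie near an upward line and the fifth sits far out in the x direction. The helper name is hypothetical.

```python
def slope_and_r(xs, ys):
    """Return (least squares slope, correlation r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Made-up points; the last one (x = 10) is an outlier in the x direction.
xs = [1, 2, 3, 4, 10]
ys = [2.0, 3.1, 3.9, 5.2, 2.0]

slope_all, r_all = slope_and_r(xs, ys)
slope_trimmed, r_trimmed = slope_and_r(xs[:-1], ys[:-1])

print("all points:      slope", round(slope_all, 3), " r", round(r_all, 3))
print("outlier removed: slope", round(slope_trimmed, 3), " r", round(r_trimmed, 3))
```

In this toy case a single point is enough to flip the sign of the slope, which is why the fit should be checked with and without suspected influential points.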

Outliers: Case Study

Gesell Adaptive Score and Age at First Word

From all the data: r² = 41%

After removing child 18: r² = 11%

Cautions About Correlation and Regression

Describe only linear relationships

Are influenced by outliers

Cannot be used to predict beyond the range of X (do not extrapolate)

Beware of lurking variables (variables other than X and Y)

– Association does not always equal causation!


Do not extrapolate (Sarah’s height)

Sarah’s height is plotted against her age

Can you predict her height at age 42 months?

Can you predict her height at age 30 years (360 months)?

[Scatterplot: height (cm) versus age (months)]

Do not extrapolate (Sarah’s height)

Regression equation: ŷ = 71.95 + .383(X)

At age 42 months: ŷ = 71.95 + .383(42) = 88 cm (reasonable)

At age 360 months: ŷ = 71.95 + .383(360) = 209.8 cm

(That’s nearly 7 feet tall!)

[Plot: height (cm) versus age (months), with the age axis extended to 390 months]
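One way to guard against extrapolation is to have the prediction routine report whether x falls inside the range the line was fitted on. The sketch below uses the regression equation for Sarah's height. The range limits of 30 to 65 months are an assumption taken from the age axis of the first plot, since the transcript does not list the observed ages, and the function name is hypothetical.

```python
def predict_height(age_months, a=71.95, b=0.383, x_min=30, x_max=65):
    """Return (predicted height in cm, True if age is inside the fitted range).

    x_min and x_max are ASSUMED limits read off the plot axis, not the
    actual observed ages, which the slides do not give.
    """
    y_hat = a + b * age_months
    within_range = x_min <= age_months <= x_max
    return y_hat, within_range


for age in (42, 360):
    height, ok = predict_height(age)
    note = "within fitted range" if ok else "EXTRAPOLATION: do not trust"
    print(f"age {age} months: predicted {height:.1f} cm ({note})")
```

The arithmetic for 360 months is no harder than for 42 months; the flag is what signals that the answer should not be trusted.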

Caution: Correlation does not always mean causation

Even very strong correlations may not correspond to a causal relationship between x and y

(Beware of the lurking variable!)

