Date post: | 31-Dec-2015 |
Upload: | sloane-mills |
BPS - 3rd Ed. Chapter 5 2
Objectives of Regression
To describe the change in Y per unit X
To predict the average level of Y at a given level of X
“Returning Birds” Example
Plot data first to see if relation can be described by straight line (important!)
Illustrative data from Exercise 4.4
Y = adult birds joining colony
X = percent of birds returning, prior year
If data can be described by straight line
… describe relationship with equation
Y = (intercept) + (slope)(X)
May also be written: Y = (slope)(X) + (intercept)
Intercept: where the line crosses the Y axis
Slope: “angle” of the line
Linear Regression
Algebraic line: every point falls on the line
exact y = intercept + (slope)(X)
Statistical line: scatter cloud suggests a linear trend
“predicted y” = intercept + (slope)(X)
Regression Equation
ŷ = a + bx, where
– ŷ (“y-hat”) is the predicted value of Y
– a is the intercept
– b is the slope
– x is a value for X
Determine a & b for “best fitting line”
The TI calculators reverse a & b!
What Line Fits Best?
If we try to draw the line by eye, different people will draw different lines
We need a method to draw the “best line”
This method is called “least squares”
The “least squares” regression line
Each point has:
Residual = observed y – predicted y
= vertical distance of point from prediction line
The least squares line minimizes the sum of the squared residuals
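The residual definition above can be sketched in Python. The four data points and the three candidate lines are hypothetical (they are not from the slides); the third candidate happens to be the least squares line for these points, so its sum of squared residuals should come out smallest.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y, with predicted y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, a, b):
    """The quantity that the least squares line makes as small as possible."""
    return sum(e * e for e in residuals(xs, ys, a, b))


# Hypothetical data, chosen only for easy arithmetic (not from the slides)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

sse_steep = sum_squared_residuals(xs, ys, a=0.0, b=2.0)          # 1.0
sse_shallow = sum_squared_residuals(xs, ys, a=1.0, b=1.5)        # 1.5
sse_least_squares = sum_squared_residuals(xs, ys, a=0.0, b=1.9)  # about 0.70

print(sse_steep, sse_shallow, round(sse_least_squares, 2))
```

Two lines drawn "by eye" give sums of 1.0 and 1.5, while the least squares line gives about 0.70.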
Calculating Least Squares Regression Coefficients
Formula (next slide)
Technology
– TI-30XIIS
– Two variable Applet
– Other
Formulas
$b = r \dfrac{s_y}{s_x}$
$a = \bar{y} - b\bar{x}$
b = slope coefficient
a = intercept coefficient
where $s_x$ and $s_y$ are the standard deviations of the two variables, and r is their correlation
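As a rough illustration, the two formulas translate directly into a small Python function. The function name `line_from_summary` and the summary statistics passed to it are hypothetical, chosen only so the arithmetic is easy to follow; `x_bar` and `y_bar` stand for the means of X and Y.

```python
def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y-hat = a + b*x, computed from summary statistics."""
    b = r * s_y / s_x      # slope: correlation times the ratio of standard deviations
    a = y_bar - b * x_bar  # intercept: forces the line through (x_bar, y_bar)
    return a, b


# Hypothetical summary statistics (not from the slides)
a, b = line_from_summary(r=0.5, s_x=2.0, s_y=4.0, x_bar=10.0, y_bar=30.0)
print(f"y-hat = {a} + {b}x")  # y-hat = 20.0 + 1.0x
```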
Technology: Calculator
BEWARE!
TI calculators label the slope and intercept backwards!
Regression Line
For the “bird data”: a = 31.9343, b = -0.3040
The linear regression equation is: ŷ = 31.9343 – 0.3040x
The slope (-0.3040) represents the average change in Y per unit X
Use of Regression for Prediction
Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?
Answer: ŷ = a + bx = 31.9343 + (-0.3040)(60) = 13.69
Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.
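The prediction step can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the slope is taken as negative (-0.3040), since that is the value that reproduces the stated answer of 13.69.

```python
# Coefficients for the bird data; the slope is assumed negative because
# 31.9343 - 0.3040 * 60 reproduces the answer 13.69 given on the slide.
INTERCEPT = 31.9343
SLOPE = -0.3040


def predict_new_birds(percent_returning):
    """Predicted number of new adult birds (y-hat) for a given percent returning (x)."""
    return INTERCEPT + SLOPE * percent_returning


prediction = predict_new_birds(60)
print(round(prediction, 2))  # 13.69
```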
Prediction via Regression Line
Number of new birds and Percent returning
When X = 60, the regression model predicts Y = 13.69
Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
Regression Calculation: Case Study
Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37
Life Expectancy and GDP (Europe)
Case Study (Life Expectancy)
[Scatterplot: life expectancy (yrs) against per capita GDP]
Regression Calculation by Hand (Life Expectancy Study)
$\bar{x} = 21.52$, $\bar{y} = 77.754$, $s_x = 1.532$, $s_y = 0.795$, $r = 0.809$
Calculations:
$b = r \dfrac{s_y}{s_x} = (0.809)\dfrac{0.795}{1.532} = 0.420$
$a = \bar{y} - b\bar{x} = 77.754 - (0.420)(21.52) = 68.716$
ŷ = 68.716 + 0.420x
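The hand calculation can be cross-checked against the raw case-study table. The sketch below uses only Python's standard `statistics` module; the variable names are illustrative, and the correlation is computed from its definition (average product of standardized values, dividing by n - 1).

```python
from statistics import mean, stdev

# Data from the case-study table: per capita GDP (x) and life expectancy (y)
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)  # sample standard deviations (divide by n - 1)

# Correlation r: sum of products of standardized x and y, divided by n - 1
r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(gdp, life)) / (n - 1)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept

print(f"x_bar={x_bar:.2f} y_bar={y_bar:.3f} s_x={s_x:.3f} s_y={s_y:.3f} r={r:.3f}")
print(f"b={b:.3f} a={a:.3f}")
```

The summary statistics and the slope agree with the slide to three decimals. The intercept comes out near 68.715 rather than 68.716, because the hand calculation rounds b to 0.420 before computing a.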
Interpretation: Life Expectancy Case Study
Model: ŷ = 68.716 + (0.420)X
Slope: for each one-unit increase in GDP, life expectancy increases by 0.420 years on average
Prediction example: What is the predicted life expectancy in a country with a GDP of 20.0?
ANSWER: ŷ = 68.716 + (0.420)(20.0) = 77.12
Coefficient of Determination (R²) (Fact 4 on p. 111)
“Coefficient of determination” (R²)
Quantifies the fraction of the variation in Y “mathematically explained” by X
Examples:
r = 1: R² = 1: regression line explains all (100%) of the variation in Y
r = .7: R² = .49: regression line explains almost half (49%) of the variation in Y
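To make “fraction of the variation explained” concrete, the sketch below computes R² as 1 minus (unexplained variation / total variation) for a least squares fit. The function name and the data are hypothetical. The slope is computed in the form Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically the same as r·s_y/s_x.

```python
from statistics import mean


def r_squared(xs, ys):
    """Fraction of the variation in y explained by the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx        # least squares slope
    a = y_bar - b * x_bar  # least squares intercept
    total = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared residuals
    return 1 - unexplained / total


# Hypothetical data (not from the slides)
print(round(r_squared([1, 2, 3, 4], [2, 4, 5, 8]), 3))  # 0.963
# Points lying exactly on a line give R-squared of 1 (up to rounding)
print(round(r_squared([0, 1, 2], [1, 3, 5]), 3))         # 1.0
```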
Outliers and Influential Points
An outlier is an observation that lies far from the regression line
Outliers in the y direction have large residuals
Outliers in the x direction are often influential
– removal of an influential point would markedly change the regression and correlation values
Outliers: Case Study
Gesell Adaptive Score and Age at First Word
From all the data: r² = 41%
After removing child 18: r² = 11%
Cautions About Correlation and Regression
Describe only linear relationships
Are influenced by outliers
Cannot be used to predict beyond the range of X (do not extrapolate)
Beware of lurking variables (variables other than X and Y)
– Association does not always equal causation!
Do not extrapolate (Sarah’s height)
Sarah’s height is plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?
[Scatterplot: height (cm) against age (months)]
Do not extrapolate (Sarah’s height)
Regression equation: ŷ = 71.95 + .383(X)
At age 42 months: ŷ = 71.95 + .383(42) = 88 (Reasonable)
At age 360 months: ŷ = 71.95 + .383(360) = 209.8
(That’s 209.8 cm, nearly 6 feet 11 inches tall!)
[Plot: regression line of height (cm) on age (months), extended to 390 months]
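One way to guard against extrapolation in practice is to have the prediction routine report whether x falls outside the range of the data used to fit the line. The sketch below is illustrative only: the function name is hypothetical, and the observed age range (30 to 65 months) is an assumption read off the plot axis, since the slides do not list Sarah's data.

```python
INTERCEPT = 71.95  # from the regression equation on the slide
SLOPE = 0.383

# Assumed range of observed ages in months (read off the plot axis;
# the underlying data are not given in the slides)
AGE_MIN, AGE_MAX = 30, 65


def predict_height(age_months):
    """Return (predicted height in cm, True if age is outside the observed range)."""
    extrapolated = not (AGE_MIN <= age_months <= AGE_MAX)
    return INTERCEPT + SLOPE * age_months, extrapolated


for age in (42, 360):
    height, extrapolated = predict_height(age)
    note = "extrapolation, do not trust" if extrapolated else "within observed range"
    print(f"age {age} months: {height:.1f} cm ({note})")
```

The arithmetic for age 360 is no different from the arithmetic for age 42; only the range check signals that the second prediction should not be relied on.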