Date post: | 18-Dec-2015 |
Upload: | karin-dean |
July 1, 2008 Lecture 17 - Regression Testing 1
Testing Relationships between Variables
Statistics 111 - Lecture 17
Administrative Notes
• Homework 5 due tomorrow
• Lecture on Wednesday will be review of entire course
Final Exam
• Thursday from 10:40-12:10
• It’ll be right here in this room
• Calculators are definitely needed!
• Single 8.5 x 11 cheat sheet (two-sided) allowed
• I’ve put a sample final on the course website
Outline
• Review of Regression coefficients
• Hypothesis Tests
• Confidence Intervals
• Examples
Two Continuous Variables
• Visually summarize the relationship between two continuous variables with a scatterplot
• Numerically, we focus on best fit line (regression)
Education and Mortality: Mortality = 1353.16 - 37.62 · Education
Draft Order and Birthday: Draft Order = 224.9 - 0.226 · Birthday
June 30, 2008 Stat 111 - Lecture 16 - Regression
6
Best values for Regression Parameters
• The best fit line has these values for the regression coefficients:
Best estimate of slope: b = r · (s_y / s_x)
Best estimate of intercept: a = Ȳ - b · X̄
• We can also estimate the average squared residual:
s² = Σ (Yi - (a + b · Xi))² / (n - 2)
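As a rough illustration of the estimates on this slide, the sketch below computes the slope, the intercept, and the average squared residual from raw data. It assumes the usual least-squares formulas (b = r · s_y / s_x, a = Ȳ - b · X̄, and a residual sum of squares divided by n - 2). The function name `fit_line` and the five data points are made up for the example; they are not from the lecture.

```python
import math


def fit_line(xs, ys):
    """Return (a, b, s2): intercept, slope, and average squared residual.

    Assumes the standard least-squares formulas for one X variable.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)   # sample correlation
    s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of X
    s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of Y

    b = r * s_y / s_x                # best estimate of slope
    a = y_bar - b * x_bar            # best estimate of intercept

    # Average squared residual, using n - 2 in the denominator.
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return a, b, s2


# Hypothetical data, only to exercise the function.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s2 = fit_line(xs, ys)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, s^2 = {s2:.4f}")
```

In class the software reports these values directly, so a hand-rolled function like this is only useful for checking that the formulas agree with the output.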
Significance of Regression Line
• Does the regression line show a significant linear relationship between the two variables?
• If there is not a linear relationship, then we would expect zero correlation (r = 0)
• So the slope b should also be zero
• Therefore, our test for a significant relationship will focus on testing whether our slope is significantly different from zero
H0 : β = 0 versus Ha : β ≠ 0
Linear Regression
• Best fit line is called the Simple Linear Regression Model:
Yi = α + β · Xi + εi
• Coefficients: α is the intercept and β is the slope
• Other common notation: β0 for intercept, β1 for slope
• Our Y variable is a linear function of the X variable but we allow for error (εi) in each prediction
• We approximate the error by using the residual:
Residual = Observed Yi - Predicted Yi = Yi - (α + β · Xi)
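To make the residual concrete, the sketch below plugs one observation into the fitted line quoted earlier in the lecture, Mortality = 1353.16 - 37.62 · Education. The education value and the observed mortality are hypothetical numbers chosen for illustration, and the function names are not from the lecture.

```python
def predict_mortality(education):
    """Predicted mortality from the fitted line quoted in the lecture."""
    return 1353.16 - 37.62 * education


def residual(observed, predicted):
    """Residual = observed Y minus predicted Y."""
    return observed - predicted


# Hypothetical observation (not from the lecture's dataset).
education = 11.0
observed_mortality = 950.0

predicted = predict_mortality(education)
print(f"predicted = {predicted:.2f}")
print(f"residual  = {residual(observed_mortality, predicted):.2f}")
```

A positive residual means the observation sits above the line; a negative one means it sits below.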
Test Statistic for Slope
• Our test statistic for the slope is similar in form to all the test statistics we have seen so far:
T = (b - 0) / SE(b)
• The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate
• We will not be doing this calculation manually because the JMP software does this calculation for us!
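For readers who want to see where the numbers in the software output come from, here is a minimal sketch of the slope test statistic T = b / SE(b). It assumes the one-predictor closed form SE(b) = s / sqrt(Σ(Xi - X̄)²), where s² is the residual sum of squares divided by n - 2; the general multi-predictor case is what needs matrix algebra. The function name and the small dataset are hypothetical. The last lines apply the same ratio to the mortality example, using the slope -37.62 from the fitted line and the standard error 8.307 from the confidence-interval slide.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, se_b, t) for a one-predictor least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx                    # least-squares slope
    a = y_bar - b * x_bar            # least-squares intercept

    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    se_b = s / math.sqrt(sxx)        # standard error of the slope
    return b, se_b, b / se_b         # test statistic for H0: slope = 0


# Hypothetical data, only to exercise the function.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b, t = slope_t_statistic(xs, ys)
print(f"b = {b:.3f}, SE(b) = {se_b:.4f}, T = {t:.2f}")

# Same ratio using the summary numbers from the mortality example.
t_mortality = -37.62 / 8.307
print(f"mortality example: T = {t_mortality:.2f}")
```

The p-value then comes from comparing T to a t distribution with n - 2 degrees of freedom, which is the part left to the software here.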
Confidence Intervals for Coefficients
• JMP output also gives the information needed to make confidence intervals for slope and intercept
• 100·C % confidence interval for slope β:
b ± t*_{n-2} · SE(b)
• The multiple t* comes from a t distribution with n-2 degrees of freedom
• 100·C % confidence interval for intercept α:
a ± t*_{n-2} · SE(a)
• Usually, we are less interested in the intercept, but it might be needed in some situations
Confidence Intervals for Example
• We have n = 60, so our multiple t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00
• 95% confidence interval for slope β:
-37.6 ± 2.0 · 8.307 = (-54.2, -21.0)
• Note that this interval does not contain zero!
• 95% confidence interval for intercept α:
1353 ± 2.0 · 91.42 = (1170, 1536)
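The arithmetic on this slide can be reproduced with a few lines of code. The sketch below takes the estimate, its standard error, and the multiple t* as given (all three numbers are copied from the slide) and returns the interval estimate ± t* · SE. The helper name `conf_interval` is an assumption made for this example.

```python
def conf_interval(estimate, std_error, t_star):
    """Return (low, high) for estimate ± t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


# Values from the education and mortality example (n = 60, d.f. = 58).
t_star = 2.00
slope_ci = conf_interval(-37.6, 8.307, t_star)
intercept_ci = conf_interval(1353, 91.42, t_star)

print(f"slope:     ({slope_ci[0]:.1f}, {slope_ci[1]:.1f})")
print(f"intercept: ({intercept_ci[0]:.0f}, {intercept_ci[1]:.0f})")
```

Rounded to the precision used on the slide, this gives the same intervals for the slope and the intercept.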
Another Example: Draft Lottery
• Is the negative linear association we see between birthday and draft order statistically significant?
[Regression output with the p-value highlighted]
Another Example: Draft Lottery
• p-value < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order
• Statistical evidence that the randomization was not done properly!
• 95% confidence interval for slope β:
-.23 ± 1.98 · .05 = (-.33, -.13)
• Multiple t* = 1.98 from t distribution with n-2 = 363 d.f.
• Confidence interval does not contain zero, which we expected from our hypothesis test
Education Example
• Dataset of 78 seventh-graders: relationship between IQ and GPA
• Clear positive association between IQ and grade point average
Education Example
• Is the positive linear association we see between GPA and IQ statistically significant?
[Regression output with the p-value highlighted]
Education Example
• p-value < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA
• 95% confidence interval for slope β:
.101 ± 1.99 · .014 = (.073, .129)
• Multiple t* = 1.99 from t distribution with n-2 = 76 d.f.
• Confidence interval does not contain zero, which we expected from our hypothesis test
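The link between the test and the interval can be checked across all three examples in the lecture. The sketch below rebuilds each 95% interval for the slope from the slope estimate, SE(b), and t* quoted on the slides, then reports whether zero falls outside it. The function names and dictionary labels are assumptions made for this illustration.

```python
def slope_interval(b, se_b, t_star):
    """Return (low, high) for b ± t_star * se_b."""
    margin = t_star * se_b
    return b - margin, b + margin


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# (slope estimate, SE(b), t*) as quoted on the slides.
examples = {
    "mortality vs education": (-37.6, 8.307, 2.00),
    "draft order vs birthday": (-0.23, 0.05, 1.98),
    "GPA vs IQ": (0.101, 0.014, 1.99),
}

results = {name: slope_interval(*vals) for name, vals in examples.items()}
for name, ci in results.items():
    verdict = "excludes zero" if excludes_zero(ci) else "contains zero"
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f}) {verdict}")
```

A 95% interval that excludes zero corresponds to rejecting H0 : β = 0 at the 5% level in a two-sided test, which matches the small p-values reported for these examples.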