Date posted: 01-Jan-2016
Uploaded by: dale-bryan
Chapter 3, Section 3.1
Examining Relationships
• Continue to ask the preliminary questions familiar from Chapters 1 and 2:
• What individuals do the data describe?
• What are the variables?
• How are the variables measured?
• Are all the variables quantitative or is at least one a categorical variable?
• Do you want to explore the nature of the relationship or do you think some of the variables explain or cause the changes in others?
Bivariate
Involving two variables. When an analysis attempts to show a correlation between two variables, the analysis is said to be bivariate.
When working with bivariate data, each variable plays a different role.
One variable is the explanatory or predictor variable, while the other is the response variable.
Bivariate data is graphed on a scatterplot with an x-axis (horizontal) and y-axis (vertical).
The explanatory variable is graphed on the horizontal, and the response variable is graphed on the vertical.
A scatterplot is a picture of the association between two variables.
Do Problem 3.2 pg 123.
Tips for drawing scatterplots
• Scale the horizontal and vertical axes. The intervals must be uniform. If a scale does not begin at zero, use the // symbol to indicate a break.
• Label both axes.
• If given a grid, use a scale so that your plot uses the whole grid. Don't compress the plot into one corner of the grid.
To analyze a scatterplot, describe the data in terms of:
• Direction (positive or negative association)
• Form (linear, clustered, or curved)
• Strength, or scatter (how closely the points follow the form)
• Outliers (deviations from the overall pattern)
Do the following problems as example problems: 3.6 pg 125, 3.10 pg 129, 3.22 pg 139.
End of Section 3.1
CHAPTER 3 SECTION 3.2
Lesson 3.2 Correlation
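Correlation measures the direction and strength of the linear relationship between two quantitative variables. It is essentially the average of the products of the standardized values (dividing by n − 1 rather than n). Correlation is given by the following equation:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$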
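The correlation formula can be checked numerically. Below is a minimal Python sketch. The function name `correlation` and the sample data are illustrative assumptions, not from the text. The sketch standardizes each value with the sample mean and the sample standard deviation, then sums the products of the z-scores and divides by n − 1:

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r: sum of products of z-scores divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 values")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)

# Hypothetical data with a strong positive linear trend
print(correlation([1, 2, 3, 4], [2, 3, 5, 6]))
```

Python 3.10 and later also provide `statistics.correlation`, which should give the same value.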
The correlation computed from the sample data measures the direction and strength of the linear relationship between two quantitative variables.
The symbol for the sample correlation coefficient is r.
The range of the correlation is from -1 to +1.
When r is close to +1, there is a strong positive linear relationship between the variables.
When r is close to -1, there is a strong negative linear relationship between the variables.
When there is no linear relationship or only a weak relationship, the value of r will be close to 0.
The correlation is not resistant. It is strongly affected by outliers.
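To see how strongly a single outlier can change r, here is a small sketch on made-up data. It uses the standardized-product formula for r. Five points lie exactly on the line y = 2x, and then one point is added far below that line:

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r: sum of products of z-scores divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
               for xi, yi in zip(x, y)) / (len(x) - 1)

# Hypothetical data: five points exactly on the line y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = correlation(x, y)

# Add one hypothetical outlier at (6, 0), far below the line
r_outlier = correlation(x + [6], y + [0])

print(f"r without outlier: {r_clean:.3f}")     # r without outlier: 1.000
print(f"r with one outlier: {r_outlier:.3f}")  # r with one outlier: 0.143
```

One point out of six pulls r from a perfect 1 down to about 0.14.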
If women always married men who were two years older than themselves, what would be the correlation between the ages of husband and wife?
The gas mileage of an automobile first increases and then decreases as the speed increases. This relationship is very regular as shown by the following data on speed (miles per hour) and the mileage (miles per gallon):
Speed: 20 30 40 50 60
MPG: 24 28 30 28 24
Make a scatterplot and calculate r.
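Working the speed and gas-mileage exercise numerically shows that r measures only *linear* association. The sketch below applies the correlation formula to the five data points from the exercise. The helper name `correlation` is an assumption for illustration:

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r: sum of products of z-scores divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
               for xi, yi in zip(x, y)) / (len(x) - 1)

speed = [20, 30, 40, 50, 60]  # miles per hour
mpg = [24, 28, 30, 28, 24]    # miles per gallon

r = correlation(speed, mpg)
print(f"r = {r:.6f}")  # zero, up to floating-point round-off
```

The deviations from the mean speed are symmetric (−20, −10, 0, 10, 20), and so are the mileage deviations. The positive and negative products cancel, which gives r = 0 even though the relationship is very regular. The relationship is strong but curved, not linear.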
End of Section 3.2
Section 3.3 Least Squares Regression
LEAST-SQUARES REGRESSION
Given a scatter plot, one must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is minimized.
When the scatterplot appears linear, the line of best fit is the Least-Squares Regression Line (LSRL).
Equation of the Least-Squares Regression Line (LSRL)
$$\hat{y} = a + bx$$
ŷ is read "y-hat" and means the predicted value of y.
a is the y-intercept.
b is the slope.
The point (x̄, ȳ) is always on the LSRL.
Equation for the slope of the LSRL:
$$b = r\,\frac{s_y}{s_x}$$
r is the correlation coefficient.
s_x is the standard deviation of x.
s_y is the standard deviation of y.
Equation for the y-intercept of the LSRL:
$$a = \bar{y} - b\bar{x}$$
a is the y-intercept.
ȳ is the mean of the y-values.
x̄ is the mean of the x-values.
• y: observed values
• ȳ (y-bar): mean of the observed values
• ŷ: predicted values
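The slope and intercept formulas can be combined into a short computation. This is a minimal sketch with made-up data. The function name `lsrl` is hypothetical. It computes b = r·s_y/s_x and a = ȳ − b·x̄, then checks that (x̄, ȳ) falls on the line:

```python
from statistics import mean, stdev

def lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x,
    using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    n = len(x)
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
a, b = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 0.50 + 1.40x

# The point (x_bar, y_bar) = (2.5, 4) lies on the line
print(a + b * mean(x))  # approximately 4
```

As a cross-check, `statistics.linear_regression(x, y)` (Python 3.10+) returns a result whose `slope` and `intercept` should match `b` and `a`.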
Do problem 3.38, pg 158 (refer back to the data in Fig. 3.1, pg 127).
What is r²? How do we interpret r²?
If you know nothing about y's relationship to x, then when you want to predict y, the best you can do is use ȳ (y-bar).
In this case, the error is measured by the TOTAL SUM OF SQUARES:
$$SST = \sum (y - \bar{y})^2$$
If you know something about the relationship between x and y, then you can predict y with ŷ from the LSRL, and the error is measured by the SUM OF SQUARES FOR ERROR:
$$SSE = \sum (y - \hat{y})^2$$
Which is greater, SST or SSE?
If x is a poor predictor of y, then the sum of squared deviations about the mean ȳ and the sum of squared deviations about the regression line ŷ will be approximately the same.
SST − SSE is the amount of error you eliminated, and
$$\frac{SST - SSE}{SST}$$
is the proportion of error eliminated out of the total error you started with. This proportion is r²:
$$r^2 = \frac{SST - SSE}{SST}$$
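To see the identity r² = (SST − SSE)/SST with numbers, the sketch below uses made-up data. It fits the LSRL with the slope and intercept formulas, computes both sums of squares, and compares the proportion of error eliminated with the square of r:

```python
from statistics import mean, stdev

# Hypothetical data
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]

# Least-squares line from b = r * s_y / s_x and a = y_bar - b * x_bar
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)
r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (len(x) - 1)
b = r * s_y / s_x
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # error when predicting with y-bar
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error when predicting with y-hat
r_squared = (sst - sse) / sst

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")                       # SST = 10.00, SSE = 0.20
print(f"(SST - SSE)/SST = {r_squared:.4f}, r^2 = {r**2:.4f}")  # (SST - SSE)/SST = 0.9800, r^2 = 0.9800
```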
r²: The Coefficient of Determination
Note: r² is not the coefficient of variation, which is a standard deviation divided by a mean.
The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
When you report r, give r² as a measure of how successful the regression was in explaining the response. When you see r, square it to get a better feel for the strength of the relationship.
Example: r = .7, so r² = .49.
r² = .49 means that 49% of the variation in y is explained by the least-squares regression of y on x.
The correlation between math and verbal SAT scores for this class was .66. What percent of the variation in the verbal scores is explained by the math scores?
In a study of the effect of temperature on household heating bills, an investigator said, “Our research shows that about 70% of the variability in the heating units used by a particular house over the years can be explained by outside temperature.” Explain what the investigator meant by this statement.
According to this study, what is the correlation between outside temperature and heating bills?
RESIDUALS
• Residual = observed y − predicted y = y − ŷ
• The mean of the least-squares residuals always equals zero (up to round-off error).
• An effective tool for testing the goodness of fit of a regression line to a bivariate data set is the residual plot.
Do problem 3.42, pg 167.
RESIDUAL PLOT
• The residual plot is a scatterplot of the residuals against the explanatory variable x (or against the predicted values ŷ).
• If the residual plot shows a random scatter with no apparent pattern, the LSRL fits the data.
• If the residual plot shows a curved pattern or a fanned-out pattern, the LSRL is not a good summary of the data.
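Applying residuals to the speed and gas-mileage data from Section 3.2 produces the curved pattern described above. This sketch fits the LSRL and computes each residual as y − ŷ. It then prints a crude text residual plot with one row per x value. The text plot is an illustrative stand-in for a graphed residual plot, not a calculator feature. The slope uses b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², which is algebraically equal to r·s_y/s_x:

```python
from statistics import mean

speed = [20, 30, 40, 50, 60]  # miles per hour
mpg = [24, 28, 30, 28, 24]    # miles per gallon

# LSRL slope and intercept (b = S_xy / S_xx equals r * s_y / s_x)
x_bar, y_bar = mean(speed), mean(mpg)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mpg))
s_xx = sum((x - x_bar) ** 2 for x in speed)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(speed, mpg)]
print(f"mean residual: {mean(residuals):.6f}")  # zero, up to round-off

# Crude text residual plot: '+' marks above zero, '-' marks below
for x, e in zip(speed, residuals):
    bar = "+" * round(e) if e > 0 else "-" * round(-e)
    print(f"{x:>3} mph  {e:+5.1f}  {bar}")
```

The residuals are negative at both ends and positive in the middle. That curved pattern signals that a line is not a good summary of these data.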
When the TI-83 executes a regression model, the residuals are automatically computed and stored in the list RESID. It will be located alphabetically in the NAMES list.
Do problem 3.48, which refers back to the data in Table 3.4 and the equation in Example 3.14, pg 168.
TECHNOLOGY TOOLBOX
Analyzing Data for Two Variables
End of Chapter 3