
Describing Bivariate Relationships

Date post: 01-Jan-2016
Upload: akeem-powell
Transcript
Page 1: Describing Bivariate Relationships

Describing Bivariate Relationships

Chapter 3 Summary

YMS AP Stats

Page 2: Describing Bivariate Relationships

3.1 Response vs. Explanatory Variables

• A response variable measures an outcome of a study; an explanatory variable helps explain or influences changes in the response variable (response is like dependent, explanatory is like independent).

• Calling one variable explanatory and the other response doesn’t necessarily mean that changes in one CAUSE changes in the other.

• Ex: Alcohol and body temp: One effect of alcohol is a drop in body temp. To test this, researchers give several different amounts of alcohol to mice and measure each mouse’s change in body temp. What are the explanatory and response variables?

Page 3: Describing Bivariate Relationships

Scatterplots

Page 4: Describing Bivariate Relationships

Examining Scatterplots

Overall pattern

• Direction

• Form

• Strength

• Outliers or deviations

Page 5: Describing Bivariate Relationships

Interpreting Scatterplots

• Direction: In the previous example, the overall pattern moves from upper left to lower right. We call this a negative association.

• Form: The form is slightly curved and there are two distinct clusters. What explains the clusters? (ACT States)

• Strength: The strength is determined by how closely the points follow a clear form. The example is only moderately strong.

• Outliers: Do we see any deviations from the pattern? (Yes, West Virginia, where 20% of HS seniors take the SAT but the mean math score is only 511).

Page 6: Describing Bivariate Relationships

Association

Page 7: Describing Bivariate Relationships

Introducing Categorical Variables

Page 8: Describing Bivariate Relationships

Calculator Scatterplot

• Enter the Degree-Days in L1 and Gas in L2

• Next specify scatterplot in Statplot menu (first graph). X list L1 Y List L2 (explanatory and response)

• Use ZoomStat.

• Notice that there are no scales on the axes and they aren’t labeled. If you are copying your graph to your paper, make sure you scale and label the axes (use Trace)

Month  Degree-days  Gas (100 cu ft)
1      24           6.3
2      51           10.9
3      43           8.9
4      33           7.5
5      26           5.3
6      13           4.0
7      4            1.7
8      0            1.2
9      0            1.2
10     1            1.2
11     6            2.1
12     12           3.1
13     30           6.4
14     32           7.2
15     52           11.0
16     30           6.9

Page 9: Describing Bivariate Relationships

Correlation r

• The Correlation measures the direction and strength of the linear relationship between 2 variables.

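• Formula (you don’t need to memorize or use it): $r = \frac{1}{n-1} \sum z_x z_y$, where $z_x$ and $z_y$ are the standardized values of x and y.

• On the calculator: go to Catalog (2nd, then the zero button), scroll to DiagnosticOn, and press Enter twice. You only have to do this ONCE! Once this is done:

• Enter the data in L1 and L2 (you can run Calc, 2-Var Stats if you want the mean and standard deviation of each variable)

• Stat, Calc, LinReg(a+bx), Enter. With diagnostics on, the output includes r.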
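As a cross-check on the calculator, the z-score formula for r can be sketched in a few lines of Python. This is only an illustration: the function name `correlation` is made up here, and the data are the degree-day and gas values from the Calculator Scatterplot slide.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Degree-days (explanatory) and gas use in 100 cu ft (response) from the slide
degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

r = correlation(degree_days, gas)
print(round(r, 3))
```

The printed value should agree with the r that LinReg(a+bx) reports for the same two lists.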

Page 10: Describing Bivariate Relationships

Interpreting Correlation

• Caution: our eyes can be fooled! Our eyes are not good judges of how strong a linear relationship is. The 2 scatterplots depict the same data, drawn with different scales. Because of this, we need a numerical measure to supplement the graph.

Page 11: Describing Bivariate Relationships

Interpreting r

• The absolute value of r tells you the strength of the linear association (0 means no linear association; values close to 1 mean a strong association)

• The sign tells you whether it’s a positive or a negative association. So r ranges from -1 to +1

• Note- it makes no difference which variable you call x and which you call y when calculating correlation, but stay consistent!

• Because r uses standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. (Ex: Measuring height in inches vs. ft. won’t change correlation with weight)

• Values of r = -1 and r = +1 occur ONLY in the case of a perfect linear relationship, when the points lie exactly along a straight line.
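Two of the properties above (r ignores the choice of units and ignores which variable is called x) can be checked numerically. The sketch below uses made-up heights and weights purely for illustration; `correlation` is a hypothetical helper, not something from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up heights and weights, for illustration only
heights_in = [60, 62, 65, 68, 70, 73]
weights_lb = [115, 120, 140, 155, 160, 190]
heights_ft = [h / 12 for h in heights_in]  # same heights, measured in feet

r_inches = correlation(heights_in, weights_lb)
r_feet = correlation(heights_ft, weights_lb)     # units of x changed
r_swapped = correlation(weights_lb, heights_in)  # x and y exchanged

# The three values agree up to floating-point rounding
print(r_inches, r_feet, r_swapped)
```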

Page 12: Describing Bivariate Relationships

Examples

1. Correlation requires that both variables be quantitative

2. Correlation measures the strength of only LINEAR relationships, not curved...no matter how strong they are!

3. Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot

4. Correlation is not a complete summary of two-variable data, even when the relationship is linear- always give the means and standard deviations of both x and y along with the correlation.
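Points 2 and 3 can be illustrated with two small made-up data sets: a perfect curve whose correlation is zero, and a perfect line whose correlation collapses when one outlier is added. The numbers and the `correlation` helper are assumptions chosen for the demonstration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Point 2: a perfect but curved relationship, y = x^2 on x values symmetric about 0
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = correlation(xs_curve, ys_curve)  # essentially 0 despite the exact pattern

# Point 3: a perfect line, then the same line plus one outlying observation
xs_line = [1, 2, 3, 4, 5, 6]
ys_line = [2, 4, 6, 8, 10, 12]
r_line = correlation(xs_line, ys_line)
r_with_outlier = correlation(xs_line + [7], ys_line + [0])

print(r_curve, r_line, r_with_outlier)
```

A single added point is enough to move r from a perfect linear relationship to a weak one, which is why the scatterplot should always be checked first.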

Page 13: Describing Bivariate Relationships

3.3 Least-Squares Regression

The slope b = -0.00344 tells us that, according to this linear model, fat gained goes down by 0.00344 kg for each added calorie of NEA. The slope of a regression line is the predicted RATE OF CHANGE in the response y as the explanatory variable x changes.

The y-intercept a = 3.505 kg is the fat gain estimated by this model if NEA does not change when a person overeats.

Page 14: Describing Bivariate Relationships

Prediction

• We can use a regression line to predict the response y for a specific value of the explanatory variable x.
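Prediction with a regression line is just substitution into the equation. The sketch below uses the intercept and slope quoted for the NEA and fat gain model; the function name and the 400-calorie input are hypothetical, chosen only to show the mechanics.

```python
# Intercept and slope quoted in the slides for the NEA / fat-gain model
INTERCEPT_KG = 3.505        # predicted fat gain when NEA does not change
SLOPE_KG_PER_CAL = -0.00344 # change in predicted fat gain per added calorie of NEA

def predict_fat_gain(nea_change_cal):
    """Predicted fat gain (kg) for a given change in NEA (calories)."""
    return INTERCEPT_KG + SLOPE_KG_PER_CAL * nea_change_cal

# Hypothetical input: a person whose NEA rises by 400 calories
print(round(predict_fat_gain(400), 3))
```

Predictions like this are only trustworthy for x values inside the range of the data used to fit the line.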

Page 15: Describing Bivariate Relationships

LSRL

• In most cases, no line will pass exactly through all the points in a scatterplot, and different people will draw different regression lines by eye.

• Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatter plot

• A good regression line makes the vertical distances of the points from the line as small as possible

• Error: Observed response - predicted response
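One way to make "as small as possible" concrete is to add up the squared vertical errors for a candidate line and compare candidates. The following sketch assumes a small made-up data set and two lines drawn "by eye"; none of these numbers come from the slides.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical errors (observed - predicted) for y hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two candidate lines that different people might draw by eye
sse_line_1 = sum_squared_errors(0.0, 2.0, xs, ys)  # y hat = 0 + 2x
sse_line_2 = sum_squared_errors(1.0, 1.5, xs, ys)  # y hat = 1 + 1.5x

# The line with the smaller total is the better fit in the least-squares sense
print(sse_line_1, sse_line_2)
```

The least-squares regression line is the one line that makes this total smaller than any other line would.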

Page 16: Describing Bivariate Relationships

LSRL Cont.

Page 17: Describing Bivariate Relationships

Equation of LSRL

• Example: The Sanchez household is about to install solar panels to reduce the cost of heating their house. In order to know how much the panels help, they record their consumption of natural gas before the panels are installed. Gas consumption is higher in cold weather, so the relationship between outside temp and gas consumption is important.
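The transcript does not preserve the equation from this slide, so the sketch below falls back on the standard least-squares formulas, slope b = r(s_y/s_x) and intercept a = ȳ - b·x̄, applied to the degree-day and gas data listed on the Calculator Scatterplot slide. The function name `least_squares_line` is an assumption for illustration.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept, so the line passes through (x bar, y bar)
    return a, b

# Degree-days (x) and gas consumption in 100 cu ft (y) for the 16 months
degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

a, b = least_squares_line(degree_days, gas)
print(f"gas hat = {a:.3f} + {b:.3f} * degree_days")
```

The slope says how much predicted gas consumption rises for each additional degree-day, which gives the household a baseline for judging the panels later.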

Page 18: Describing Bivariate Relationships

Facts about Least-Squares Regression

• The distinction between explanatory and response variables is essential in regression. If we reverse the roles, we get a different least-squares regression line.

• There is a close connection between correlation and the slope of the LSRL: the slope is b = r(s_y/s_x). This says that a change of one standard deviation in x corresponds to a change of r standard deviations in y. When the variables are perfectly correlated (r = ±1), the change in the predicted response ŷ is the same (in standard deviation units) as the change in x.

• The LSRL always passes through the point (x̄, ȳ)

• r squared is the fraction of variation in values of y explained by the x variable
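These facts can be verified numerically on any data set. The sketch below uses made-up numbers and a hypothetical helper `fit_line`; it checks that the line passes through the means and that swapping the roles of x and y gives a different line even though r is unchanged.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (a, b, r) for the least-squares line y hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 2.9, 4.2, 4.8, 6.1, 6.5]

a, b, r = fit_line(xs, ys)              # predict y from x
a_rev, b_rev, r_rev = fit_line(ys, xs)  # roles reversed: predict x from y

# Fact: the line passes through (x bar, y bar)
passes_through_means = abs((a + b * mean(xs)) - mean(ys)) < 1e-9

# Fact: reversing the roles gives a different line. Rewritten as y in terms of x,
# the reversed line has slope 1 / b_rev, which differs from b unless r = +/-1.
slope_of_reversed_line = 1 / b_rev

print(passes_through_means, b, slope_of_reversed_line, r, r_rev)
```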

Page 19: Describing Bivariate Relationships
Page 20: Describing Bivariate Relationships

R squared: Coefficient of determination

If all the points fall directly on the least-squares line, r squared = 1. Then all the variation in y is explained by the linear relationship with x.

So, if r squared = .606, about 61% of the variation in y among individual subjects is explained by the straight-line relationship with x. The other 39% is “not explained”.

r squared is a measure of how successful the regression was in explaining the response
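The "fraction of variation explained" reading can be checked directly: compare the squared errors around the regression line with the squared deviations around the mean of y. The data and helper names below are assumptions for illustration.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (a, b, r) for the least-squares line y hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b, r

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 3.8, 6.0, 5.9, 8.2, 8.8, 9.5, 12.0]

a, b, r = fit_line(xs, ys)
y_bar = mean(ys)

# Total variation in y, and the part left over after using the line
total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

fraction_explained = 1 - unexplained / total_variation
print(fraction_explained, r ** 2)  # the two agree up to rounding error
```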

Page 21: Describing Bivariate Relationships

3.3 Influences

• Correlation r is not resistant: one unusual point in the scatterplot can greatly affect the value of r. The LSRL is also not resistant. Extrapolation (predicting outside the range of the observed x values) is not very reliable.

• A point extreme in the x direction with no other points near it pulls the line toward itself. This point is influential.
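A quick experiment shows how strongly one point that is extreme in x can pull the line. The data below are made up: five points on a perfect line, then the same points plus one far-right observation. `slope` is a hypothetical helper using the usual least-squares slope formula.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Made-up data: five points exactly on the line y = x
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
slope_before = slope(xs, ys)

# Add one point far out in the x direction with no other points near it
slope_after = slope(xs + [20], ys + [0])

print(slope_before, slope_after)
```

In this sketch the single added point is enough to turn a positive slope into a negative one, which is what makes it influential.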

Page 22: Describing Bivariate Relationships

Lurking Variables: Beware!

• Example: A College Board study of HS grads found a strong correlation between how much math minority students took in high school and their later success in college. News articles quoted the College Board saying that “math is the gatekeeper for success in college”.

• But minority students from middle-class homes with educated parents no doubt take more high school math courses. They are also more likely to have a stable family, parents who emphasize education, and the ability to pay for college. These students would likely succeed in college even if they took fewer math courses. The family background of students is a lurking variable that probably explains much of the relationship between math courses and college success.

Page 23: Describing Bivariate Relationships

Residuals

• The errors of our predictions, the vertical distances from the predicted y to the observed y, are called residuals because they are the “left-over” variation in the response.

One subject’s NEA rose by 135 calories. That subject gained 2.7 kg of fat. The predicted gain for 135 calories is

ŷ = 3.505 - 0.00344(135) = 3.04 kg

The residual for this subject is

y - ŷ = 2.7 - 3.04 = -0.34 kg

Page 24: Describing Bivariate Relationships

Residual Plot

• The sum of the least-squares residuals is always zero.

• The mean of the residuals is always zero. The horizontal line at zero in the figure helps orient us: this “residual = 0” line corresponds to the regression line.
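The zero-sum property is easy to confirm by fitting a least-squares line and adding up the residuals. The sketch below does this for the degree-day and gas data from the Calculator Scatterplot slide; the helper name `fit_line` is an assumption.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

degree_days = [24, 51, 43, 33, 26, 13, 4, 0, 0, 1, 6, 12, 30, 32, 52, 30]
gas = [6.3, 10.9, 8.9, 7.5, 5.3, 4.0, 1.7, 1.2, 1.2, 1.2, 2.1, 3.1, 6.4, 7.2, 11.0, 6.9]

a, b = fit_line(degree_days, gas)

# Residual = observed y - predicted y, one per observation
residuals = [y - (a + b * x) for x, y in zip(degree_days, gas)]

# Zero apart from floating-point rounding error
print(sum(residuals), mean(residuals))
```

Plotting these residuals against degree-days would give the residual plot, with the points scattered above and below the zero line.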

Page 25: Describing Bivariate Relationships

Examining Residual Plot

• The residual plot should show no obvious pattern. A curved pattern shows that the relationship is not linear and that a straight line may not be the best model.

• Residuals should be relatively small in size. A regression line in a model that fits the data well should come “close” to most of the points.

• A commonly used measure of this is the standard deviation of the residuals, given by:

$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}}$

For the NEA and fat gain data, $s = \sqrt{\frac{7.663}{14}} = 0.740$
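The standard deviation of the residuals can be computed from a list of residuals, or directly from the two numbers quoted for the NEA data (a sum of squared residuals of 7.663 and a divisor of 14, which corresponds to n - 2 with n = 16). The function name below is hypothetical.

```python
import math

def residual_sd(residuals):
    """Standard deviation of residuals: sqrt(sum of squared residuals / (n - 2))."""
    n = len(residuals)
    return math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Using the totals quoted for the NEA and fat gain data
s_nea = math.sqrt(7.663 / 14)
print(round(s_nea, 3))

# Using a small made-up list of residuals, for illustration only
s_example = residual_sd([1.0, -1.0, 1.0, -1.0])
print(s_example)
```

Roughly speaking, s is the typical size of a prediction error, in the units of the response variable (kg of fat here).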

Page 26: Describing Bivariate Relationships

Residuals List on Calc

• To list all your residuals in L3, highlight L3 (the name of the list, at the top), go to 2nd, Stat, RESID, then press Enter twice. The list that appears holds the residual for each individual in the corresponding rows of L1 and L2. (A normal scatterplot with X list L1 and Y list L3 gives exactly the same picture as the residual plot we have been making with X list L1 and Y list RESID.)

This is a helpful list to have for checking your work when you are asked to calculate an individual’s residual.

Page 27: Describing Bivariate Relationships

Residual Plot on Calc

• Produce the scatterplot and regression line from the data (let’s use BAC if it is still in there)

• Turn all plots off

• Create new scatterplot with X list as your explanatory variable and Y list as residuals (2nd stat, resid)

• Zoom Stat

Page 28: Describing Bivariate Relationships

Bivariate Relationships

What is bivariate data?

When exploring/describing a bivariate (x, y) relationship:

• Determine the explanatory and response variables

• Plot the data in a scatterplot

• Note the strength, direction, and form

• Note the mean and standard deviation of x and the mean and standard deviation of y

• Calculate and interpret the correlation, r

• Calculate and interpret the least-squares regression line in context

• Assess the appropriateness of the LSRL by constructing a residual plot

