chapter 3 describing relationships...


October 31, 2016

A response variable measures an outcome of a study (also called a dependent variable).

An explanatory variable attempts to explain the observed outcomes (also called an independent variable).

The response variable depends on the explanatory variable.

Example: We think that car weight helps explain accident deaths.

Explanatory variable: car weight

Response variable: accident death rate



A scatterplot is the most effective way to display the relationship between two quantitative variables measured on the same individuals.

• Values of one variable appear on the horizontal axis and values of the other variable appear on the vertical axis.

• Each individual in the data appears as a point in the graph.

• Always plot the explanatory variable (if there is one) on the horizontal axis (x-axis). If there is no explanatory-response distinction, either variable can go on the horizontal axis.



Examining a Scatterplot

In any graph of data, look for the overall pattern and for striking deviations from that pattern.

• You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.

• An important kind of deviation is an outlier, an individual that falls outside of the overall pattern of the relationship.

Form: shape of scatterplot



Interpret the scatterplot to the right.

Direction: Decreases from left to right. The higher the percentage of people taking the SAT, the lower the mean math score. There is a negative association.

Form: The relationship is slightly curved. There are clusters and gaps: in about half the states, less than 25% took the SAT, and in the other half, more than 40% took it.

Strength: Moderately strong. States with similar percentages of people taking the SAT tend to have similar mean math scores.

Outliers: There appear to be two outliers: (20, 500) and (88, 460).



Describe what the scatterplot reveals about the relationship between body weight and backpack weight. (Direction, Form, Strength, Outliers)

*Hint: First describe the general pattern. Then identify any deviations from the pattern.



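Positive Association, Negative Association

Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.

Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other.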

Examples:

Positive Association: Backpack weight generally increases as body weight increases

Negative Association: The mean SAT score goes down as the percent of graduates taking the test increases.



Thursday Oct. 20th



Tuesday October 27th

1. You have data for many years on the average price of a barrel of oil and the average retail price of a gallon of gas. If you want to see how well the price of oil predicts the price of gas, then you should make a scatterplot with _______ as the explanatory variable.

a) the price of oil
b) the price of gas
c) the year
d) either oil price or gas price
e) time

2. A study was designed to determine if smoking influences life expectancy. What will the explanatory and response variables in this study be?



1. Describe the direction of the relationship. Explain why this makes sense.

Positive Association. The longer the duration, the longer the interval.

2. What form does the relationship take? Why are there two clusters of points?

Roughly linear. There are two clusters, around 2 and 4.5. Most eruptions fall into two categories: shorter (around 2 minutes) and longer (around 4.5 minutes).

3. How strong is the relationship? Justify your answer.

Fairly strong. The points don't deviate much from a linear form.

4. Are there any outliers?

There are a couple of points that could be outliers, but for the most part they all follow the overall pattern.

5. What information does the family need to predict when the next eruption will occur?

The duration of the previous eruption.



The two scatterplots above show the same data set using two different scales. Since it's easy to be fooled by different scales or amount of space around points in a scatterplot, we need a numerical measure to supplement the graph.



Correlation

The correlation (r) measures the direction and strength of the linear relationship between two quantitative variables.




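Suppose that we have data on variables x and y for n individuals. The values for the first individual are $x_1$ and $y_1$, the values for the second individual are $x_2$ and $y_2$, and so on. The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x values, and $\bar{y}$ and $s_y$ for the y values. The correlation between x and y is:

$$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$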




$n$ = sample size

$\sum$ = summation: "add these terms for all individuals"

$\bar{x}$ = mean of x values

$\bar{y}$ = mean of y values

$s_x$ = standard deviation of x values

$s_y$ = standard deviation of y values

$x_i$, $y_i$ = the x and y values for the $i$th term.
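The pieces listed above can be combined into a short calculation. This is a minimal sketch, not the method used in class: the function names and the five data points are made up for illustration, and it assumes the usual definition of r as the sum of products of standardized x and y values divided by n - 1.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up illustration data (NOT the class data set).
body_weight = [100, 120, 140, 160, 180]
pack_weight = [20, 24, 27, 30, 36]

r = correlation(body_weight, pack_weight)
print(f"r = {r:.3f}")
```

A value of r close to 1 here reflects the strong, positive, roughly linear pattern in the made-up points.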



Interpreting Correlation

1. r is always a number between -1 and 1. r > 0 indicates a positive association and r < 0 indicates a negative association. r values near 0 indicate a very weak linear relationship. r = -1 and r = 1 only occur in the case of a perfect linear relationship where all points lie exactly on the line.

2. Since r uses the standardized values of the observations, r does not change when we change units of measurement of x, y, or both.

3. Correlation makes no distinction between explanatory and response variables. (It doesn't matter which variable you call x and which you call y.)

4. Correlation, r, has no unit of measurement.

5. Correlation does not describe curved relationships between variables, only linear relationships. A correlation of 0 doesn't guarantee that there's no relationship, just that there's no linear relationship.

6. Correlation is not resistant: r is strongly affected by a few outlying observations.

7. Correlation is not a complete summary of two-variable data.
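Several of the facts above can be checked numerically. The sketch below uses made-up data and a hypothetical helper named `correlation`. It checks that changing units leaves r unchanged (fact 2), that swapping x and y leaves r unchanged (fact 3), that a perfectly curved relationship can have r = 0 (fact 5), and that a single outlier can change r drastically (fact 6).

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r computed from standardized values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# All data below are made up for illustration.
height_in = [60, 62, 65, 68, 70]
weight_lb = [110, 120, 140, 155, 170]

# Fact 2: converting inches to centimeters does not change r.
height_cm = [h * 2.54 for h in height_in]
r_inches = correlation(height_in, weight_lb)
r_cm = correlation(height_cm, weight_lb)

# Fact 3: swapping the roles of x and y does not change r.
r_xy = r_inches
r_yx = correlation(weight_lb, height_in)

# Fact 5: y = x^2 is a perfect (curved) relationship, yet r is 0.
xs_curved = [-2, -1, 0, 1, 2]
ys_curved = [x ** 2 for x in xs_curved]
r_curved = correlation(xs_curved, ys_curved)

# Fact 6: one outlying point can change r a great deal.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
r_without_outlier = correlation(xs, ys)
r_with_outlier = correlation(xs + [10], ys + [0])

print(r_inches, r_cm, r_yx, r_curved, r_without_outlier, r_with_outlier)
```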



Wednesday October 28th

1. The following scatterplot shows reading test scores against IQ test scores for 14 fifth grade students. There is one outlier in the plot. What are the scores for that child?

2. In a scatterplot of the average price of a barrel of oil and the average retail price of a gallon of gas, you expect to see...



Least Squares Regression is a method for finding a line that summarizes the relationship between two variables.

• A regression line is a straight line that describes how a response variable (y) changes as an explanatory variable (x) changes.

• A regression line is often used to predict the value of y for a given x value. Regression, unlike correlation, requires that you have an explanatory and a response variable.

• A regression line is a model for the data.



A regression line has the form $\hat{y} = a + bx$, where:

$\hat{y}$ (y-hat): the predicted value of the response variable y for a given value of the explanatory variable x.

$a$: the y-intercept, the predicted value of y when x = 0.

$b$: the slope, the amount by which y is predicted to change when x increases by one unit.



Everyone knows that cars and trucks lose value the more they are driven. Can we predict the price of a used Ford F150 SuperCrew 4x4 if we know how many miles it has on the odometer? A random sample of 16 used F150s was selected from among those listed for sale at autotrader.com. The number of miles driven and price (in dollars) were recorded for each of the trucks. Here's the data:



Example 1: Identify the slope and y-intercept of the regression line and interpret each value in context.





Back to the Ford F150 problem...

Example 1: How much would a Ford F150 be worth if it has 100,000 miles on it?



Example 2: How much would a Ford F150 be worth if it has 300,000 miles on it?



Monday October 24th

The scores on the Chapter 2 Test are as follows:

89, 88, 79, 89, 58, 84, 95, 79, 93, 92, 91, 94, 70, 93, 92, 87, 91, 73, 50, 91

What measure of center and spread would you choose to describe the data?

Which is higher, the median or the mean?

Graph the data and describe the distribution (SOCS).
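One way to check the mean-versus-median question is to compute both from the scores listed above. This sketch uses Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

# Chapter 2 Test scores, copied from the warm-up above.
scores = [89, 88, 79, 89, 58, 84, 95, 79, 93, 92,
          91, 94, 70, 93, 92, 87, 91, 73, 50, 91]

score_mean = mean(scores)
score_median = median(scores)

print(f"mean   = {score_mean}")
print(f"median = {score_median}")
```

A few low scores pull the mean below the median, which is the usual sign of a left-skewed distribution.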



Example 3: Find and interpret the residual for the Ford F150 that had 70,583 miles driven and a price of $21,994. (Recall: residual = observed y - predicted y.)



The least squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
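The "as small as possible" claim can be illustrated with a short calculation. The sketch below fits a line to made-up data using the standard least squares formulas for slope and intercept, then checks that nudging the slope or intercept makes the sum of squared vertical distances larger. The function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustration data (NOT the F150 data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(a, b, xs, ys)

# Any other line gives a larger sum of squared errors.
nudged_slope = sum_squared_errors(a, b + 0.1, xs, ys)
nudged_intercept = sum_squared_errors(a + 0.1, b, xs, ys)

print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {best:.2f}")
print(f"SSE with nudged slope = {nudged_slope:.2f}")
print(f"SSE with nudged intercept = {nudged_intercept:.2f}")
```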



Facts about Residual Plots

A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess whether a linear model is appropriate.

• The mean of the least squares residuals is always zero.

• A residual plot in effect turns the regression line horizontal. It magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns.

• If the regression line captures the overall pattern of the data, there should be no pattern in the residuals.
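The points of a residual plot are just the pairs (x, residual). As a rough sketch with made-up data and hypothetical function names, the code below computes those pairs for a least squares line and confirms that the residuals average to zero.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def residuals(a, b, xs, ys):
    """Residual = observed y - predicted y, for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
res = residuals(a, b, xs, ys)

# These (x, residual) pairs are the points of the residual plot.
for x, e in zip(xs, res):
    print(f"x = {x}, residual = {e:+.2f}")

mean_residual = sum(res) / len(res)
print(f"mean residual = {mean_residual:.10f}")
```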



Examining a Residual Plot

1. The residual plot should show no obvious patterns. Ideally it would look like the plot to the right.

2. A curved pattern in a residual plot shows that the relationship is NOT linear.

3. The residuals should be relatively small in size.

4. Increasing (or decreasing) spread about the line as x increases indicates that predictions of y will be less accurate for larger (or smaller) x values.

5. Individual points with large residuals are outliers because they lie far from the line that describes the overall pattern.

6. Individual points that are extreme in the direction of x may not have large residuals, but can be important.



An outlier is an observation that lies outside the overall pattern of the other observations.

An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least squares regression line.



Tuesday October 25th

Some data were collected on the weight of a male lab rat for the first 25 weeks after its birth. A scatterplot of the weight (in grams) and time since birth (in weeks) shows a fairly strong, positive linear relationship. The linear regression equation models the data fairly well:

1. What is the slope of the regression line? Explain what it means in context.

2. What is the y‐intercept? Explain what it means in context.

3. Predict the rat's weight after 16 weeks. Show your work.

4. Should you use the line to predict the rat's weight at age 2 years?



Standard Deviation of the Residuals

The average prediction error (or the mean of the residuals) is 0 whenever we use the least squares regression line. That's because the positive and negative residuals "balance out". But that doesn't tell us how far off the predictions are, on average.

The standard deviation of the residuals measures that:

$$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}}$$

So, we can say that our predictions are "off" by an average of _____.

This value gives the approximate size of a "typical" or "average" prediction error (residual).
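As a rough sketch, the standard deviation of the residuals can be computed as the square root of the sum of squared residuals divided by n - 2 (the usual textbook definition, assumed here). The data and function names below are made up for illustration.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def residual_sd(xs, ys):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    a, b = least_squares_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s = residual_sd(xs, ys)
print(f"s = {s:.3f}")
```

For these made-up points, predictions from the line are typically off by a bit less than 1 unit of y.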



The coefficient of determination: $r^2$ (or "r-sq")

The coefficient of determination, $r^2$, is the fraction of the variation in the values of y that is accounted for by the least squares regression line of y on x. (It tells us how well the least squares regression line predicts values of the response variable y.)

We can calculate $r^2$ using the following formula:

$$r^2 = 1 - \frac{SSE}{SST}$$

$SST = \sum (y_i - \bar{y})^2$ measures the total variation in the y-values.

$SSE = \sum (y_i - \hat{y}_i)^2$ is the sum of the squared errors (residuals).

The ratio $SSE/SST$ tells us what proportion of the total variation in y still remains after using the regression line to predict the values of the response variable.

*The least squares regression line accounts for _____ % of the variation in [response variable name].
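The calculation can be sketched in a few lines. This example assumes the usual formula, 1 minus the ratio of the sum of squared residuals (SSE) to the total variation in y (SST). The data and function names are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def coefficient_of_determination(xs, ys):
    """r^2 = 1 - SSE/SST."""
    a, b = least_squares_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total variation in y
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))    # variation left over
    return 1 - sse / sst

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_squared = coefficient_of_determination(xs, ys)
print(f"The line accounts for {100 * r_squared:.0f}% of the variation in y.")
```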



Slope = 1.109: for every 1 mpg increase in city mileage, the highway mileage is predicted to increase by 1.109 mpg.

y-intercept = 4.62: when the city mileage is zero, we predict a highway mileage of 4.62 mpg.
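Using the slope and y-intercept stated above, a prediction is one line of arithmetic. The function name and the city mileage of 20 mpg below are illustrative choices, not values from the notes.

```python
SLOPE = 1.109      # predicted change in highway mpg per 1 mpg of city mileage
INTERCEPT = 4.62   # predicted highway mpg when city mileage is 0

def predict_highway_mpg(city_mpg):
    """Predicted highway mileage: y-hat = 4.62 + 1.109 * x."""
    return INTERCEPT + SLOPE * city_mpg

# Hypothetical car with 20 mpg in the city.
prediction = predict_highway_mpg(20)
print(f"predicted highway mileage = {prediction:.2f} mpg")

# Increasing city mileage by 1 mpg changes the prediction by the slope.
change = predict_highway_mpg(21) - predict_highway_mpg(20)
print(f"change per 1 mpg = {change:.3f} mpg")
```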



Here's a residual plot for the least squares regression of pack weight on body weight for the 8 hikers.



Tuesday November 3rd

Create a residual plot of the F150 data.



1. Calculate the standard deviation of the residuals for the F150 problem. Interpret what it means in the context.

2. Calculate the coefficient of determination and interpret what it means in the context.

Monday Oct. 31st Refer to pg. 165 for data table



We can give the equation of the least-squares regression line in terms of the means and standard deviations of the two variables and their correlation:

$$\hat{y} = a + bx$$

where $b = r\,\dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.

We know that every least squares regression line passes through the point $(\bar{x}, \bar{y})$.
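As a sketch of this idea, the code below builds the line from summary statistics only, assuming the standard formulas (slope = r times s_y over s_x, intercept = mean of y minus slope times mean of x). It then checks that the line passes through the point of means. The data and names are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations

# Correlation from standardized values.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

# Slope and intercept from the summary statistics alone.
b = r * s_y / s_x
a = y_bar - b * x_bar

print(f"y-hat = {a:.2f} + {b:.2f}x")

# The line passes through (x-bar, y-bar).
prediction_at_mean = a + b * x_bar
print(f"prediction at x-bar = {prediction_at_mean:.2f}, y-bar = {y_bar:.2f}")
```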



With all data:

Excluding Child 18:

Excluding Child 19:

6. What do you notice?



Removing child 18 has a strong influence on the position of the regression line. However, removing child 19 has little effect on the regression line.

A point that is extreme in the x direction with no other points near it pulls the line toward itself. We call these points influential.
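The same effect can be reproduced with made-up numbers (these are not the Child 18 and Child 19 data). In this sketch, one added point is extreme in the x direction and another has a large residual but sits at the middle of the x values. Only the first one changes the slope much.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_base, b_base = least_squares_line(xs, ys)

# Add a point far out in the x direction, away from the pattern.
a_x_out, b_x_out = least_squares_line(xs + [15], ys + [3])

# Add a point with a large residual at the middle of the x values.
a_y_out, b_y_out = least_squares_line(xs + [3], ys + [9])

print(f"original data:      slope = {b_base:.3f}, intercept = {a_base:.3f}")
print(f"with x-outlier:     slope = {b_x_out:.3f}, intercept = {a_x_out:.3f}")
print(f"with large residual: slope = {b_y_out:.3f}, intercept = {a_y_out:.3f}")
```

The x-outlier pulls the line toward itself and nearly flattens it, while the point with the large residual shifts the line upward without changing its slope.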



Recall: The coefficient of determination:

How to interpret: "The least squares regression line accounts for _____ % of the variation in [response variable name]."

Standard Deviation of the Residuals: This value gives the approximate size of a "typical" or "average" prediction error (residual)

How to interpret: "Our predictions are "off" by an average of ______ [response variable name]."



Bottom Line:

Association does NOT imply causation!
