AP Statistics

Post on 22-Feb-2016



AP Statistics 4.2: Cautions about Correlation and Regression

Learning Objective:

Understand causation

Differentiate between causation, common response, and confounding variables

Correlation and regression describe the relationship between two variables, but they have limitations:

Correlation and regression describe only linear relationships.

Correlation and the least-squares regression line are not resistant: one influential observation or incorrectly entered data point can greatly change these measures.
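The two limitations above can be checked numerically. The following is a minimal Python sketch using made-up numbers (not data from this lesson); the helper name `pearson_r` is illustrative and is written out so the block needs no external library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# 1) Only linear relationships: y is completely determined by x (y = x^2),
#    yet r is 0 because the relationship is curved, not linear.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# 2) Not resistant: six points on an exact line give r = 1 ...
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(x_clean, y_clean)

# ... but one incorrectly entered value (12 typed as -12) changes r drastically.
y_typo = y_clean[:-1] + [-12]
r_typo = pearson_r(x_clean, y_typo)

print(f"curved relationship:  r = {r_curve:.3f}")
print(f"clean linear data:    r = {r_clean:.3f}")
print(f"one mistyped value:   r = {r_typo:.3f}")
```

In this sketch a single bad entry is enough to turn a perfect positive correlation into a negative one, which is why a scatterplot should be checked before trusting r or the regression line.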

Extrapolation: making predictions outside the domain of the observed x-values.

Lurking variables: outside variables that affect the relationship between the two variables.

Other things to keep in mind.

Ex 1: Studies show that men who complain of chest pain are more likely to get detailed tests and aggressive treatment such as bypass surgery than are women with similar complaints. Is this association between gender and treatment due to discrimination?

Ex 2: A study of housing conditions in the city measured a large number of variables for each of the wards in the city. Two of the variables were a measure x of overcrowding and a measure y of the lack of indoor toilets. Because x and y are both measures of inadequate housing, we expect a high correlation. In fact, the correlation was only r = 0.08. How can this be?

Ex: The math department of a university must plan the number of sections of elementary courses. We want to see whether we can predict enrollment in these courses from the number of first-year students, which is already known.

Year   1993   1994   1995   1996   1997   1998   1999   2000
x      4595   4827   4427   4258   3995   4330   4265   4351
y      7364   7547   7099   6894   6572   7156   7232   7450

(x = the number of first-year students; y = the number of students who enroll in elementary courses)

Why would we have reservations about using these data to make predictions?
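One way to explore this question is to fit the least-squares line to the enrollment data and then look at the residuals in time order. The sketch below assumes an ordinary least-squares fit of y on x is the intended prediction method; the helper name `least_squares` is illustrative, and the block uses only the values given in the table.

```python
# Enrollment data from the table above.
years = [1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000]
x = [4595, 4827, 4427, 4258, 3995, 4330, 4265, 4351]  # first-year students
y = [7364, 7547, 7099, 6894, 6572, 7156, 7232, 7450]  # elementary-course enrollment

def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = least_squares(x, y)

# Residual = observed y minus the y predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"y-hat = {intercept:.1f} + {slope:.4f} x,  r^2 = {r_squared:.3f}")
for year, res in zip(years, residuals):
    print(f"{year}: residual = {res:+.1f}")
```

For these data the residuals are negative for 1993 through 1997 and positive for 1998 through 2000. A pattern like this over time suggests that something other than x (a possible lurking variable) is affecting y, so a prediction based on x alone deserves caution.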

Many regression or correlation studies work with averages or other measures that combine information from many individuals.

***Do not apply the results of such studies to individuals.

Ex: Relationship between outside temperature and natural gas consumption.

Using averaged data
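To see why results based on averages should not be applied to individuals, the sketch below compares the correlation among individuals with the correlation among group averages. The data are made up for illustration (four groups of four individuals), and `pearson_r` is an illustrative helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: each group has a center (x, y), and its four
# individuals scatter around that center by the same set of offsets.
centers = [(1, 2), (2, 4), (3, 6), (4, 8)]
offsets = [(-1, 3), (1, 3), (-1, -3), (1, -3)]

groups = [[(cx + dx, cy + dy) for dx, dy in offsets] for cx, cy in centers]

# Individual-level data: all 16 points pooled together.
ind_x = [px for group in groups for px, _ in group]
ind_y = [py for group in groups for _, py in group]

# Averaged data: one (mean x, mean y) point per group.
avg_x = [sum(px for px, _ in group) / len(group) for group in groups]
avg_y = [sum(py for _, py in group) / len(group) for group in groups]

r_individuals = pearson_r(ind_x, ind_y)
r_averages = pearson_r(avg_x, avg_y)

print(f"r for individuals:    {r_individuals:.2f}")
print(f"r for group averages: {r_averages:.2f}")
```

In this made-up example the group averages fall exactly on a line (r = 1), while the individuals show only a moderate correlation (r is about 0.45). Averaging removes the variation among individuals, so the relationship looks stronger than it is for any one person.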

Ex: A study shows a positive correlation between the size of a hospital (measured by its number of beds x) and the median number of days y that a patient remains in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital?

In studies of the relationship between two variables, the goal is often to show that changes in the response variable are caused by changes in the explanatory variable.

Even when there is a strong association, the conclusion that this is due to a causal link between the two variables is often elusive.

The Question of Causation

Explaining association: causation

A change in x causes a change in y.

Explaining association: common response

An outside variable z affects both x and y.

Explaining association: confounding

z is a confounding variable: we don’t know whether the change in y was due to x or to z.
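Common response can be illustrated with a small simulation. In the sketch below, which uses simulated values rather than data from this lesson, an outside variable z drives both x and y, while x has no effect on y at all. The noise levels, sample size, and seed are arbitrary assumptions, and `pearson_r` is an illustrative helper name.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500

# z is the outside variable; x and y each respond to z plus their own noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [zi + rng.gauss(0, 0.3) for zi in z]

# x and y are strongly associated even though neither causes the other.
r_xy = pearson_r(x, y)

# Once z's contribution is removed from both, almost no association remains.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson_r(x_without_z, y_without_z)

print(f"r between x and y:            {r_xy:.2f}")
print(f"r after removing z's effect:  {r_without_z:.2f}")
```

The strong correlation between x and y here comes entirely from their shared response to z, which is the situation the common response diagram describes.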

The number of times you brush your teeth is a confounding variable. We don’t know whether the number of cavities you have is due to how many apples you ate or to how often you brushed your teeth.

How do we explain this?

How can a direct causal link between x and y be established?

Do an experiment. (Well-designed experiments control the effects of lurking variables.)

Establishing causation
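The simulation below sketches why an experiment helps. It assumes a lurking variable z that affects the response y, and a treatment x that has no effect on y. When subjects choose their own treatment based on z, the groups differ in y; when a coin flip assigns the treatment, z is balanced across groups and the spurious difference disappears. All values are simulated, and the names, seed, and sample size are illustrative assumptions.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

def mean(values):
    return sum(values) / len(values)

def difference_in_means(treated_flags, outcomes):
    """Mean outcome of the treated group minus mean outcome of the control group."""
    treated = [o for t, o in zip(treated_flags, outcomes) if t]
    control = [o for t, o in zip(treated_flags, outcomes) if not t]
    return mean(treated) - mean(control)

# Lurking variable z affects the response y; the treatment x does NOT.
z = [rng.gauss(0, 1) for _ in range(n)]
y = [zi + rng.gauss(0, 1) for zi in z]

# Observational study: subjects with high z tend to end up with x = True.
x_observed = [zi > 0 for zi in z]

# Experiment: a coin flip assigns x, so it is unrelated to z.
x_assigned = [rng.random() < 0.5 for _ in range(n)]

diff_observed = difference_in_means(x_observed, y)
diff_experiment = difference_in_means(x_assigned, y)

print(f"observational difference in mean y: {diff_observed:.2f}")
print(f"experimental difference in mean y:  {diff_experiment:.2f}")
```

In the observational version, x appears to raise y only because both are tied to z. Random assignment breaks that tie, so any difference that remains in a real experiment can more reasonably be attributed to x.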