Date posted: 31-Oct-2014
Uploaded by: mandrewmartin
The World of Linear Regression
2
What is regression analysis?
Regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables.
The regression framework is at the heart of empirical social and political science research.
Regression analysis acts as a statistical surrogate for controlled experiments and, when its assumptions hold and relevant confounders are controlled for, can be used to support causal inferences.
3
Regression models
Researchers translate verbal theories, hypotheses, even hunches into models.
A model shows how and under what conditions two (or more) variables are related.
A regression model with a dependent variable and one independent variable is known as a bivariate regression model.
A regression model with a dependent variable and two or more independent variables and/or control variables is known as a multiple regression model (often loosely called a multivariate regression model).
4
Scatterplots
A scatterplot graphs the sample observations by plotting each one as a point against the X and Y axes.
The X axis generally represents the values of the independent variable, and the Y axis usually represents the value of the dependent variable.
X is the horizontal axis; Y is the vertical axis.
5
Scatterplots
Scatterplots allow you to study the flow of the dots, or the relationship between the two variables.
Scatterplots allow political scientists to identify
-- positive or negative relationships
-- monotonic or linear relationships
6
Scatterplot
Regression Equation
The linear equation is specified as follows:
Y = a + bX
Where Y = dependent variable
X = independent variable
a = constant (value of Y when X = 0)
b = slope of the regression line
9
Regression Equation
Y = a + bX
a can be positive or negative. In high school algebra, you may have referred to a as the intercept, because a is the point at which the line crosses the Y axis.
b (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship and a negative coefficient denotes a negative relationship.
The substantive interpretation of the slope coefficient depends on the variables involved, how they are coded and the scale of the variables. Larger coefficients may indicate a stronger relationship, but not necessarily.
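To illustrate why a larger coefficient does not necessarily mean a stronger relationship, here is a minimal sketch. The variables (study time and a test score) and the values of a and b are hypothetical and chosen only for illustration. The same line is written twice, once with X coded in minutes and once in hours:

```python
def predict(a, b, x):
    """Evaluate the linear equation Y = a + bX."""
    return a + b * x

# Hypothetical line: a = 10 points, b = 0.25 points per minute of study.
a = 10.0
b_per_minute = 0.25

# Recoding X from minutes to hours multiplies the slope by 60,
# but the underlying relationship is exactly the same.
b_per_hour = b_per_minute * 60

y_minutes = predict(a, b_per_minute, 120)  # X = 120 minutes
y_hours = predict(a, b_per_hour, 2)        # X = 2 hours (the same study time)

print(b_per_minute, b_per_hour)  # 0.25 15.0
print(y_minutes, y_hours)        # 40.0 40.0
```

The coefficient is 60 times larger in the second coding, yet both versions predict the same Y for the same amount of study time.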
10
The Regression Model
The goal of regression analysis is to find an equation which “best fits” the data.
In regression, an equation is found in such a way that its graph is the line that minimizes the sum of the squared vertical distances between the data points and the line.
11
d1 and d2 represent the distances of observed data points from an estimated regression line.
Regression analysis uses a mathematical procedure that finds the single line that minimizes the sum of the squared vertical distances between the data points and the line.
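The idea of a best-fitting line can be sketched in a few lines of code. The data below are hypothetical, and the function names are illustrative rather than taken from any library. The fit uses the standard closed-form least-squares solution for one independent variable: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the line passes through the point of means.

```python
def sum_squared_distances(a, b, xs, ys):
    """Sum of squared vertical distances between the points and Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (a, b) for the line that minimizes the sum of squared distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

# Hypothetical sample of five observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_distances(a, b, xs, ys)

# Any other line, such as one with a nudged intercept or slope, fits worse.
nudged_intercept = sum_squared_distances(a + 0.1, b, xs, ys)
nudged_slope = sum_squared_distances(a, b + 0.1, xs, ys)

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"best = {best:.2f}, nudged = {nudged_intercept:.2f}, {nudged_slope:.2f}")
```

Moving the intercept or the slope away from the fitted values always increases the sum of squared distances, which is the sense in which the line "best fits" the data.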
12
Regression Equation
The standard regression equation is the same as the linear equation with one exception: the error term.
Y = α + βX + ε
Where Y = dependent variable
α = constant term
β = slope or regression coefficient
X = independent variable
ε = error term
13
Regression Equation
This regression procedure is known as ordinary least squares (OLS).
α (the constant term) is interpreted the same as before: it is the value of Y when X = 0.
β (the regression coefficient) tells how much Y changes if X changes by one unit.
The regression coefficient indicates the direction and strength of the relationship between the two quantitative variables.
14
Regression Equation
The error (ε) indicates that observed data do not follow a neat pattern that can be summarized with a straight line.
An observation's score on Y can be broken into two parts:
α + βX is due to the independent variable
ε is due to error
Observed value = Predicted value (α + βX) + error (ε)
15
Regression Equation
The error is the difference between the observed value of Y and the predicted value of Y (observed minus predicted).
This difference is known as the residual.
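The decomposition of each observed value into a predicted value and a residual can be checked directly. This is a minimal sketch on hypothetical data. The values of alpha and beta are assumed for illustration (they happen to be the least-squares estimates for these five points), and none of the numbers come from the slides.

```python
# Hypothetical estimates and data, for illustration only.
alpha = 2.2
beta = 0.6
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Predicted value for each observation: alpha + beta * X.
predicted = [alpha + beta * x for x in xs]

# Residual for each observation: observed value minus predicted value.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    print(f"X={x}  observed={y}  predicted={y_hat:.1f}  residual={e:+.1f}")
```

Points above the line have positive residuals and points below it have negative residuals. For a least-squares line that includes a constant term, the residuals sum to zero (up to rounding).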
18
Regression Interpretation
For the data on the scatterplot:
Y (depvar) = telephone lines per 1,000 people
X (indvar) = infant mortality (deaths per 1,000 live births)
We can use regression analysis to examine the relationship between communication capacity (measured here as telephone lines per 1,000 people) and infant mortality.
19
Regression Interpretation
In this analysis, the intercept and regression coefficient are as follows:
α (or constant) = 121
Means that when X (infant deaths) is 0, the predicted number of phone lines is 121 per 1,000 population.
β = -1.25
Means that when X (infant deaths) increases by 1, there is a predicted or estimated decrease of 1.25 phone lines per 1,000 population.
21
Regression Interpretation
These calculations are useful because they allow you to make predictions from the data. An increase from 1 to 10 deaths per 1,000 live births is associated with a decline of 119.75 - 108.5 = 11.25 telephone lines per 1,000 people.
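The arithmetic behind these predictions can be reproduced with the constant (121) and coefficient (-1.25) reported for this analysis. The function name below is illustrative, not something defined in the slides.

```python
ALPHA = 121.0  # constant: predicted phone lines per 1,000 people when X = 0
BETA = -1.25   # change in predicted phone lines per additional infant death

def predicted_phone_lines(infant_mortality):
    """Predicted phone lines per 1,000 people: Y = ALPHA + BETA * X."""
    return ALPHA + BETA * infant_mortality

at_1 = predicted_phone_lines(1)    # 121 - 1.25 * 1
at_10 = predicted_phone_lines(10)  # 121 - 1.25 * 10
decline = at_1 - at_10

print(at_1)     # 119.75
print(at_10)    # 108.5
print(decline)  # 11.25
```

Because the model is linear, the same decline follows from multiplying the coefficient by the change in X: 9 additional deaths times 1.25 lines each is 11.25 lines.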
22
Interpreting the meaning of a coefficient can be tricky. What does a coefficient of -1.25 mean?
-- It means there is a negative relationship between infant mortality and phone lines.
-- It means that for every additional infant death per 1,000 live births, there is a predicted decrease of 1.25 phone lines per 1,000 people.
This information is useful, but is there a measure that tells us how good a job we do predicting the observed values?
23
Scatterplot
24
R-squared
Yes, the measure is known as R-squared (or R²).
As stated earlier, the total deviation from the mean has two component parts. The total deviation is usually measured as the total sum of squares (or total variation).
The first component is the difference between the predicted value of Y and the mean of Y. This is the explained part of the deviation, or the regression sum of squares.
The second component is the residual sum of squares, which measures prediction errors. This is the unexplained part of the deviation.
25
R-squared
Total SS = Regression SS + Residual SS
In other words, the total sum of squares is the sum of the regression sum of squares and the residual sum of squares.
R² = Regression SS / Total SS
The more variance the regression model explains, the higher the R².
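The sums of squares and R² can be computed directly from their definitions. This is a minimal sketch on hypothetical data. The sample values and variable names are assumptions for illustration, and the line is fitted with the standard closed-form least-squares solution for one independent variable.

```python
# Hypothetical sample of five observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and constant for Y = alpha + beta * X.
beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs))
alpha = mean_y - beta * mean_x
predicted = [alpha + beta * x for x in xs]

# Total SS: squared deviations of observed Y from the mean of Y.
total_ss = sum((y - mean_y) ** 2 for y in ys)
# Regression SS: squared deviations of predicted Y from the mean (explained).
regression_ss = sum((y_hat - mean_y) ** 2 for y_hat in predicted)
# Residual SS: squared prediction errors (unexplained).
residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

r_squared = regression_ss / total_ss

print(f"Total SS = {total_ss:.2f}")
print(f"Regression SS = {regression_ss:.2f}, Residual SS = {residual_ss:.2f}")
print(f"R-squared = {r_squared:.2f}")
```

For a least-squares line with a constant term, the two components add up to the total, so R² can equivalently be computed as 1 minus Residual SS divided by Total SS.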