Date posted: 21-Jan-2016
URBDP 591 A Lecture 9: Cross-Sectional and Longitudinal Design
Objectives
• Experimental vs. Non-Experimental Design
• Cross-Sectional Designs
• Longitudinal Design
• Multiple Regression
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Experimental | Test for cause/effect relationships | Current | High | Comparing two types of treatments on plant growth
Quasi-experimental | Test for cause/effect relationships without full control | Current or past | Moderate to high | Comparing the effect of curriculum on children's ability to read
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Non-experimental - correlational | Examine relationship between two variables | Current (cross-sectional) or past | Low to medium | Relationship between patterns of urban development and bird diversity
Ex post facto | Examine the effect of a past event on current functioning | Past & current | Low to medium | Relationship between change in population density and bird diversity
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Non-experimental - longitudinal | Repeated measurements of the same subject over time | Future - predictive | Low to moderate | Relationship between urban development and stream quality
Cohort-sequential | Examine change in a variable over time in overlapping groups | Future | Low to moderate | Relationship between urban development and stream quality across various types of basins
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Survey | Assess opinions or characteristics that exist at a given time | Current | None or low | People's preferences for different landscapes
Qualitative | Discover potential relationships; descriptive | Past or current | None or low | People’s experiences of driving through a park
Experimental vs Correlational Research
• Experimental research determines whether one variable causes changes in another variable.
• Correlational research measures the relationship between two variables.
• Difference: variables can be related without being causally related.
Correlational Research
Main interest is to determine whether two variables co-vary and to determine the direction of the relationship.
Characteristics of correlational research:
- Differs from experimental research: (1) no manipulation of IVs; (2) subjects are not randomly assigned.
- Measure two variables and determine whether a correlational relationship exists between them.
- If a correlational relationship exists between two variables, the value of one variable can be predicted from the value of the other.
Correlational Studies
• Type of descriptive research design
– Advantage is that it can examine variables that cannot be experimentally manipulated (e.g., population growth).
– Disadvantage is that it cannot determine causality.
– A third variable may account for the association.
– Directionality is unclear.
Non-experimental Research Designs
• Describes a particular situation or phenomenon.
• Hypothesis generating
• Can describe effect of implementing actions based on experimental research and help refine the implementation of these actions.
Cross-Sectional Study Designs
• Compares groups at one point in time, e.g., landscape patterns.
• Advantage is that it is an efficient way to identify possible group differences because you can study them at one point in time.
• Disadvantage is that you cannot rule out cohort effects.
Non-Experimental Research Design
Longitudinal method: measurement of the same subjects over time.
Cross-sectional method: measurement of several groups at a single point in time.
Sequential methods: methods that combine the cross-sectional and longitudinal methods.
Longitudinal Design
• Gathers data on a factor (e.g., bird diversity) over time.
• Advantage is that you can see the time course of the development or change in the variables.
– Bird diversity decreasing with urbanization.
– Bird diversity decreasing at a faster rate within the UGB.
• Disadvantage is that it is costly and still subject to bias.
Cohort-Sequential Design
• Combines elements of the cross-sectional design and the longitudinal design.
– E.g., different bird species are compared on a variable over time.
• Advantage – very efficient and reduces some of the biases in the cross-sectional design, since you can see the evolution of change over time.
• Disadvantage – cannot rule out cohort bias or the problem of the ‘unidentified’ third variable accounting for the change.
Correlational Research
• Correlation refers to a meaningful relationship between two variables; values of both variables change together somehow.
• Positive correlation: high score on first variable associated with high score on second variable.
• Negative correlation: high score on first variable associated with low score on second variable.
• No correlation: score on first variable not associated with score on second variable.
Correlation vs. Regression
Correlation Coefficient: Correlation tells us about the strength (and shape) of the relationship between two variables. The square of the correlation tells us the proportion of the variables' variance that is shared between them.
Simple Regression: Regression tells us about the nature of the function relating the two variables. For linear regression, which is what we consider here, regression fits a straight line, called the regression line, to the data so that the line best describes their relationship.
Multiple Regression: Multiple regression tells us about the nature of the function that relates a set of predictor variables to a single response variable. OLS (ordinary least squares) multiple regression assumes the function is a linear function.
Covariance
When there is a relation between two variables they covary.
The Pearson correlation coefficient is a unit-free measure of the degree of covariance.
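To illustrate what "unit-free" means, here is a minimal sketch in plain Python. The data values and the variable names (`urban_pct`, `bird_species`) are made up for illustration and do not come from the lecture. Rescaling one variable changes the covariance but leaves Pearson's r unchanged.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Covariance rescaled by the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: percent urban cover and number of bird species per site.
urban_pct = [10, 25, 40, 55, 70, 85]
bird_species = [42, 38, 35, 27, 22, 15]

# The same cover expressed as a fraction (0-1) instead of a percentage.
urban_frac = [u / 100 for u in urban_pct]

cov_pct = covariance(urban_pct, bird_species)
cov_frac = covariance(urban_frac, bird_species)
r_pct = pearson_r(urban_pct, bird_species)
r_frac = pearson_r(urban_frac, bird_species)

print("covariance:", cov_pct, cov_frac)  # depends on the units of measurement
print("Pearson r: ", r_pct, r_frac)      # identical up to rounding error
```

Because r divides the covariance by both standard deviations, any change of units cancels out, which is why r can be compared across studies while raw covariances cannot.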
Covariance
Now consider a third variable:
– A and B do not covary, but C covaries with both A and B.
– A, B and C all covary.
– None covary. They are orthogonal.
The r2 is the proportion of variation shared between the variables.
Measuring Correlations
• Scatterplots are used to provide a descriptive analysis of correlation
– evaluate degree of relationship
– assess linearity of relationship
• Pearson’s r measures correlations between two interval/ratio level variables
– magnitude measured from 0.0 to 1.0
– direction indicated by + or -
– statistical significance of correlation provided by p value
• Spearman’s rho measures correlations between two ordinal level variables
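As a rough sketch of how the two coefficients differ, Spearman's rho can be computed as Pearson's r applied to the ranks of the data. The helper names (`ranks`, `spearman_rho`) and the data are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Pearson's r computed on the ranks rather than the raw values."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: a monotonic but curved (non-linear) relationship.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print("Pearson r:", round(r, 3), "Spearman rho:", round(rho, 3))
```

Here rho reaches its maximum because the ordering of the two variables agrees perfectly, while r falls short of it because the relationship is not a straight line. This is one reason to inspect a scatterplot for linearity before relying on r.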
Interpreting Correlations
• Correlation is not causation
• Directionality problem
• Third-variable problem
• Partial correlation
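Partial correlation is one way to probe the third-variable problem: it estimates the correlation between X and Y after removing what each shares with a third variable Z. The sketch below uses the standard first-order formula; the three input correlations are hypothetical numbers chosen for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: X and Y appear related (r = 0.60),
# but both are strongly related to a third variable Z.
r_xy, r_xz, r_yz = 0.60, 0.80, 0.70

r_xy_given_z = partial_r(r_xy, r_xz, r_yz)
print("r(X, Y) ignoring Z:      ", r_xy)
print("r(X, Y) controlling for Z:", round(r_xy_given_z, 3))
```

In this made-up case most of the apparent X-Y relationship disappears once Z is held constant, which is the pattern expected when a third variable accounts for the association. A partial correlation still says nothing about direction.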
Regression Analysis
• Regression allows prediction of a new observation based on what is known about the correlation.
• Regression involves drawing a line that best describes a correlation: Y = a + bX + e
• X is the predictor variable; Y is the criterion variable.
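A minimal sketch of fitting Y = a + bX by ordinary least squares, where a is the intercept, b the slope and e the residual. The function name `fit_line` and the data are assumptions for illustration only.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope: change in Y per unit change in X
    a = my - b * mx    # intercept: the line passes through (mean X, mean Y)
    return a, b

# Hypothetical data: percent urban cover (X) and bird species count (Y).
x = [10, 25, 40, 55, 70, 85]
y = [42, 38, 35, 27, 22, 15]

a, b = fit_line(x, y)

# Predict the criterion for a new value of the predictor.
predicted = a + b * 50

# Residuals e: observed minus predicted values.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print("intercept:", round(a, 3), "slope:", round(b, 3))
print("predicted Y at X = 50:", round(predicted, 2))
```

The prediction is only as trustworthy as the linear description of the data, and X = 50 lies inside the observed range of X here; predicting far outside that range would be extrapolation.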
The Multiple Regression Model
A multiple regression equation expresses a linear relationship between a dependent variable Y and two or more independent variables (X1, X2, …, Xk)
Y = α + β1X1 + β2X2 + … + βkXk + ε
Each β is called a partial regression coefficient. For example, β1 denotes the coefficient of Y on variable X1 that one would expect if all the other X variables in the equation were held constant.
Meaning of parameter estimates
– Slope
• Change in Y per unit change in X.
• Marginal contribution of X to Y holding all other variables in the regression constant.
– Intercept
• Meaningful only if X=0 is in the sample range.
• Otherwise, merely extrapolation of linear approximation.
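The following sketch estimates a two-predictor model by solving the normal equations in plain Python. It is only one possible approach, not necessarily what a statistics package does internally. The helper names (`solve`, `ols`, `predict`) and the data are hypothetical, and Y is constructed without noise so the known coefficients can be checked.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares estimates [a, b1, ..., bk] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]  # prepend a column of 1s for the intercept
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)

# Hypothetical predictors (X1, X2); Y is built exactly as 2 + 3*X1 - 1*X2.
X = [(1, 4), (2, 1), (3, 5), (4, 2), (5, 6), (6, 3)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in X]

a, b1, b2 = ols(X, y)

def predict(x1, x2):
    return a + b1 * x1 + b2 * x2

# Raising X1 by one unit while holding X2 fixed changes the prediction by b1.
effect_of_x1 = predict(4, 2) - predict(3, 2)

print("a, b1, b2:", round(a, 6), round(b1, 6), round(b2, 6))
print("effect of one unit of X1 with X2 held constant:", round(effect_of_x1, 6))
```

The last step mirrors the interpretation of a slope as a marginal contribution: the other predictors are held constant and only one X is changed.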
Coefficient of determination - R2
• Expresses the amount of variance in the criterion explained by a predictor or set of predictors.
• R2 increment - indicates the increase in the total variance in the criterion accounted for by each new predictor added to the regression model.
• Two tests of significance are typically computed: (i) is R different from 0; (ii) is the R2 increment statistically significant?
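A small sketch of both quantities, assuming an OLS model fitted with an intercept. `r_squared` computes R2 from observed and fitted values, and `f_for_increment` gives the usual F statistic for testing an R2 increment between two nested models. All numbers below are hypothetical.

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_resid / ss_total

def f_for_increment(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R2 increment when predictors are added.

    n is the number of observations; k_reduced and k_full are the numbers
    of predictors in the smaller and larger (nested) models.
    Degrees of freedom: (k_full - k_reduced, n - k_full - 1).
    """
    numerator = (r2_full - r2_reduced) / (k_full - k_reduced)
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# Hypothetical observed values and fitted values from some regression.
y = [3, 5, 7, 9, 11]
y_hat = [3.5, 4.5, 7.0, 9.5, 10.5]
r2 = r_squared(y, y_hat)

# Hypothetical comparison: adding a second predictor raises R2 from 0.40 to 0.55
# in a sample of 50 observations.
f_stat = f_for_increment(0.40, 0.55, n=50, k_reduced=1, k_full=2)

print("R2:", round(r2, 3))
print("F for the increment:", round(f_stat, 2))
```

The F statistic would then be compared with an F distribution on the stated degrees of freedom to decide whether the increment is statistically significant.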
Regression Equation for a Linear Relationship
A linear relationship of n predictor variables, denoted X1, X2, ..., Xn, to a single response variable, denoted Y, is described by a linear equation involving several variables.
The general linear equation is:
Y = a + b1X1 + b2X2 + ... + bnXn
This equation shows that any linear relationship can be described by its:
Intercept: the value of Y when the linear combination of the X's is zero.
Slopes: each slope specifies how much the variable Y will change when the particular X changes by one unit.
Regression Assumptions
1. The independent variable should be accurately measured, with negligible error.
2. The values of the dependent variable are normally distributed.
3. Variation in the dependent variable (i.e., the spread around the line) is constant over values of the independent variable. This is known as homoscedasticity.
4. The values of the residuals (the difference between the observed and the predicted values) have a normal distribution; that is, there are relatively few extremely small or extremely large residuals.
5. The values of the residuals are independent of each other, i.e., they are randomly distributed along the regression line (there is no systematic pattern).
Multiregression problems
Outliers. As with SLR, a single outlying point can greatly distort the results of MLR, but it is more difficult to detect outliers visually.
Too few subjects. A general rule of thumb is that you need at least 10-20 data points for each X variable; otherwise it is too easy to be misled by spurious results.
Inappropriate model. Although complicated, MLR is too simple for some data. MLR assumes that each X variable contributes independently towards the value of Y, but often X variables contribute to Y by an interaction with each other.
Unfocussed studies. If you give the computer enough variables, some significant relationships are bound to turn up by chance, and these may mean nothing.
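To put a rough number on the "bound to turn up by chance" point, the sketch below computes the probability of at least one spuriously significant result among several tests. It assumes the tests are independent and that no real relationship exists, which is a simplification; the function name is illustrative.

```python
def prob_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one 'significant' result among independent tests
    when none of the tested relationships is real."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 20, 50):
    print(m, "tests:", round(prob_any_false_positive(m), 3))
```

Under these assumptions, screening 20 unrelated variables at the 0.05 level gives roughly a two-in-three chance of at least one "significant" finding.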
Criteria for Developing a MLR Model
The overriding criterion is that any potential set of predictors must be scientifically defensible.
It is neither good science nor proper use of statistics to put predictors in a model just because the data were available or to see “what happens”.
Other criteria:
- A statistically significant overall model
- A large R2. The model explains a large amount of variation in Y.
- A small standard error (SQRT (MSE)) of the model. Is the regression precise enough so that findings have practical utility?
- Significant partial t tests. Does each X variable explain significant additional variation in Y given the other predictors in the model?
- Choose the smallest number of predictors to adequately characterize the variation in Y.
Model Selection and Model Adequacy
The model we can think of as having given rise to the observations is usually too complex to be described in every detail from the information available.
We have to rely on simpler models: approximations.
Question: What’s sufficient?
The approximation should be sufficient for our purposes!
Note: “More realistic” models might be closer to “the true model”. However, we are NOT aiming at finding the true model! We are trying to find THE BEST APPROXIMATING MODEL.
How to select best model
Trade-off between bias and variance when considering model complexity (number of parameters)
[Figure: variance and bias plotted against number of parameters, with the “Best Model” marked.]
Model Selection: The Likelihood Ratio Test
Basic idea: Add parameters only if they provide a significant improvement in the fit of the model to the data
delta = ln(L1 / L0) = ln L1 – ln L0
where L0 and L1 are the likelihoods under the two models being compared (Model 1 and Model 2).
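A minimal sketch of the test for the simplest case, assuming two nested models that differ by a single parameter and a sample large enough for the chi-square approximation. The conventional test statistic is twice the difference in log-likelihoods. The log-likelihood values are hypothetical.

```python
import math

def lrt_statistic(lnL_simple, lnL_complex):
    """Twice the gain in log-likelihood from the added parameter(s)."""
    return 2 * (lnL_complex - lnL_simple)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical maximized log-likelihoods; the complex model has one extra parameter.
lnL0 = -120.4   # simpler model
lnL1 = -117.1   # model with one additional parameter

stat = lrt_statistic(lnL0, lnL1)
p_value = chi2_sf_1df(stat)

print("LRT statistic:", round(stat, 2))
print("p value (1 df):", round(p_value, 4))
# Keep the extra parameter only if the improvement is significant.
print("add the parameter?", p_value < 0.05)
```

With more than one added parameter, the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters, which needs a more general tail function than the one-degree-of-freedom shortcut used here.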
Akaike Information Criterion (AIC)
An approximation of the Kullback–Leibler discrepancy
AIC = –2 ln L + 2N
L = likelihood
N = number of parameters
Choose the model with the smallest AIC
AIC penalizes the model for additional parameters
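A short sketch of ranking candidate models by AIC using the formula above. The model labels, log-likelihoods and parameter counts are invented for illustration.

```python
def aic(log_likelihood, num_params):
    """AIC = -2 ln L + 2N."""
    return -2 * log_likelihood + 2 * num_params

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "one predictor": (-130.2, 3),
    "two predictors": (-124.9, 4),
    "five predictors": (-123.8, 7),
}

scores = {name: aic(lnL, k) for name, (lnL, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("smallest AIC:", best)
```

In this made-up example the five-predictor model has the highest likelihood, but its extra parameters cost more than the improvement in fit, so it does not have the smallest AIC.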
Other alternatives for ranking models
Bayesian Information Criterion (BIC)
An approximation of the log of the Bayes Factor
BIC = –2 ln L + N ln n
L = likelihood
N = number of parameters
n = number of observations (sample size)
Choose the model with the smallest BIC
For larger data, BIC should tend to choose simpler models than AIC (since the natural log of n is usually > 2)
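The sketch below compares the two penalties directly. The per-parameter penalty is 2 for AIC and ln n for BIC, so BIC is the stricter criterion once n exceeds e^2 (about 7.4). The two candidate models and the sample size are hypothetical.

```python
import math

def aic(log_likelihood, num_params):
    """AIC = -2 ln L + 2N."""
    return -2 * log_likelihood + 2 * num_params

def bic(log_likelihood, num_params, n):
    """BIC = -2 ln L + N ln n, where n is the sample size."""
    return -2 * log_likelihood + num_params * math.log(n)

# Per-parameter penalty: constant for AIC, growing with sample size for BIC.
for n in (5, 8, 100, 1000):
    print(f"n = {n}: AIC penalty = 2, BIC penalty = {math.log(n):.2f}")

# Hypothetical models: (maximized log-likelihood, number of parameters).
n = 200
simple_model = (-124.9, 4)
complex_model = (-120.5, 7)

aic_prefers_complex = aic(*complex_model) < aic(*simple_model)
bic_prefers_complex = bic(*complex_model, n) < bic(*simple_model, n)

print("AIC prefers the complex model:", aic_prefers_complex)
print("BIC prefers the complex model:", bic_prefers_complex)
```

In this invented case the two criteria disagree: the gain in likelihood is enough to offset AIC's penalty for three extra parameters but not BIC's larger one.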
Other alternatives for ranking models