Measurement
Normality
Outliers
Homoscedasticity
Linearity
Independence
Assumptions in Multiple Regression Analysis
Violations
Is there a violation?
How severe?
Can it be avoided?
Likely effects?
Can it be minimized?
What can I do?
Key Assumptions
• Issues of Measurement
• Normal Distribution
• Minimization of Outliers
• Homoscedasticity
• Relationships are Linear
• Independence
Sources of assumption violations
• Design issues
• Statistical issues
• Both
Strength and Weakness
Issues of Measurement
www.nearingzero.net
Issues of Measurement
• Unreliability of measures
• Scale violations
• Multicollinearity
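Unreliability
Unreliable measures affect the interpretation of regression results:
• Relationships are underestimated (measurement error attenuates observed correlations)
• Type II error implications: real effects are more likely to be missed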
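The underestimation point can be made concrete with Spearman's classic correction for attenuation (a standard psychometric formula, not something shown on the slide): the expected observed correlation is the true correlation multiplied by the square root of the product of the two reliabilities. A minimal sketch; the function names are hypothetical.

```python
import math


def attenuated_r(r_true, rel_x, rel_y):
    """Expected observed correlation when X and Y are measured with
    reliabilities rel_x and rel_y (Spearman's attenuation formula)."""
    return r_true * math.sqrt(rel_x * rel_y)


def disattenuated_r(r_observed, rel_x, rel_y):
    """Estimate of the true-score correlation, correcting for unreliability."""
    return r_observed / math.sqrt(rel_x * rel_y)


# A true correlation of .50, measured with two scales of reliability .80 each
observed = attenuated_r(0.50, 0.80, 0.80)
print(round(observed, 2))  # 0.4
print(round(disattenuated_r(observed, 0.80, 0.80), 2))  # 0.5
```

The lower the reliabilities, the smaller the observed correlation, which is why unreliable measures make a real effect harder to detect.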
An illustration using multiple regression
Reading fluency
Decoding
Time spent reading
Fluency training
Issues of Measurement
• Unreliability
• Scale violations
• Multicollinearity
How are variables measured?
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Measures of Central Tendency
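The content of this slide did not survive extraction. As a general sketch, assuming the conventional pairing rather than anything recovered from the slide: the mode suits nominal data, the median suits ordinal data, and the mean suits interval and ratio data. The mapping and function name below are hypothetical.

```python
import statistics

# Conventional pairing of measurement level and central-tendency measure
# (an assumption for illustration, not recovered from the slide).
CENTRAL_TENDENCY = {
    "nominal": statistics.mode,
    "ordinal": statistics.median,
    "interval": statistics.mean,
    "ratio": statistics.mean,
}


def central_tendency(values, level):
    """Return the central-tendency measure appropriate to the scale level."""
    return CENTRAL_TENDENCY[level](values)


print(central_tendency(["Black", "White", "White", "Other"], "nominal"))  # White
print(central_tendency([1, 2, 2, 3, 5], "ordinal"))  # 2
print(central_tendency([10, 20, 30], "ratio"))  # 20
```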
DUMMY CODING
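Dichotomous, nominal, or ordinal variables can be used as predictors when they are “dummy” coded.
1. Black
2. White
3. Hispanic
4. Other
Dummy 1 = 1 if the case is Black, otherwise 0.
Dummy 2 = 1 if the case is White, otherwise 0.
Dummy 3 = 1 if the case is Other, otherwise 0.
Hispanic cases score 0 on all three dummies, so Hispanic is the reference category (four categories need only three dummies).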
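A minimal sketch of this coding scheme, assuming Hispanic is the reference category as above; the function and variable names are hypothetical.

```python
# Four categories need only three dummy variables; the omitted category
# (here, Hispanic) is the reference group that the others are compared with.
DUMMIES = ["Black", "White", "Other"]


def dummy_code(group):
    """Return the three 0/1 dummy values for one case."""
    return {f"d_{name.lower()}": int(group == name) for name in DUMMIES}


for group in ["Black", "White", "Hispanic", "Other"]:
    print(group, dummy_code(group))
```

Each dummy's regression coefficient then compares that group with the Hispanic group. Adding a fourth dummy alongside the intercept would make the predictors perfectly collinear.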
Issues of Measurement
• Unreliability
• Scale violations
• Multicollinearity
Multicollinearity
• You want to measure two DIFFERENT constructs
• but your two measures are very highly correlated
• You probably aren’t measuring two different constructs
• Eliminate or combine? Keep?
Multicollinearity
To check mathematically for multicollinearity, SPSS can report tolerance and Variance Inflation Factor (VIF) statistics. A predictor's tolerance is 1 - R² from regressing that predictor on all the other predictors, and VIF = 1 / tolerance.
• The closer the tolerance value is to zero, the more multicollinearity exists in the model (rule of thumb: tolerance below 0.20 is a concern).
• VIF: high VIF values are a problem (rule of thumb: VIF above 4.0).
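The tolerance and VIF values can be reproduced by hand: regress each predictor on all the others, take tolerance = 1 - R², and VIF = 1 / tolerance. This NumPy sketch uses simulated data and a hypothetical function name; it is not SPSS's own code.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (tolerance, VIF) for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    results = []
    for j in range(k):
        y = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        tol = 1 - r2
        results.append((tol, 1 / tol))
    return results


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # unrelated predictor
X = np.column_stack([x1, x2, x3])
for name, (tol, vif) in zip(["x1", "x2", "x3"], tolerance_and_vif(X)):
    print(f"{name}: tolerance = {tol:.3f}, VIF = {vif:.1f}")
```

The near-duplicate pair x1 and x2 should show tolerance far below .20 and VIF far above 4, while x3 stays close to tolerance 1 and VIF 1.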
Homoscedasticity
Unequal variances at different levels of a variable
Equal Variances and Residuals
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
Violation: Heteroscedasticity
• At each level of the independent variable (low to high), the residuals of the outcome variable do not have equal variance.
[Figure: residual plot with axes labeled Earnings and Importance; DV: $ donated]
Homoscedasticity
• Does the same variance exist across all levels of your independent variable?
http://pareonline.net Practical Assessment, Research, and Evaluation online (used with permission)
Heteroscedasticity
[Figure: residuals plotted against the IV (low to high), showing two heteroscedastic patterns: fan and bow tie]
Source: http://pareonline.net Practical Assessment, Research, and Evaluation online (used with permission)
Normality
Graphic and Numeric Investigations
Normality allows
• Means and standard deviations to summarize the data adequately
• Better inference to a population of interest
Normality
• Skewness
• Kurtosis
• Outliers
Residuals
• Should also be normally distributed
• Should have a mean of 0
• Should have equal variances at all values of the predictors (IVs)
Abnormality Detectives
GRAPHS
CALCULATIONS
Histograms
Visual Inspection
• Histograms
Box & Whisker Plots
[Figure: box-and-whisker plots dividing the scores into four 25% segments, with outliers plotted beyond the whiskers]
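The quartile logic of a box plot fits in a few lines. This sketch assumes the common convention of flagging points more than 1.5 box lengths (interquartile ranges) beyond the box; SPSS box plots use a similar box-length rule, but quartile definitions differ across software, so exact cutoffs may not match. Data and names are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, whisker fences, and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # the length of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "fences": (low_fence, high_fence), "outliers": outliers}


scores = [12, 15, 17, 18, 19, 20, 21, 22, 23, 25, 48]
print(boxplot_summary(scores))
```

Here the box runs from 17 to 23, the fences sit at 8 and 32, and only the score of 48 is flagged.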
Box Plots
Visual Inspection
Box Plots
Visual Inspection
Box Plot Problems
P-Plots
Probability plot
MCI Total
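A normal P-P plot pairs each score's observed cumulative proportion with the cumulative probability a normal curve (with the sample's mean and SD) would predict; points close to the diagonal suggest normality. This sketch computes the pairs with the standard library, assuming the plotting position (i - 0.5) / n; software such as SPSS may use a different formula. Data and names are hypothetical.

```python
import statistics


def pp_points(values):
    """(observed, expected) cumulative probabilities for a normal P-P plot."""
    data = sorted(values)
    n = len(data)
    normal = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    observed = [(i - 0.5) / n for i in range(1, n + 1)]
    expected = [normal.cdf(x) for x in data]
    return list(zip(observed, expected))


scores = [118, 121, 124, 125, 125, 127, 129, 131, 134, 140]
points = pp_points(scores)
for o, e in points:
    print(f"{o:.2f}  {e:.2f}")
largest_gap = max(abs(o - e) for o, e in points)
print(f"largest gap from the diagonal: {largest_gap:.3f}")
```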
Calculation Methods to Assess Normality
A normal distribution has
• Skew = 0
• Kurtosis = 0 (excess kurtosis, as SPSS reports it)
Skewness & Kurtosis
Skewness rule of thumb (absolute values):
• 0 = ideal
• 1 = okay
• 1 to 2 = uh oh
• >2 = oh no
Skewness and kurtosis values smaller than twice their standard error (in absolute value) are considered good enough.
Descriptive Statistics (N = 175; valid N listwise = 175)
MCIKNOWL: Mean = 32.9429, SD = 4.89042, Skewness = -.341 (SE = .184), Kurtosis = .132 (SE = .365)
MCITOTAL: Mean = 125.0457, SD = 14.31975, Skewness = -.163 (SE = .184), Kurtosis = -.758 (SE = .365)
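The standard errors in this table depend only on N. This sketch uses the standard formulas for the standard error of skewness and kurtosis (for N = 175 they round to .184 and .365), the bias-adjusted skewness and kurtosis estimators that packages such as SPSS commonly report, and the twice-the-standard-error rule applied to the table's values. Function names are hypothetical.

```python
import math


def skewness(x):
    """Bias-adjusted sample skewness (G1)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def kurtosis(x):
    """Bias-adjusted sample excess kurtosis (G2); 0 for a normal curve."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


def se_skewness(n):
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def within_two_se(statistic, se):
    """Rule of thumb: |statistic| smaller than twice its standard error."""
    return abs(statistic) < 2 * se


n = 175
print(f"SE skewness = {se_skewness(n):.3f}, SE kurtosis = {se_kurtosis(n):.3f}")
table = {"MCIKNOWL": (-0.341, 0.132), "MCITOTAL": (-0.163, -0.758)}
for name, (skew, kurt) in table.items():
    print(name,
          "skewness ok" if within_two_se(skew, se_skewness(n)) else "skewness flagged",
          "kurtosis ok" if within_two_se(kurt, se_kurtosis(n)) else "kurtosis flagged")
```

By this rule, MCITOTAL's kurtosis (-.758) is flagged, since 2 × .365 = .730, while the other three values pass.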
Outliers
Handy z-score formula
z = (score - mean) / standard deviation
You wonder if a score of 150 is an outlier. Your data set has a mean of 100, s.d. of 15. Figure it out.
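Working the example: z = (150 - 100) / 15 ≈ 3.33, so the score sits more than three standard deviations above the mean. The |z| > 3 cutoff below is one common convention and an assumption here; the slide does not give a cutoff.

```python
def z_score(score, mean, sd):
    """Standardize a raw score: (score - mean) / standard deviation."""
    return (score - mean) / sd


z = z_score(150, 100, 15)
print(f"z = {z:.2f}")  # z = 3.33
# Hypothetical cutoff: flag scores more than 3 SDs from the mean.
print("possible outlier" if abs(z) > 3 else "within 3 SDs")
```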
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
What to do with Outliers?
Those might be perfectly good observations. Keep them and live with your results.
Delete them! The outliers are throwing everything off. Just get rid of them.
“Statistics do not exclude data, analysts do.”
Good & Hardin, Common Errors in Statistics (and How to Avoid Them), 2003, p. 139
Retain, Discard, Do Nothing, Look at Both?
“Abelson’s Third Law: Never flout a convention just once.”
Abelson, 1995, Statistics as Principled Argument, p. 70
Influence Statistics
How much does each individual data point influence the parameter estimates? You can find out.
1. Calculate your parameter estimate (e.g., the mean) using all of the data points.
2. Recalculate the estimate with one data point excluded.
3. Compare the two estimates and note the difference; repeat for each data point.
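The three steps translate directly into a leave-one-out loop. This sketch uses the mean as the parameter estimate, as the slide does; regression software applies the same idea to coefficients (e.g., DFBETA), but the function here is hypothetical and only handles the mean.

```python
import statistics


def leave_one_out_influence(values):
    """Change in the mean when each data point is dropped in turn."""
    full = statistics.fmean(values)                        # step 1
    changes = []
    for i in range(len(values)):
        reduced = values[:i] + values[i + 1:]              # step 2
        changes.append(full - statistics.fmean(reduced))   # step 3
    return changes


data = [12, 14, 15, 15, 16, 18, 45]
for value, change in zip(data, leave_one_out_influence(data)):
    print(f"{value:>3}: mean shifts by {change:+.2f}")
```

Dropping the score of 45 moves the mean by far more than dropping any other point, which marks it as the most influential observation.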
Transformations
• Log transformations (take the logarithm of every score and save it as a new variable)
• Taking the square root of scores is also a common transformation
• Both are accomplished under “compute” in SPSS
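Outside SPSS, the same step is one line per new variable. This sketch assumes all scores are positive (the logarithm is undefined at zero and below, the square root below zero; a constant is often added first) and uses a simple moment-based skewness to show how each transformation pulls in a long right tail. Data and names are hypothetical.

```python
import math


def moment_skew(x):
    """Simple moment-based skewness, m3 / m2**1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 4, 8, 16, 32, 64, 128]         # strongly right-skewed scores
log_scores = [math.log10(v) for v in raw]    # new variable: log10(score)
sqrt_scores = [math.sqrt(v) for v in raw]    # new variable: sqrt(score)

for label, scores in [("raw", raw), ("log10", log_scores), ("sqrt", sqrt_scores)]:
    print(f"{label:>5}: skewness = {moment_skew(scores):.2f}")
```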
Transformations
• Good for distributions (can reduce skew and pull in extreme scores)
• Bad for interpretations (results are expressed in transformed units)
Assumptions for data used in multivariate analyses
Assumption 1
• At each predicted value of the dependent variable, the distribution of residuals is normal.
Assumption 2
• The variances of the residuals are equal at every set of values for the independent variables. This is known as homoscedasticity; its violation is known as heteroscedasticity.
Assumptions for data used in multivariate analyses
Assumption 3
• The mean of the residuals equals zero at each predicted value of the dependent variable. This is an extension of the bivariate assumption that the relationship between IV and DV is linear.
Assumption 4
• For any two cases, the expected correlation between their residuals is zero. This is the independence (no autocorrelation) assumption.
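When cases have a natural order (such as time), the independence assumption is often checked with the Durbin-Watson statistic, which SPSS can report with a linear regression; it is swapped in here rather than taken from the slides. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residuals below are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    diffs = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return diffs / sum(e ** 2 for e in residuals)


# Residuals that cluster by sign (positive autocorrelation): well below 2
print(durbin_watson([1.0, 1.1, 0.9, 1.2, -1.0, -1.1, -0.9, -1.2]))
# Residuals that alternate in sign (negative autocorrelation): well above 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
```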
Assumption of Linearity
http://pareonline.net
We expect to be able to predict one variable from the value of another, and the relationship between them is best represented by a straight line.
What does non-linear look like?
Curvilinear relationship
Curvilinear relationship
Curvilinear relationship: asymptotic
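One simple numeric check for a curvilinear relationship (a swapped-in technique, not from the slides) is to compare the R² of a straight-line fit with that of a fit that adds a squared term; a large gain suggests the linear model is missing curvature. A sketch with NumPy and made-up data; the function name is hypothetical.

```python
import numpy as np


def r_squared(x, y, degree):
    """R-squared of a polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


x = np.linspace(-5, 5, 41)
y = x ** 2  # a U-shaped (curvilinear) relationship
print(f"linear R^2    = {r_squared(x, y, 1):.3f}")
print(f"quadratic R^2 = {r_squared(x, y, 2):.3f}")
```

Here the straight line explains essentially none of the variance, even though y depends completely on x, while the quadratic fit explains all of it.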
An important relationship
[Figure: good research judgment plotted against experience with research issues]
Dixon and Reid 2000
Sample
Instruments
Hypotheses and Results
[Figure: diagrams linking NLE, PLE, and Depression; plot of depression (normal to high) for high vs. low PLE]
PLE moderates the effect of NLE on symptoms of depression.
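Moderation of this kind is usually tested by adding a product term to the regression: depression = b0 + b1·NLE + b2·PLE + b3·(NLE × PLE), where a nonzero b3 means the effect of NLE depends on PLE. This is a generic NumPy sketch with simulated data, not Dixon and Reid's analysis; the variable names, coefficient values, and the buffering direction (b3 < 0) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
nle = rng.normal(size=n)  # hypothetical centered NLE scores
ple = rng.normal(size=n)  # hypothetical centered PLE scores
# Simulated outcome: NLE raises depression less when PLE is high (b3 < 0).
depression = 10 + 2.0 * nle - 1.0 * ple - 1.5 * nle * ple + rng.normal(scale=0.5, size=n)

# Design matrix: intercept, both main effects, and their product (interaction).
X = np.column_stack([np.ones(n), nle, ple, nle * ple])
b, *_ = np.linalg.lstsq(X, depression, rcond=None)
for name, coef in zip(["b0", "b1 (NLE)", "b2 (PLE)", "b3 (NLE x PLE)"], b):
    print(f"{name:>15}: {coef:6.2f}")

# Simple slopes: effect of NLE at low (-1 SD) and high (+1 SD) PLE.
for level in (-1, 1):
    print(f"PLE = {level:+d} SD: slope of NLE = {b[1] + b[3] * level:.2f}")
```

Plotting the two simple slopes gives the familiar pair of diverging lines for high and low PLE.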
Dixon & Reid 2000
[Figure: depression (normal to high) for high vs. low positive life events]
Which Comes First?
[Figure: two possible causal orderings, NLE → Depression and Depression → NLE]