Post on 01-Apr-2015
McGraw-Hill/Irwin © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 11
Measuring Item Interactions
11-2
Identifying Variable Types and Forms
• Direction of Causality
  • The independent variable influences or affects the other
  • The dependent variable is the one being influenced or affected
• Form of the Variables
  • All nominal variables are categorical
  • Ordinal, interval, and ratio variables are continuous in form
  • Continuous variables may be recoded or treated as categorical
  • If so, they must constitute a limited number of categories
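As a rough illustration of recoding, a continuous variable such as age can be cut into a small number of categories. The variable, cut points, and labels below are hypothetical, chosen only for this sketch.

```python
from bisect import bisect_right

# Hypothetical cut points and labels; any limited set of categories would do.
CUT_POINTS = [30, 50]  # boundaries between adjacent categories
LABELS = ["Under 30", "30 to 49", "50 and over"]

def recode(value, cut_points=CUT_POINTS, labels=LABELS):
    """Return the category label for a continuous value."""
    # bisect_right counts how many cut points are <= value,
    # which is the index of the category the value falls in.
    return labels[bisect_right(cut_points, value)]

ages = [18, 29, 30, 49, 50, 73]
categories = [recode(age) for age in ages]
print(categories)
```

The recoded variable has only three levels, so it can define the rows or columns of a cross-tabulation.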
11-3
Measures of Association
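Measure of association by form of the independent and dependent variable (statistic in parentheses):
• Independent categorical, dependent categorical: Cross-Tabulation (Chi-Square)
• Independent categorical, dependent continuous: Analysis of Variance (F-Ratio) or Paired T-Test (Value of t)
• Independent continuous, dependent categorical: Discriminant Analysis (F-Ratio)
• Independent continuous, dependent continuous: Regression Analysis (F-Ratio) or Correlation (Probability of r)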
11-4
When To Use Cross-Tabulation
• Both variables are categorical (in the form of categories), rather than continuous
• The object is to see if the frequency or percentage distribution breakdown for one variable differs for each level of the other
• One variable is used to define the rows of the matrix and the other to define the columns
• If the distribution of each row or each column is proportional to the row or column totals, the two variables are not significantly related
11-5
Expected Cell Frequencies
• The lowest expected cell frequency for the table must be 5 or more
• Look down the row totals and circle the lowest row total
• Look across the column totals and circle the lowest column total
• Divide the lowest row total by the grand total for the entire table
• Multiply this value by the lowest column total to get the lowest expected cell frequency
• If it is less than five, combine the row or the column with another and recalculate the lowest cell frequency
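The check described above can be sketched in a few lines. The table of counts is hypothetical, and the choice of which rows to combine is made by hand here purely for illustration.

```python
def lowest_expected_frequency(table):
    """Lowest expected cell frequency for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Lowest row total divided by the grand total, times the lowest column total
    return min(row_totals) / grand_total * min(col_totals)

# Hypothetical 3 x 2 table of counts (rows x columns)
counts = [[20, 30],
          [25, 15],
          [4, 6]]
lowest = lowest_expected_frequency(counts)
print(lowest)  # about 4.9, which is below 5

# Combine the last two rows and recalculate
combined = [counts[0], [a + b for a, b in zip(counts[1], counts[2])]]
lowest_after = lowest_expected_frequency(combined)
print(lowest_after)  # 24.5
```

After combining, the lowest expected frequency is well above five, so the table meets the requirement.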
11-6
The Cross-Tabulation Table
• Table is symmetrical: Either variable can be listed on the rows or columns
• There need not be a dependent and an independent variable
• If there is a dependent variable, it's often best to have it define the rows
• If the dependent variable defines the rows, column percentages work best
• Each percentage can then be compared to the total row percentages
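A minimal sketch of that comparison, assuming the dependent variable defines the rows: each cell is expressed as a percentage of its column total and set beside the percentage of all cases that fall in that row. The counts are those of the 30/20/20/30 table shown later in this chapter; the function names are illustrative only.

```python
def column_percentages(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]

def total_row_percentages(table):
    """Each row total as a percentage of the grand total."""
    grand_total = sum(sum(row) for row in table)
    return [100.0 * sum(row) / grand_total for row in table]

counts = [[30, 20],
          [20, 30]]
col_pct = column_percentages(counts)
row_pct = total_row_percentages(counts)
for row, total in zip(col_pct, row_pct):
    print(row, "total row percentage:", total)
```

Here the first row holds 60 percent of column one but only 40 percent of column two, against 50 percent of all cases, which is the kind of disproportion the cross-tabulation is meant to expose.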
11-7
Perfectly Proportional Cross-Tab Table and Graph
             One   Two   Row Total
One           25    25      50
Two           25    25      50
Col. Total    50    50     100
Chi Sq. = 0   Sig. = 1.0000
[Graph: bars for Col. 1 and Col. 2 within Row One and Row Two, scale 0 to 50]
11-8
Slightly Disproportional Cross-Tab Table and Graph
             One   Two   Row Total
One           30    20      50
Two           20    30      50
Col. Total    50    50     100
Chi Sq. = 4   Sig. = 0.0455
[Graph: bars for Col. 1 and Col. 2 within Row One and Row Two, scale 0 to 50]
11-9
Highly Disproportional Cross-Tab Table and Graph
             One   Two   Row Total
One           40    10      50
Two           10    40      50
Col. Total    50    50     100
Chi Sq. = 36   Sig. = 0.0000
[Graph: bars for Col. 1 and Col. 2 within Row One and Row Two, scale 0 to 50]
11-10
Perfectly Disproportional Cross-Tab Table and Graph
             One   Two   Row Total
One           50     0      50
Two            0    50      50
Col. Total    50    50     100
Chi Sq. = 100   Sig. = 0.0000
[Graph: bars for Col. 1 and Col. 2 within Row One and Row Two, scale 0 to 50]
11-11
Significance of Chi Square
• The statistical significance of the relationship depends on the probability of disproportions by row or by column if the distributions in the population were actually proportional
• The actual probability is based on the value of Chi-square and the degrees of freedom
• The number of degrees of freedom equals the number of rows minus one times the number of columns minus one: (R-1) × (C-1)
• The probability can be read from a table, but it is usually generated by the analysis program
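As a hedged sketch of what an analysis program does, the code below computes Chi-square and the degrees of freedom for a table of counts and reproduces the values on the four 2 x 2 example slides. The significance function is an assumption of this sketch: it uses a closed form that holds only when there is one degree of freedom; larger tables need a Chi-square table or a statistics library.

```python
import math

def chi_square(table):
    """Return (Chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were perfectly proportional
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df

def significance_df1(chi_sq):
    """Probability of a Chi-square this large or larger; valid for df = 1 only."""
    return math.erfc(math.sqrt(chi_sq / 2.0))

examples = {
    "perfectly proportional": [[25, 25], [25, 25]],
    "slightly disproportional": [[30, 20], [20, 30]],
    "highly disproportional": [[40, 10], [10, 40]],
    "perfectly disproportional": [[50, 0], [0, 50]],
}
results = {}
for name, table in examples.items():
    chi_sq, df = chi_square(table)
    results[name] = (chi_sq, df, round(significance_df1(chi_sq), 4))
    print(name, results[name])
```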
11-12
Ways to Describe the Statistical Significance of Cross-Tabs
• What is the probability this much difference in the proportions from row to row or column to column would result only from sampling error if the proportions were equal in the population?
• If the proportions from row to row or column to column were the same in the population, what are the odds that a sample of this size would show this much difference in the proportions for the sample?
• What is the probability that proportions from row to row or column to column would be this different by chance, purely because of sampling error, if the proportions in the population were actually the same?
11-13
Analysis of Variance (ANOVA)
• Objective
  • To determine if the means of two or more groups are significantly different from one another.
• Independent Variable
  • Nominal level data in the form of two or more categories.
• Dependent Variable
  • Interval or ratio level data in continuous form.
• Requirements
  • Dependent variable must be near-normally distributed and the variance within each category must be approximately equal.
11-14
Variance Not Homogeneous
• Dispersion in the red category is greater than in the green
ANOVA
11-15
Skewed Distributions
• The distributions are asymmetrical (skewed to one side)
ANOVA
11-16
ANOVA or Paired T-Test?
• ANOVA requires that the data points be independent (from different cases)
• ANOVA can measure the significance of differences among more than two means or categories
• Paired T-Tests require that the data points be paired (that they come from the same case)
• Paired T-Tests can measure the significance of difference between only two means or variables
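To make the pairing requirement concrete, here is a minimal sketch of the value of t for paired data: the test works on the difference between the two measurements taken from each case. The data are hypothetical, and the significance of t would still have to be read from a t table or an analysis program.

```python
import math

def paired_t(first, second):
    """Value of t and degrees of freedom for two measurements on the same cases."""
    if len(first) != len(second):
        raise ValueError("paired data need one value of each variable per case")
    diffs = [b - a for a, b in zip(first, second)]  # one difference per case
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    std_error = math.sqrt(var_diff / n)  # standard error of the mean difference
    return mean_diff / std_error, n - 1

# Hypothetical ratings from the same four respondents on two occasions
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

Because each difference comes from a single case, the two lists must be the same length and in the same case order; data from different cases belong in ANOVA instead.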
11-17
ANOVA - Difference Not Significant
• Means a and b are very close
• Overlapping area is very large
[Figure: distribution curves with points labeled a, b, and c]
11-18
ANOVA - Difference Probably Significant
• Means a and b are far apart
• Overlapping area is rather small
[Figure: distribution curves with points labeled a, b, and c]
11-19
The ANOVA Table
Source           S.S.   d.f.   M.S.   F      P
Between groups   100    1      100    5.00   0.00
Within groups    180    9      20
Combined         280    10
• SOURCE - The source of the variance value
• S.S. - Sums of squared deviations from a mean
• d.f. - Degrees of freedom related to variance
• M.S. - Mean Squares, or S.S. divided by d.f.
• F - The ratio of M.S. Between over M.S. Within
• P - The probability of this value of the F-ratio
11-20
ANOVA Terms — Sums of Squares
• S.S.—The sum of squared deviations of each data point from some mean value
• Within groups—The total squared deviation of each point from the group mean
• Combined—The total squared deviation of each data point from the grand mean
• Between groups—The difference between S.S. combined and S.S. within groups
Source           S.S.   d.f.   M.S.   F      P
Between groups   100    1      100    5.00   0.00
Within groups    180    9      20
Combined         280    10
11-21
ANOVA Terms — Degrees of Freedom
• d.f.—The number of cases minus some "loss" because of earlier calculations.
• Within groups d.f.—The total number of cases minus the number of groups.
• Combined d.f.—Equal to the total number of cases minus one.
• Between groups d.f.—Equal to the total number of groups minus one.
Source           S.S.   d.f.   M.S.   F      P
Between groups   100    1      100    5.00   0.00
Within groups    180    9      20
Combined         280    10
11-22
ANOVA Terms — Mean Squares & F-Ratio
• M.S.—the sums of squares (S.S.) divided by the degrees of freedom (d.f.).
• F—the ratio of mean squares between groups to the mean squares within groups.
Source           S.S.   d.f.   M.S.   F      P
Between groups   100    1      100    5.00   0.00
Within groups    180    9      20
Combined         280    10
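The definitions of S.S., d.f., M.S., and F can be traced step by step in a short sketch. The two groups of scores are hypothetical; the last lines simply redo the arithmetic of the table shown on the slide. The probability P is not computed here, since that would normally come from an F table or an analysis program.

```python
def anova_table(groups):
    """Sums of squares, degrees of freedom, mean squares, and F for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_cases = len(all_values)
    grand_mean = sum(all_values) / n_cases
    # Combined: squared deviation of each data point from the grand mean
    ss_combined = sum((x - grand_mean) ** 2 for x in all_values)
    # Within groups: squared deviation of each point from its own group mean
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((x - group_mean) ** 2 for x in group)
    # Between groups: the difference between S.S. combined and S.S. within
    ss_between = ss_combined - ss_within
    df_between = len(groups) - 1       # number of groups minus one
    df_within = n_cases - len(groups)  # number of cases minus number of groups
    df_combined = n_cases - 1          # number of cases minus one
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss": (ss_between, ss_within, ss_combined),
        "df": (df_between, df_within, df_combined),
        "ms": (ms_between, ms_within),
        "f": ms_between / ms_within,
    }

# Hypothetical scores for two independent groups
result = anova_table([[1, 2, 3], [3, 4, 5]])
print(result)

# The slide's table: M.S. = S.S. / d.f., F = M.S. between / M.S. within
slide_ms_between = 100 / 1
slide_ms_within = 180 / 9
slide_f = slide_ms_between / slide_ms_within
print(slide_ms_between, slide_ms_within, slide_f)
```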
11-23
Ways to Describe the Statistical Significance of ANOVA
• What is the probability that this much of a difference between these sample mean values would result due to sampling error if the means for the groups in the population were equal?
• If the group means in the population as a whole were the same, what are the odds that a sample of this size would show this much difference in the sample group means?
• What is the probability that the sample group means would be this different by chance, purely because of sampling error, if the group means in the population were actually the same?
11-24
Correlation Analysis
• Objective
  • To determine the degree and significance of the relationship between a pair of continuous variables
• Causality
  • The analysis does not assume that one variable is dependent on the other. If A is correlated with B:
    • A may be causing B
    • B may be causing A
    • A and B may be interacting
    • C may be causing A and B
11-25
Correlation Analysis
• Requirements
  • Both variables must be continuous and obtained from an interval or a ratio scale
• Non-Parametric Correlation
  • Both variables must be continuous but one or both may be only ordinal scale level
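The two kinds of correlation can be compared in a small sketch. The slides do not name a specific non-parametric coefficient; Spearman's rank correlation is used here as one common choice, computed as the ordinary correlation of the ranks. The data are hypothetical, and the probability of r is left to a table or an analysis program.

```python
import math

def pearson_r(x, y):
    """Coefficient of correlation between two continuous variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman_rho(x, y):
    """Rank correlation: the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: Y always rises with X, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 4), rho)
```

Because the rank correlation uses only the order of the values, it reaches 1.0 for any strictly increasing relationship, while r falls short of 1.0 when the relationship is not linear.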
11-26
Regression Analysis
• Objective
  • To determine if variable X has a significant effect on variable Y
• Independent Variable
  • X must be continuous, interval or ratio level data
• Dependent Variable
  • Y must be continuous, interval or ratio level data
11-27
Regression Analysis Requirements
• The data plot must be linear
  • The data plot must be in a straight line or very nearly so
• The data plot must be homoskedastic
  • The vertical spread must be about the same from left to right
11-28
Regression
Unacceptable Heteroskedastic Regression Plot
• Typical funnel-shaped plot
• The scatterplot must be homoskedastic
• Variance must be approximately the same
11-29
[Figure: curvilinear scatterplot with points marked + or -]
Unacceptable Curvilinear Regression Plot
• The scatterplot must be linear
• A runs test will reveal nonlinearity
• It gives probability of consecutive signs
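One way to sketch a runs test is shown below. The slides do not say which form of the test is meant; this version assumes the Wald-Wolfowitz runs test with a normal approximation, applied to the sign of each point's deviation from the regression line, taken in order of X. The sign sequence is hypothetical, and the approximation is rough for samples this small.

```python
import math

def runs_test(signs):
    """Runs test on a sequence of '+' and '-' signs (normal approximation).

    Returns the number of runs, the z value, and the two-sided probability.
    """
    n_plus = signs.count("+")
    n_minus = signs.count("-")
    n = n_plus + n_minus
    # A new run starts wherever the sign changes
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    product = 2.0 * n_plus * n_minus
    expected = product / n + 1.0
    variance = product * (product - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    probability = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, probability

# Hypothetical residual signs from a curved plot fitted with a straight line
signs = "++++------++++"
runs, z, probability = runs_test(signs)
print(runs, round(z, 3), round(probability, 4))
```

Long stretches of the same sign produce far fewer runs than chance would give, so a low probability here suggests the plot is not linear.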
Regression
11-30
Unacceptable Quadratic Regression Plot
• Two linear segments with one bend
• Three segments with two bends is cubic, etc.
• Regression must be limited to one range
Regression
11-31
The Regression Scatterplot
[Figure: two scatterplots, labeled Weak Relationship and Strong Relationship]
• Independent variable X on the horizontal axis
• Dependent variable Y on the vertical axis
• Regression equation: Y = a + bX
11-32
Regression Plot and Regression Table

Regression Table
Corr. (r)       .93784     N of cases   25        Missing   0
R-Square        .87954     S.E. Est.    8.76849   Sig. R    0.0000
Intercept (A)   88.90818   S.E. of A    3.64090   Sig. A    0.0000
Slope (B)       -0.96698   S.E. of B    0.07462   Sig. B    0.0000

Analysis of Variance
Source       S.S.       d.f.   M.S.       F Ratio    F Prob.
Regression   12911.77   1      12911.77   167.9332   0.0000
Residual     1768.38    23     76.89

[Figure: regression scatterplot, both axes scaled 0 to 100]
11-33
Regression Coefficients
• Corr. (r) - The coefficient of correlation
• R-Square - The coefficient of determination
  • The percentage of variance in Y explained by knowing X
• Intercept (A) - Value of Y if X is zero
• Slope (B) - The rise over the run
• Regression equation - Y = a + bX

Regression Table
Corr. (r)       .93784     N of cases   25        Missing   0
R-Square        .87954     S.E. Est.    8.76849   Sig. R    0.0000
Intercept (A)   88.90818   S.E. of A    3.64090   Sig. A    0.0000
Slope (B)       -0.96698   S.E. of B    0.07462   Sig. B    0.0000
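A minimal least-squares sketch shows where these coefficients come from. The data points are hypothetical; the final lines plug the intercept and slope from the table above into Y = a + bX for an arbitrarily chosen X of 50.

```python
import math

def simple_regression(x, y):
    """Least-squares intercept (a), slope (b), r, and R-square for Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # the rise over the run
    intercept = mean_y - slope * mean_x    # value of Y if X is zero
    r = sxy / math.sqrt(sxx * syy)         # coefficient of correlation
    return {"intercept": intercept, "slope": slope, "r": r, "r_square": r * r}

def predict(intercept, slope, x_value):
    """Apply the regression equation Y = a + bX."""
    return intercept + slope * x_value

# Hypothetical data in which Y falls as X rises
x = [1, 2, 3, 4, 5]
y = [9, 7, 6, 3, 1]
fit = simple_regression(x, y)
print(fit)

# Using the intercept and slope reported in the regression table
y_at_50 = predict(88.90818, -0.96698, 50)
print(round(y_at_50, 5))
```

In the fitted example the slope is negative, so r is negative as well; R-square is the same whichever direction the relationship runs.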
11-34
Regression Coefficients
• S.E. Estimate - Standard error of the estimate of Y based on the value of X
  • S.E. of the estimate based on the regression equation
• S.S. Regression - Sum of squared deviations of each predicted value on the regression line from the mean of Y
• S.S. Residual - The difference between S.S. total (around the mean of Y) and S.S. Regression

Regression Table
Corr. (r)       .93784     N of cases   25        Missing   0
R-Square        .87954     S.E. Est.    8.76849   Sig. R    0.0000
Intercept (A)   88.90818   S.E. of A    3.64090   Sig. A    0.0000
Slope (B)       -0.96698   S.E. of B    0.07462   Sig. B    0.0000
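The figures in the regression table and its analysis of variance can be tied together with a little arithmetic. The sketch below starts from the two sums of squares and the number of cases reported on the earlier slide and recovers R-Square, the S.E. of the estimate, and the F ratio; small differences in the last decimal places come from rounding in the printed inputs.

```python
import math

# Values as printed in the analysis of variance and regression table
ss_regression = 12911.77
ss_residual = 1768.38
n_cases = 25

ss_total = ss_regression + ss_residual    # S.S. around the mean of Y
df_regression = 1                         # one independent variable
df_residual = n_cases - 2                 # cases minus the two estimated coefficients
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total       # share of variance in Y explained by X
se_estimate = math.sqrt(ms_residual)      # standard error of the estimate
f_ratio = ms_regression / ms_residual

print(round(r_square, 5), round(se_estimate, 4), round(f_ratio, 2))
```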
11-35
Ways to Describe the Statistical Significance of Regression
• What is the probability this much variance in the values of the dependent variable would be "explained" by the values of the independent variable, only because of sampling error, if the two variables were unrelated in the population?
• If these two variables were actually independent of one another in the population, what are the odds that this size sample would show this much of a relationship?
• What is the probability that the values of X would explain this much variance in Y, purely by sampling error, if X and Y were unrelated to one another in the entire population?
End of Chapter 11