Crosstabs & Measures of Association
POL242
October 9 and 11, 2012
Jennifer Hove
Questions of Causality
Recall:
- Most causal thinking in the social sciences is probabilistic, not deterministic: as X increases, the probability of Y increases; it is not that X invariably produces Y
- Following Hume, we can observe only association
- We must therefore infer causation
- There is not one, but many possible causes
Inferring Causal Relations
1. There must be association
   X → Y; ~X → ~Y
2. Time order must be considered
   The presumed cause should precede the presumed effect
3. Possible rival explanations must be ruled out
   Sometimes what appears to be a strong relationship between two variables is due to the influence of others
4. We must be able to identify the process by which one factor brings about change in another
   Causal linkage
Establishing Association
With nominal or ordinal data, relationships are usually presented in tabular form
Why? Hypotheses rest on the core idea of comparison
Ex: if we compare respondents on the basis of their value on the IV, say party identification, they should also differ on the DV, say support for gay rights
Crosstabs are a wonderful means of making comparisons
“God speaks to you through crosstabs!”
Using/Interpreting Crosstabs
Data are arranged in side-by-side frequency distributions
IV (X) presented across the top of the table, in columns
   If ordinal, arrange from low scores (on left) to high scores (on right)
DV (Y) presented down the left-hand side of the table, in rows
   Again, if ordinal, arrange from low (at top) to high (at bottom)
Table 1: Support for the Afghan Mission by Perceived Impact of Taliban Resurgence, 2007

                              Fear of Taliban Resurgence
Support for Afghan Mission    Low            High           All Respondents
Low                           86.1% (173)    52.7% (355)    60.4% (528)
High                          13.9 (28)      47.3 (318)     39.6 (346)
Total (N)                     100 (201)      100 (673)      100 (874)

Tau-b = .29
Source: Strategic Counsel, CTV/Globe and Mail Survey, July 2007
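The layout rules above can be sketched in a few lines of Python. This is a minimal sketch, not a routine from any statistics package: it assumes raw survey responses stored as (IV value, DV value) pairs and category orders supplied by the analyst. The function name `crosstab_counts` and the six records are invented for illustration.

```python
from collections import Counter

def crosstab_counts(records, iv_order, dv_order):
    """Build a table of counts with the IV (X) across the columns and the
    DV (Y) down the rows, both listed from low to high."""
    counts = Counter(records)  # (iv_value, dv_value) -> frequency
    # One row per DV category (low at top), one column per IV category (low at left)
    return [[counts[(x, y)] for x in iv_order] for y in dv_order]

# Hypothetical raw data: (fear of Taliban resurgence, support for mission)
records = [("Low", "Low"), ("Low", "Low"), ("High", "High"),
           ("High", "Low"), ("Low", "High"), ("High", "High")]
table = crosstab_counts(records, iv_order=["Low", "High"], dv_order=["Low", "High"])
for dv_label, row in zip(["Low", "High"], table):
    print(dv_label, row)
```

Each row of the result is one category of the DV and each column one category of the IV, which is the arrangement used in Table 1.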
Using/Interpreting Crosstabs
Data are presented so that each category of the IV adds to 100%
   Percentaging within categories of the IV (down the columns of the table)
Comparisons are made across categories of the IV
   From left to right, to see the effect of the IV on the DV
Rules (!) of Crosstabs
1. Make the IV define the columns and the DV define the rows of the table
2. Always percentage down within categories of the IV
3. Interpret the relationship by comparing across columns, within rows of the table
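Rules 2 and 3 can be illustrated with the counts from Table 1. The sketch below (the helper name `column_percentages` is hypothetical) percentages down each column and then compares across the columns within the “High” support row; it should reproduce the percentages printed in Table 1.

```python
def column_percentages(table):
    """Percentage down within each column (category of the IV), so that
    every column sums to 100."""
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [[100 * table[r][c] / col_totals[c] for c in range(n_cols)]
            for r in range(n_rows)]

# Counts from Table 1: rows = support (Low, High), columns = fear (Low, High)
table1 = [[173, 355],
          [28, 318]]
pcts = column_percentages(table1)
for label, row in zip(["Low support", "High support"], pcts):
    print(label, [f"{p:.1f}%" for p in row])

# Rule 3: compare across columns within a row
print("Difference in % high support (High fear - Low fear):",
      f"{pcts[1][1] - pcts[1][0]:.1f} points")
```

Percentaging within columns is what makes the comparison fair: the High-fear column has more than three times as many respondents as the Low-fear column, so raw counts alone would mislead.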
Example: 2 x 2 Crosstab
Support for Y Variable by Support for X Variable

                        Score on X Variable
Score on Y Variable     Low        High
Low                     A          B          A + B
High                    C          D          C + D
                        A + C      B + D
Diagonals
Main diagonal: running down and to the right (cells A and D)
When a larger proportion of cases falls on the main diagonal, the relationship is said to be direct or positive
Low values on X are associated with low values on Y; high values on X are associated with high values on Y
Off diagonal: running up and to the right (cells C and B)
When a larger proportion of cases falls on the off diagonal, the relationship is said to be inverse or negative
Low values on X are associated with high values on Y; high values on X are associated with low values on Y
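The diagonal logic can be expressed as a simple comparison. This sketch assumes a 2 x 2 table that has already been percentaged down the columns, with cells laid out as A, B (top row) and C, D (bottom row); the function name `direction_from_diagonals` is hypothetical, and the second table is invented to show an inverse pattern.

```python
def direction_from_diagonals(pcts):
    """Given a 2 x 2 table of column percentages [[A, B], [C, D]]
    (rows: Y low/high; columns: X low/high), compare the share of cases
    on the main diagonal (A, D) with the off diagonal (C, B)."""
    (a, b), (c, d) = pcts
    main, off = a + d, c + b
    if main > off:
        return "positive"
    if main < off:
        return "negative"
    return "no relationship"  # identical column percentages

print(direction_from_diagonals([[86.1, 52.7], [13.9, 47.3]]))  # Table 1: positive
print(direction_from_diagonals([[25.0, 75.0], [75.0, 25.0]]))  # made-up table: negative
```

Because each column sums to 100, comparing the two diagonals here gives the same answer as asking whether the percentage with high Y rises or falls from the Low-X column to the High-X column.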
Explaining Variation in Y
Relationships between variables in the social sciences are rarely, if ever, perfectly predictable
You are unlikely to see something like this:

Support for Y Variable by Support for X Variable
                        Score on X Variable
Score on Y Variable     Low        High
Low                     100%       0
High                    0          100%
Total                   100        100
There is likely to be more than one explanation or “cause” behind the variation in Y
So we will generally be looking at:
   X1 → Y
   X2 → Y
To compare, we want to know the relative strength of each relationship
A variety of summary statistics called measures of association are used
Measures of Association
Compress the information that appears in a crosstab into a single number by summarizing:
- Magnitude (strength) of the relationship
- Direction of the relationship
Magnitude: ranges from 0 (completely unpredictable) to 1 (perfectly predictable)
Direction: positive (+) = cases primarily on the main diagonal; negative (-) = cases primarily on the off diagonal
Two Cautionary Notes
1. Direction is not useful with nominal-level variables, since they are not ordered/ranked from low to high
2. Even with ordinal measurement, the interpretation of direction depends entirely on how your variables are coded
   You should always code your variables so that high scores indicate “more” of what you want to explain
Direction & Strength
Combining direction and strength, we get a range of possibilities:

-1.0  -.8  -.6  -.4  -.2  0  +.2  +.4  +.6  +.8  +1.0

All intermediate values can also occur, e.g. -.2367
Note that equivalent positive and negative scores are equal in strength
Ex: +.4 and -.4 are equal in strength; they differ only in direction
Choosing among Measures
We use different measures of association for two main reasons:
1. There are different levels of measurement
   Ordinal measurement offers ranking information that can be used to calculate association, which isn’t available with nominal data
2. Some measures are specific to tables of certain sizes and shapes
   Some measures are designed for 2 x 2 tables; others for larger square tables; still others for rectangular tables
Phi (Φ)
Use with dichotomous variables (2 x 2 tables)
Applies to nominal and ordinal data
Measures the strength of a relationship by taking the product of the cells on the main diagonal minus the product of the cells on the off diagonal (AD - BC), adjusted for the marginal distribution of cases, i.e. the row and column totals:

$$\Phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$
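As a check on the formula, the sketch below computes Phi from the raw counts in Table 1, using the same A, B, C, D layout (the function name `phi` is our own, not a library call). For a 2 x 2 table Phi coincides with Tau-b, so the result should agree with the Tau-b of .29 reported under Table 1.

```python
import math

def phi(table):
    """Phi for a 2 x 2 table of counts [[A, B], [C, D]]:
    (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D))."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

table1 = [[173, 355], [28, 318]]  # counts from Table 1
print(round(phi(table1), 2))      # 0.29
```

The sign follows the diagonals: more cases in A and D than expected from the margins gives a positive value, more in B and C a negative one.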
2 Examples: Phi Φ

Example 1 (Φ = .6)
                        Score on X Variable
Score on Y Variable     Low        High
Low                     75%        10%
High                    25%        90%
Total                   100        100

Example 2 (Φ = .2)
                        Score on X Variable
Score on Y Variable     Low        High
Low                     50%        20%
High                    50%        80%
Total                   100        100
Cramer’s V
An extension of Phi
The logic of Cramer’s V is based on percentage differences across the columns, not on the logic of diagonals
Use with nominal data when tables are larger than 2 x 2
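One common way to compute Cramer’s V is from the chi-square statistic: V = sqrt(χ² / (N(k - 1))), where k is the smaller of the number of rows and columns. The sketch below assumes a table of counts with no empty rows or columns (an empty one would make an expected count zero); `cramers_v` is a hypothetical helper name. For a 2 x 2 table V equals the absolute value of Phi, which gives an easy check against Table 1.

```python
import math

def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (N * (min(rows, cols) - 1)))."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(n_rows)) for c in range(n_cols)]
    n = sum(row_totals)
    chi2 = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            expected = row_totals[r] * col_totals[c] / n  # count expected under independence
            chi2 += (table[r][c] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

table1 = [[173, 355], [28, 318]]
print(round(cramers_v(table1), 2))  # 0.29, the same magnitude as Phi

# A made-up 3 x 3 table with perfect association
print(round(cramers_v([[10, 0, 0], [0, 10, 0], [0, 0, 10]]), 2))  # 1.0
```

Because V is built from squared deviations, it has no sign, which fits its use with nominal data where direction is not meaningful.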
Lambda
Lambda (λ) is another measure of association for nominal data
Its rationale of “percentage of improvement” or “proportional reduction in error” is relatively easy to explain
Not recommended in this course: when the modal category of each column is in the same row, λ = 0, even if the column percentages differ
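The λ = 0 problem can be seen directly in Table 1. The sketch below computes the asymmetric Lambda for predicting the DV (rows) from the IV (columns); the function name is hypothetical, and it assumes the cases are not all in one row, which would make the denominator zero. In both columns of Table 1 the modal category is “Low” support, so λ comes out at 0 even though the percentage with high support differs by about 33 points across the columns.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the row variable (DV) from the column variable (IV):
    (sum of the column modes - largest row total) / (N - largest row total)."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)  # errors made ignoring the IV
    sum_col_modes = sum(max(table[r][c] for r in range(n_rows)) for c in range(n_cols))
    return (sum_col_modes - max_row_total) / (n - max_row_total)

table1 = [[173, 355], [28, 318]]
print(goodman_kruskal_lambda(table1))  # 0.0
```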
Measures of Association: Ordinal Data
Measures include Tau-b, Tau-c and Gamma
They rely on an analysis of diagonals

                        Support for X
Support for Y           Low        Med        High
Low                     a          b          c
Med                     d          e          f
High                    g          h          i
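Mind your Ps and Qs
The letter P indicates the number of pairs of cases on the main diagonals (running down from left to right): pairs in which one case ranks higher than the other on both X and Y
The letter Q indicates the number of pairs of cases on the off diagonals (running down from right to left): pairs in which one case ranks higher on X but lower on Y
For the 3 x 3 layout above:

$$P = a(e + f + h + i) + b(f + i) + d(h + i) + ei$$
$$Q = c(d + e + g + h) + b(d + g) + f(g + h) + eg$$

If P > Q, we have a positive association
If P < Q, we have a negative association
The core calculation = P - Q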
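Gamma
The information in P and Q can be used to calculate Gamma (γ):

$$\gamma = \frac{P - Q}{P + Q} = \frac{P}{P + Q} - \frac{Q}{P + Q}$$

Problems:
- A single empty cell can push Gamma to ±1.0 (in a 2 x 2 table, any empty cell does)
- It tends to overstate the strength of a relationship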
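Counting P and Q by hand gets tedious beyond a 3 x 3 table. The sketch below (helper names `concordant_discordant` and `gamma` are hypothetical) assumes rows hold the Y categories and columns the X categories, both ordered low to high as the crosstab rules require; it counts each pair once by pairing every cell with the cells in lower rows. Applied to Table 1, it gives a Gamma of about .69, well above the Tau-b of .29 reported for the same table, and a made-up 2 x 2 table with one empty cell returns exactly 1.0.

```python
def concordant_discordant(table):
    """Count P (pairs on the main diagonals) and Q (pairs on the off diagonals).
    Rows = Y categories and columns = X categories, both ordered low to high."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for r in range(n_rows):
        for c in range(n_cols):
            for r2 in range(r + 1, n_rows):   # cells strictly below this one
                for c2 in range(n_cols):
                    if c2 > c:                # below and to the right: concordant
                        p += table[r][c] * table[r2][c2]
                    elif c2 < c:              # below and to the left: discordant
                        q += table[r][c] * table[r2][c2]
    return p, q

def gamma(table):
    p, q = concordant_discordant(table)
    return (p - q) / (p + q)

table1 = [[173, 355], [28, 318]]
print(concordant_discordant(table1))  # (55014, 9940)
print(round(gamma(table1), 2))        # 0.69
print(gamma([[10, 5], [0, 8]]))       # 1.0: the single empty cell leaves Q = 0
```

Gamma ignores every tied pair (cases in the same row or column), which is one reason it runs high compared with Tau-b.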
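Tau-b and Tau-c
Preferable to Gamma, though built on the same logic of diagonals
Tend to produce results similar to Phi (using nominal data) or to the most important interval measure (r), to be discussed later in the year

$$\tau_b = \frac{P - Q}{\sqrt{(P + Q + T_X)(P + Q + T_Y)}}$$

where T_X is the number of pairs tied on X only and T_Y is the number of pairs tied on Y only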
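Tau-b adds the tied pairs to the denominator. The sketch below extends the pair count to ties; the helper names `pair_counts` and `tau_b` are hypothetical, and it assumes rows hold Y and columns hold X, both ordered low to high. For Table 1 it gives about .29, matching the reported value (and Phi), and for the counts in Table 2 it gives about -.35.

```python
import math

def pair_counts(table):
    """Return P, Q, T_X (pairs tied on X only) and T_Y (pairs tied on Y only).
    Rows = Y categories and columns = X categories, both ordered low to high."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = t_x = t_y = 0
    for r in range(n_rows):
        for c in range(n_cols):
            cell = table[r][c]
            for r2 in range(r + 1, n_rows):   # cells in lower rows
                for c2 in range(n_cols):
                    other = table[r2][c2]
                    if c2 > c:
                        p += cell * other     # concordant
                    elif c2 < c:
                        q += cell * other     # discordant
                    else:
                        t_x += cell * other   # same column, different row
            for c2 in range(c + 1, n_cols):   # same row, columns to the right
                t_y += cell * table[r][c2]
    return p, q, t_x, t_y

def tau_b(table):
    p, q, t_x, t_y = pair_counts(table)
    return (p - q) / math.sqrt((p + q + t_x) * (p + q + t_y))

table1 = [[173, 355], [28, 318]]
table2 = [[26, 64, 171, 110],     # Disapprove of Chavez
          [178, 217, 223, 52]]    # Approve of Chavez
print(round(tau_b(table1), 2))  # 0.29
print(round(tau_b(table2), 2))  # -0.35
```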
Tau-b never quite reaches 1.0 in non-square tables
   So Tau-c was developed for use with rectangular tables
In practice, the difference between Tau-b and Tau-c when applied to the same table is not great, but keep the distinction above in mind
Example

Table 2: Approval of President Chavez by Opinion of the United States, 2007

                        Opinion of the United States
Approval of Chavez      Very Bad      Bad           Good          Very Good     All Respondents
Disapprove              12.7% (26)    22.8% (64)    43.4% (171)   67.9% (110)   35.6% (371)
Approve                 87.3 (178)    77.2 (217)    56.6 (223)    32.1 (52)     64.4 (670)
Total (N)               100 (204)     100 (281)     100 (394)     100 (162)     100 (1041)

Tau-c = -.39   Tau-b = -.35
Source: Latinobarometer, 2007 (Venezuelan respondents only)
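Tau-c needs only P, Q, the number of cases N and the smaller table dimension m: τc = 2m(P - Q) / (N²(m - 1)). The sketch below (function name `tau_c` hypothetical; rows = Y and columns = X, both ordered low to high) applies this to the counts in Table 2, a 2 x 4 rectangular table. It comes out at about -.40, close to the reported -.39 and, as in the table, somewhat larger in magnitude than Tau-b.

```python
def tau_c(table):
    """Stuart's Tau-c = 2m(P - Q) / (N^2 (m - 1)), where m = min(rows, columns)."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    m = min(n_rows, n_cols)
    p = q = 0
    for r in range(n_rows):
        for c in range(n_cols):
            for r2 in range(r + 1, n_rows):   # cells strictly below this one
                for c2 in range(n_cols):
                    if c2 > c:                # concordant pair
                        p += table[r][c] * table[r2][c2]
                    elif c2 < c:              # discordant pair
                        q += table[r][c] * table[r2][c2]
    return 2 * m * (p - q) / (n ** 2 * (m - 1))

# Table 2 counts: rows = Disapprove, Approve; columns = Very Bad ... Very Good
table2 = [[26, 64, 171, 110],
          [178, 217, 223, 52]]
print(round(tau_c(table2), 3))  # -0.397
```

The negative sign reflects the off-diagonal pattern: disapproval of Chavez rises from 12.7% to 67.9% as opinion of the United States moves from “Very Bad” to “Very Good”.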
Summing Up
With nominal data, use Phi or Cramer’s V
- Phi is used for 2 x 2 tables
- Cramer’s V is used for any other crosstab involving nominal data
- Avoid Lambda
With ordinal data, use Tau-c or Tau-b
- Tau-b is used for square tables: 3 x 3, 4 x 4, etc.
- Tau-c is used for rectangular tables
- Avoid Gamma
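The decision rules in this summary can be written as a small lookup. This is only a restatement of the slide’s advice as a hypothetical helper, `choose_measure`; it does not cover interval data or any measure not discussed here.

```python
def choose_measure(level, n_rows, n_cols):
    """Return the measure of association recommended in this summary for a
    crosstab with the given level of measurement and dimensions."""
    if level == "nominal":
        return "Phi" if (n_rows, n_cols) == (2, 2) else "Cramer's V"
    if level == "ordinal":
        return "Tau-b" if n_rows == n_cols else "Tau-c"
    raise ValueError("level must be 'nominal' or 'ordinal'")

print(choose_measure("ordinal", 2, 4))  # Tau-c, e.g. Table 2
print(choose_measure("nominal", 2, 2))  # Phi
```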