
Crosstabs & Measures of Association

POL242

October 9 and 11, 2012

Jennifer Hove

Questions of Causality

Recall:

Most causal thinking in the social sciences is probabilistic, not deterministic: as X increases, the probability of Y increases; X does not invariably produce Y

We can observe only association, per Hume
We must therefore infer causation
There is not one cause, but many possible causes

Inferring Causal Relations

1. There must be association
   X → Y; ~X → ~Y

2. Time order must be considered
   Presumed cause should precede presumed effect

3. Must rule out possible rival explanations
   Sometimes what appears to be a strong relationship between two variables is due to the influence of others

4. Must be able to identify the process by which one factor brings about change in another
   Causal linkage

Establishing Association

With nominal or ordinal data, relationships are usually presented in tabular form
Why? Hypotheses rest on the core idea of comparison
Ex: if we compare respondents on the basis of their value on the IV, say party identification, they should also differ along the DV, say support for gay rights
Crosstabs are a wonderful means of making comparisons

“God speaks to you through crosstabs!”
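As a rough illustration of the comparison idea, the sketch below tallies respondent-level records into a crosstab, with the IV in the columns and the DV in the rows. The six records and the party labels are invented for illustration only; they are not data from this course.

```python
from collections import Counter

# Hypothetical respondent records: (party identification, support for gay rights).
# These values are invented purely to show the mechanics.
respondents = [
    ("Liberal", "High"), ("Liberal", "High"), ("Liberal", "Low"),
    ("Conservative", "Low"), ("Conservative", "Low"), ("Conservative", "High"),
]

# Count how many respondents fall into each (IV, DV) combination.
cell_counts = Counter(respondents)

iv_categories = ["Liberal", "Conservative"]  # columns (IV)
dv_categories = ["Low", "High"]              # rows (DV)

# One row per DV category, one column per IV category.
crosstab = [[cell_counts[(iv, dv)] for iv in iv_categories] for dv in dv_categories]

for dv, row in zip(dv_categories, crosstab):
    print(dv, row)
```

Each printed row is one category of the DV, so reading across a row compares the groups defined by the IV.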

Using/Interpreting Crosstabs

Data arranged in side-by-side frequency distributions
IV (X) presented across the top of the table, in columns
  If ordinal, arrange from low scores (on left) to high scores (on right)
DV (Y) presented down the left-hand side of the table, in rows
  Again, if ordinal, arrange from low (at top) to high (at bottom)

Table 1: Support for the Afghan Mission by Perceived Impact of Taliban Resurgence, 2007

                              Fear of Taliban Resurgence
Support for Afghan Mission    Low            High           All Respondents
Low                           86.1% (173)    52.7% (355)    60.4% (528)
High                          13.9 (28)      47.3 (318)     39.6 (346)
Total (N)                     100 (201)      100 (673)      100 (874)

Tau-b = .29
Source: Strategic Counsel, CTV/Globe and Mail Survey, July 2007

Using/Interpreting Crosstabs

Data presented so that categories of the IV add to 100%
  Percentaging within categories of the IV (down in a table)
Comparisons are made across categories of the IV
  From left to right
  To see the effect of the IV on the DV




Rules (!) of Crosstabs

1. Make the IV define the columns and the DV define the rows of the table

2. Always percentage down within categories of the IV

3. Interpret the relationship by comparing across columns, within rows of the table
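The three rules can be sketched in a few lines, using the cell counts from Table 1 (Support for the Afghan Mission by Fear of Taliban Resurgence). The variable names are illustrative; only the counts come from the table.

```python
# Cell counts from Table 1. Rows are the DV (support for the mission),
# columns are the IV (fear of Taliban resurgence).
counts = [
    [173, 355],  # Low support:  Low fear, High fear
    [28, 318],   # High support: Low fear, High fear
]

# Column totals: the N within each category of the IV.
column_totals = [sum(column) for column in zip(*counts)]

# Rule 2: percentage down, i.e. divide each cell by its column total.
column_percentages = [
    [round(100 * cell / total, 1) for cell, total in zip(row, column_totals)]
    for row in counts
]

# Rule 3: compare across columns within a row (here the "High support" row).
high_row = column_percentages[1]
difference = round(high_row[1] - high_row[0], 1)

print(column_totals)
print(column_percentages)
print(difference)
```

The percentages reproduce those in Table 1, and the difference in the High row (about 33 points) is the comparison the table is built to show.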

Example: 2 x 2 Crosstab

Support for Y Variable by Support for X Variable

                       Score on X Variable
Score on Y Variable    Low      High
Low                    A        B        A + B
High                   C        D        C + D
                       A + C    B + D




Diagonals

Main diagonal: running to the right and down (cells A and D in the 2 x 2 table)
When a larger proportion of cases falls on the main diagonal, the relationship is said to be direct or positive
Low values on X associated with low values on Y; high values on X associated with high values on Y

Diagonals

Off diagonal: running to the right and up (cells C and B in the 2 x 2 table)
When a larger proportion of cases falls on the off diagonal, the relationship is said to be inverse or negative
Low values on X associated with high values on Y; high values on X associated with low values on Y

Explaining Variation in Y

Relationships between variables in the social sciences are rarely, if ever, perfectly predictable
You are unlikely to see something like this:

Support for Y Variable by Support for X Variable

                       Score on X Variable
Score on Y Variable    Low      High
Low                    100%     0
High                   0        100%
Total                  100      100

Explaining Variation in Y

There is likely to be more than one explanation or “cause” behind the variation in Y
So we will generally be looking at:

X1 → Y
X2 → Y

To compare, we want to know the relative strength of each relationship
A variety of summary terms called measures of association are used

Measures of Association

Compress information that appears in a crosstab into a single number by summarizing:
  Magnitude (strength) of the relationship
  Direction of the relationship

Magnitude: ranges from 0 (completely unpredictable) to 1 (perfectly predictable)
Direction: positive (+) = cases primarily on main diagonal; negative (-) = cases primarily on off diagonal

Two Cautionary Notes

Direction is not useful with nominal-level variables, since they are not ordered/ranked from low to high
Even with ordinal measurement, interpretation of direction depends entirely on how your variables are coded
  Should always code your variables so that high scores indicate “more” of what you want to explain

Direction & Strength

Combining direction & strength, we get a range of possibilities

-1.0   -.8   -.6   -.4   -.2   0   +.2   +.4   +.6   +.8   +1.0

All intermediary values can also occur, e.g. -.2367
Note that equivalent positive and negative scores are equal in strength
  Ex: +.4 and -.4 are equal in strength; they differ only in direction

Choosing among Measures

We use different measures of association for 2 main reasons:

1. There are different levels of measurement
   Ordinal measurement offers ranking information used to calculate association, which isn’t available with nominal data

2. Some measures are specific to tables of certain sizes and shapes
   Specific measures for 2 x 2 tables; others for larger square tables; still others for rectangular tables

Phi Φ

Use with dichotomous variables, 2 x 2 tables
Applies to nominal and ordinal data
Measures the strength of a relationship by taking the cases on the main diagonal (A × D) minus the cases on the off diagonal (B × C), adjusting for the marginal distribution of cases, i.e. the sums of the columns and rows

\[ \Phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} \]
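A minimal sketch of the Phi calculation, assuming the A, B, C, D cell labels of the 2 x 2 example table (A and D on the main diagonal) and plugging in the counts from Table 1. The function name is illustrative.

```python
import math

def phi(a, b, c, d):
    """Phi for a 2 x 2 table laid out as A B / C D (A and D on the main diagonal)."""
    numerator = a * d - b * c
    # Adjust for the marginals: the two row sums and the two column sums.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Counts from Table 1: A = 173, B = 355, C = 28, D = 318.
table1_phi = phi(173, 355, 28, 318)
print(round(table1_phi, 2))
```

For Table 1 this rounds to .29, the same value the table reports for Tau-b; in a 2 x 2 table the two measures coincide.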

2 Examples: Phi Φ

Example 1: Φ = .6

                       Score on X Variable
Score on Y Variable    Low      High
Low                    75%      10%
High                   25%      90%
Total                  100      100

Example 2: Φ = .2

                       Score on X Variable
Score on Y Variable    Low      High
Low                    50%      20%
High                   50%      80%
Total                  100      100

Cramer’s V

An extension of Phi
Logic of Cramer’s V is based on percentage differences across the columns, not on the logic of diagonals
Use with nominal data, when tables are larger than 2 x 2
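The slides do not give a formula for Cramer’s V. The sketch below assumes the commonly used definition based on the Pearson chi-square, V = sqrt(chi-square / (N × min(rows − 1, columns − 1))); treat that formula and the function name as assumptions rather than as the course's own presentation. The counts are taken from Table 1 and Table 2.

```python
import math

def cramers_v(table):
    """Cramer's V from a table of counts (list of rows), via Pearson chi-square."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected

    smaller_dimension = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square / (n * smaller_dimension))

table1 = [[173, 355], [28, 318]]                    # 2 x 2 (Table 1)
table2 = [[26, 64, 171, 110], [178, 217, 223, 52]]  # 2 x 4 (Table 2)

v_table1 = cramers_v(table1)
v_table2 = cramers_v(table2)
print(round(v_table1, 2), round(v_table2, 2))
```

On the 2 x 2 table the result matches the absolute value of Phi, which is one way to see V as an extension of Phi. V carries no sign, consistent with direction not being meaningful for nominal data.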

Lambda

Lambda (λ) is another measure of association for nominal data
Its rationale of “percentage of improvement” or “proportional reduction in error” is relatively easy to explain
Not recommended in this course
  When the modal category of each column is in the same row, λ = 0
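The λ = 0 problem can be seen with the Table 1 counts. This is a sketch assuming the usual proportional-reduction-in-error definition of Lambda (errors made guessing the overall modal row, compared with errors made guessing the modal row within each column); the function name is illustrative.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the row variable (DV) from the column variable (IV)."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)

    # Errors made when guessing the overall modal row for every case.
    errors_without_iv = n - max(row_totals)

    # Errors made when guessing the modal row separately within each column.
    errors_with_iv = sum(sum(column) - max(column) for column in zip(*table))

    return (errors_without_iv - errors_with_iv) / errors_without_iv

# Table 1: in both columns the modal category is the "Low" row (173 and 355).
table1 = [[173, 355], [28, 318]]
lambda_table1 = goodman_kruskal_lambda(table1)
print(lambda_table1)
```

Lambda comes out as 0 for Table 1 even though the column percentages differ by more than 30 points, which illustrates why it is not recommended here.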

Measures of Association: Ordinal Data

Measures include Tau-b, Tau-c and Gamma
Rely on analysis of diagonals

                 Support for X
Support for Y    Low    Med    High
Low              a      b      c
Med              d      e      f
High             g      h      i

Mind your Ps and Qs

The letter P indicates the # of pairs of cases on the main diagonals (from left to right)
The letter Q indicates the # of pairs of cases on the off diagonals (from right to left)
If P > Q, we have a positive association
If P < Q, we have a negative association
The core calculation = P - Q
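One way to count P and Q, sketched under the assumption that rows and columns are both ordered low to high: every cell is paired with each cell below it, and the pairs count toward P when the lower cell is also to the right (main-diagonal direction) and toward Q when it is to the left (off-diagonal direction). The function name is illustrative; the counts are from Table 1.

```python
def concordant_discordant(table):
    """Return (P, Q) for a table of counts whose rows and columns run low to high."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for r1 in range(n_rows):
        for c1 in range(n_cols):
            for r2 in range(r1 + 1, n_rows):  # only cells in rows below
                for c2 in range(n_cols):
                    pairs = table[r1][c1] * table[r2][c2]
                    if c2 > c1:    # below and to the right
                        p += pairs
                    elif c2 < c1:  # below and to the left
                        q += pairs
    return p, q

table1 = [[173, 355], [28, 318]]
p, q = concordant_discordant(table1)
print(p, q, p - q)
```

For a 2 x 2 table this reduces to P = A × D and Q = B × C. Here P is larger than Q, so the association is positive.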

Gamma

The information of P and Q can be used to calculate Gamma (γ)

\[ \gamma = \frac{P}{P + Q} - \frac{Q}{P + Q} = \frac{P - Q}{P + Q} \]

Problems:
  Any vacant cell produces a score of 1.0
  Tends to overstate strength of a relationship
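Both problems with Gamma can be illustrated with a short sketch. The first calculation uses the Table 1 counts (P = A × D, Q = B × C); the second uses a hypothetical 2 x 2 table with one vacant cell, invented for illustration.

```python
def gamma(p, q):
    """Gamma from the counts of main-diagonal (P) and off-diagonal (Q) pairs."""
    return (p - q) / (p + q)

# Table 1: A = 173, B = 355, C = 28, D = 318.
table1_gamma = gamma(173 * 318, 355 * 28)

# Hypothetical table with a vacant cell (B = 0):  40  0 / 30  30.
# Q = B * C = 0, so Gamma is 1.0 even though the relationship is not perfect.
vacant_cell_gamma = gamma(40 * 30, 0 * 30)

print(round(table1_gamma, 2), vacant_cell_gamma)
```

Gamma for Table 1 is roughly .69, well above the Tau-b of .29 reported for the same table, which is the overstatement the slide warns about.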

Tau-b and Tau-c

Preferable to Gamma, though built on the same logic of diagonals
Tends to produce results similar to phi (using nominal data) or the most important interval measure (r), to be discussed later in the year

\[ \text{Tau-b} = \frac{P - Q}{\sqrt{(P + Q + X)(P + Q + Y)}} \]
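A sketch of the Tau-b calculation for the 2 x 2 case, using the Table 1 counts. It assumes that X and Y in the denominator stand for the pairs tied on X only (same column, different rows) and tied on Y only (same row, different columns), as in the standard definition of Tau-b; the slide itself does not spell this out.

```python
import math

# Table 1 counts, laid out as A B / C D.
a, b, c, d = 173, 355, 28, 318

p = a * d               # pairs on the main diagonal
q = b * c               # pairs on the off diagonal
tied_x = a * c + b * d  # same column (same X), different rows
tied_y = a * b + c * d  # same row (same Y), different columns

tau_b = (p - q) / math.sqrt((p + q + tied_x) * (p + q + tied_y))
print(round(tau_b, 2))
```

The result rounds to .29, matching the Tau-b reported under Table 1.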

Tau-b and Tau-c

Tau-b never quite reaches 1.0 in non-square tables
So Tau-c was developed to use with rectangular tables
In practice, the difference between Tau-b and Tau-c when applied to the same table is not great, but keep the distinction above in mind

Example

Table 2: Approval of President Chavez by Opinion of the United States, 2007

                      Opinion of the United States
Approval of Chavez    Very Bad       Bad            Good           Very Good      All Respondents
Disapprove            12.7% (26)     22.8% (64)     43.4% (171)    67.9% (110)    35.6% (371)
Approve               87.3 (178)     77.2 (217)     56.6 (223)     32.1 (52)      64.4 (670)
Total (N)             100 (204)      100 (281)      100 (394)      100 (162)      100 (1041)

Tau-c: -.39   Tau-b: -.35
Source: Latinobarometer, 2007 (Venezuelan respondents only)
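The two measures can be recomputed from the unweighted counts in Table 2. This is a sketch under two assumptions not stated in the slides: the tie terms in Tau-b are the pairs tied on one variable only, and Tau-c follows the common definition 2m(P − Q) / (N²(m − 1)), where m is the smaller of the number of rows and columns. Function and variable names are illustrative.

```python
import math

def pair_counts(table):
    """Return (P, Q, tied on X only, tied on Y only); columns are X, rows are Y."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = tied_x = tied_y = 0
    for r1 in range(n_rows):
        for c1 in range(n_cols):
            for r2 in range(r1, n_rows):
                for c2 in range(n_cols):
                    pairs = table[r1][c1] * table[r2][c2]
                    if r2 > r1 and c2 > c1:
                        p += pairs        # main-diagonal direction
                    elif r2 > r1 and c2 < c1:
                        q += pairs        # off-diagonal direction
                    elif r2 > r1 and c2 == c1:
                        tied_x += pairs   # same column, different rows
                    elif r2 == r1 and c2 > c1:
                        tied_y += pairs   # same row, different columns
    return p, q, tied_x, tied_y

# Table 2 counts: rows Disapprove / Approve, columns Very Bad to Very Good.
table2 = [
    [26, 64, 171, 110],
    [178, 217, 223, 52],
]

p, q, tied_x, tied_y = pair_counts(table2)
n = sum(sum(row) for row in table2)
m = min(len(table2), len(table2[0]))

tau_b = (p - q) / math.sqrt((p + q + tied_x) * (p + q + tied_y))
tau_c = 2 * m * (p - q) / (n ** 2 * (m - 1))
print(round(tau_b, 3), round(tau_c, 3))
```

Q is much larger than P, so both measures are negative. Tau-b rounds to the reported -.35; Tau-c under the assumed formula comes out near -.40, close to the reported -.39.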

Summing Up

With nominal data, use Phi or Cramer’s V
  Phi used for 2 x 2 tables
  Cramer’s V used for any other crosstab involving nominal data
  Avoid Lambda

With ordinal data, use Tau-c or Tau-b
  Tau-b used for square tables: 3 x 3, 4 x 4, etc.
  Tau-c used for rectangular tables
  Avoid Gamma
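The summary rules can be written as a small lookup, sketched below. The function name and its string arguments are hypothetical conveniences, not part of any statistics package.

```python
def recommend_measure(level, n_rows, n_cols):
    """Suggest a measure of association following the summary rules.

    level is "nominal" or "ordinal"; n_rows and n_cols give the table's shape.
    """
    if level == "nominal":
        # Phi for 2 x 2 tables, Cramer's V for any other nominal crosstab.
        return "Phi" if (n_rows, n_cols) == (2, 2) else "Cramer's V"
    if level == "ordinal":
        # Tau-b for square tables, Tau-c for rectangular ones.
        return "Tau-b" if n_rows == n_cols else "Tau-c"
    raise ValueError("level must be 'nominal' or 'ordinal'")

print(recommend_measure("ordinal", 2, 2))  # shape of Table 1
print(recommend_measure("ordinal", 2, 4))  # shape of Table 2
```

Table 1 is a square 2 x 2 table, so the rule points to Tau-b; Table 2 is a rectangular 2 x 4 table, so the rule points to Tau-c.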
