Log-linear analysis
Summary
• Focus on data analysis
• Focus on underlying process
• Focus on model specification
• Focus on likelihood approach
• Focus on ‘complete-data likelihood’
• Focus on prediction
• Focus on interaction/association
• Link with risk analysis
• Unified perspective on different models
The approach
Risk measures
• Count: Number of events during given period (observation window)
• Probability: probability of an outcome: proportion of risk set experiencing a given outcome (event) at least once
• Risk set = all persons at risk at given point in time.
• Rate: number of events per time unit of exposure (per unit of any measure of size, e.g. time, space, miles travelled)
Risk measures
• Difference of probabilities: p1 - p2
• Relative risk: ratio of probabilities (focus: risk factor)
– prob. of event in presence of risk factor / prob. of event in absence of risk factor (control group; reference category): p1 / p2
• Odds: odds on an outcome: ratio of favourable outcomes to unfavourable outcomes. Chance of one outcome rather than another: p1 / (1-p1)
The odds are what matter when placing a bet on a given outcome, i.e. when
something is at stake. Odds reflect the degree of belief in a given outcome.
Relation odds and relative risk: Agresti, 1996, p. 25
Risk measures
• Odds
• Odds ratio: ratio of odds (focus: risk indicator, covariate)
– odds in target group / odds in control group [reference category]: ratio of favourable outcomes in target group over ratio in control group.
The odds ratio measures the ‘belief’ in a given outcome in two different populations or under two different conditions. If the odds ratio is one, the two populations or conditions are similar.
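$$\text{Odds} = \frac{p}{1-p} \qquad \text{(range, odds scale: } 0 \ldots \infty\text{)}$$

$$p = \frac{\text{Odds}}{1+\text{Odds}} = \frac{1}{1+\text{Odds}^{-1}}$$

$$\operatorname{logit}(p) = \ln(\text{odds}) = \ln\frac{p}{1-p} \qquad \text{(range: } -\infty, +\infty\text{)}$$

$$p = \frac{1}{1+\exp[-\operatorname{logit}(p)]}$$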
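The risk measures above can be illustrated with a short Python sketch. The two probabilities are hypothetical values chosen only for illustration, and the function names are not from the slides.

```python
import math

def odds(p):
    """Odds on an outcome with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse of the logit: maps a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical probabilities (assumed for illustration only).
p1 = 0.20  # probability of the event in presence of the risk factor
p2 = 0.10  # probability of the event in the control group

risk_difference = p1 - p2          # difference of probabilities
relative_risk = p1 / p2            # ratio of probabilities
odds_ratio = odds(p1) / odds(p2)   # ratio of odds

print(risk_difference, relative_risk, odds_ratio)
print(logit(p1), inv_logit(logit(p1)))
```

With these values the relative risk and the odds ratio differ, which shows that the two measures coincide only approximately, and only when both probabilities are small.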
Risk analysis
• Probability models:
– Counts → Poisson r.v. → Poisson distribution → Poisson regression / log-linear model
– Probabilities → binomial and multinomial r.v. → binomial and multinomial distribution → logistic regression / logit model (parameter p, probability of occurrence, is also called risk; e.g. Clayton and Hills, 1993, p. 7)
– Rates → occurrences/exposure → Poisson r.v. → log-rate model
Analysis of count data
Introduction to log-linear models
The Poisson probability model
Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (COUNT). N is a Poisson r.v. following a Poisson distribution with parameter $\lambda$:

$$\Pr\{N = n\} = \frac{\lambda^{n} \exp[-\lambda]}{n!}$$

The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$
Likelihood function
Probability mass function:

$$\Pr\{N = n\} = \frac{\lambda^{n} \exp[-\lambda]}{n!}$$

Log-likelihood function:

$$l(\lambda; n) = n \ln \lambda - \lambda - \ln n!$$

Likelihood equations to determine the ‘best’ value of $\lambda$
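As a minimal sketch of the likelihood approach, the Python code below evaluates the Poisson log-likelihood for a single observed count and locates its maximum on a grid. The observed count and the grid are assumptions made for illustration. For one observation the likelihood equation n/λ - 1 = 0 gives λ = n, and the grid search should reproduce that value.

```python
import math

def poisson_pmf(n, lam):
    """Pr{N = n} for a Poisson r.v. with parameter lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def poisson_loglik(lam, n):
    """Log-likelihood l(lam; n) = n ln(lam) - lam - ln(n!)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

# Hypothetical observed count (assumed for illustration only).
n_obs = 7

# Candidate values of lam from 1.00 to 15.00 in steps of 0.01.
grid = [k / 100 for k in range(100, 1501)]
lam_hat = max(grid, key=lambda lam: poisson_loglik(lam, n_obs))

print(lam_hat)
```

A grid search is used only to keep the example transparent; in practice the likelihood equations are solved analytically or by an iterative algorithm.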
The log-linear model
The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure.
Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).
Log-linear models for two-way tables
Saturated log-linear model:

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

• $\mu$: overall effect (level)
• $\mu_i^A$, $\mu_j^B$: main effects (marginal freq.)
• $\mu_{ij}^{AB}$: interaction effect

In case of a 2 x 2 table: 4 observations and 9 parameters ⇒ normalisation constraints
Relation log-linear model and Poisson regression model
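$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2j} + \beta_3 x_{3ij}$$

where $x_{1i}$, $x_{2j}$ and $x_{3ij}$ are dummy variables (0 if i or j is equal to 1 and 1 if i or j is equal to 2) and the interaction variable is $x_{3ij} = x_{1i} \cdot x_{2j}$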
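The link between the log-linear model and Poisson regression can be made concrete for a 2 x 2 table. The sketch below assumes hypothetical counts and uses the names beta0 to beta3 for the regression coefficients (these names are an assumption, not taken from the slides). Because the model is saturated, the coefficients follow directly from the logs of the observed counts and the fitted values reproduce the table.

```python
import math

# Hypothetical 2 x 2 table of counts, keyed by (i, j) = (row, column).
counts = {(1, 1): 40, (1, 2): 20, (2, 1): 30, (2, 2): 60}

def dummy(level):
    """Dummy variable: 0 for category 1, 1 for category 2."""
    return 0 if level == 1 else 1

log_n = {cell: math.log(n) for cell, n in counts.items()}

# Saturated model with dummy coding: closed-form coefficients.
beta0 = log_n[(1, 1)]                       # reference cell (1, 1)
beta1 = log_n[(2, 1)] - log_n[(1, 1)]       # effect of row 2
beta2 = log_n[(1, 2)] - log_n[(1, 1)]       # effect of column 2
beta3 = (log_n[(2, 2)] - log_n[(2, 1)]
         - log_n[(1, 2)] + log_n[(1, 1)])   # interaction = log odds ratio

def fitted(i, j):
    """Expected count from ln(lambda) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    x1, x2 = dummy(i), dummy(j)
    x3 = x1 * x2
    return math.exp(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3)

for cell in counts:
    print(cell, counts[cell], round(fitted(*cell), 6))
```

Dropping the interaction term (beta3 = 0) would give the independence model, which no longer has a closed-form solution of this simple kind and is fitted iteratively.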
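Design matrix: unsaturated log-linear model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mu \\ \mu_1^A \\ \mu_2^A \\ \mu_1^B \\ \mu_2^B \end{pmatrix}$$

Number of parameters exceeds number of equations ⇒ need for additional equations
X'X is singular, so $(X'X)^{-1}$ does not exist ⇒ identify linear dependencies

$$Y = X\mu \qquad \mu = (X'X)^{-1} X'Y$$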
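The singularity can be checked numerically. The sketch below is a plain-Python illustration: it builds the 4 x 5 design matrix of the unsaturated model for a 2 x 2 table, computes the rank of X'X by exact Gaussian elimination, and then shows that one possible normalisation (setting the parameters of the first category of A and of B to zero, an assumption chosen for illustration) removes the linear dependencies.

```python
from fractions import Fraction

# Columns: overall effect, A1, A2, B1, B2; rows: cells 11, 12, 21, 22.
X = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]

def rank(matrix):
    """Rank of a matrix by Gauss-Jordan elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier ones
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[found])]
        found += 1
        if found == len(rows):
            break
    return found

def cross_product(matrix):
    """Return X'X for a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [[sum(row[a] * row[b] for row in matrix) for b in range(n_cols)]
            for a in range(n_cols)]

XtX = cross_product(X)
print(rank(XtX), "of", len(XtX))   # rank below 5: X'X is singular

# Drop the A1 and B1 columns (reference-category normalisation).
X_ref = [[row[0], row[2], row[4]] for row in X]
print(rank(cross_product(X_ref)), "of", 3)   # full rank: invertible
```

Other normalisations, such as effect coding with parameters summing to zero, resolve the same dependencies and lead to different but equivalent parameter values.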
Hybrid log-linear models
• Hybrid log-linear models contain unconventional effect parameters.
• Interaction effects are restricted in a certain way ⇒ restrictions on interaction parameters.
Diagonals parameter model 1: (main) diagonal effect
$$\lambda_{ij} = a_i b_j c_k$$

with $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal)
Off-diagonal elements are independent and diagonal elements are changed by a common factor.
Examples of hybrid log-linear models
Diagonals parameter model 3: the diagonal and each minor diagonal has unique effect parameter
$$\lambda_{ij} = a_i b_j c_k$$

with k indicating the diagonal: $k = R + i - j$, where R is the number of rows (or columns). There are $2R - 1$ values of $c_k$.
Application: APC models
Diagonals parameter model 2: each diagonal element has separate effect parameter
$c_k = 1$ for $i \neq j$ and $c_k = c_i$ for $i = j$ (diagonal)
Diagonal elements are predicted perfectly by the model
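The diagonals parameter models can be sketched in a few lines of Python. The parameter values below are hypothetical and the function names are assumptions made for illustration. The first part indexes the diagonals as in model 3 (k = R + i - j), the second part builds the expected counts of model 1, where only the main diagonal is changed by a common factor c.

```python
def diagonal_index(i, j, R):
    """Index k = R + i - j of the diagonal holding cell (i, j), 1-based."""
    return R + i - j

R = 3  # number of rows (and columns) of the square table

# Model 3: one effect parameter per diagonal, indexed by k.
indices = [[diagonal_index(i, j, R) for j in range(1, R + 1)]
           for i in range(1, R + 1)]
distinct_k = sorted({k for row in indices for k in row})
print(indices)      # the main diagonal has k = R
print(distinct_k)   # 2R - 1 distinct values

# Model 1: lambda_ij = a_i * b_j * c_k, with c_k = c on the main
# diagonal and c_k = 1 elsewhere. All values are hypothetical.
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
c = 1.5

def expected_model1(a, b, c):
    """Expected counts under diagonals parameter model 1."""
    return [[a[i] * b[j] * (c if i == j else 1.0)
             for j in range(len(b))] for i in range(len(a))]

lam = expected_model1(a, b, c)
print(lam)
```

Model 2 would replace the single factor c by a list of factors, one per diagonal cell, which is why the diagonal elements are then reproduced exactly.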
The log-rate model
Statistical analysis of occurrence-exposure rates
The log-rate model: the occurrence matrix and the exposure matrix
Occurrences: number leaving home by age and sex, 1961 birth cohort: $n_{ij}$

                 Sex
Age         Female    Male   Total
<20            135      74     209
>=20           143     178     321
Total          278     252     530
Censored        13      40      53
Total          291     292     583

Exposures: number of months living at home (includes censored observations): $PM_{ij}$

                 Sex
Age         Female    Male   Total
<20          15113   16202   31315
>=20          4876    9114   13990
Total        19989   25316   45305
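The occurrence-exposure rates follow from dividing each cell of the occurrence matrix by the matching cell of the exposure matrix. The sketch below uses the counts and person-months given above; the variable names and the scaling to 1000 person-months are choices made for illustration.

```python
# Occurrences n_ij: number leaving home, by (age group, sex).
occurrences = {
    ("<20", "Female"): 135, ("<20", "Male"): 74,
    (">=20", "Female"): 143, (">=20", "Male"): 178,
}

# Exposures PM_ij: months lived at home, by (age group, sex).
exposures = {
    ("<20", "Female"): 15113, ("<20", "Male"): 16202,
    (">=20", "Female"): 4876, (">=20", "Male"): 9114,
}

# Occurrence-exposure rate per month of exposure in each cell.
rates = {cell: occurrences[cell] / exposures[cell] for cell in occurrences}

for (age, sex), rate in rates.items():
    print(f"{age:>4} {sex:<6} {1000 * rate:.2f} per 1000 person-months")
```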
The log-rate model
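$$\ln \frac{\lambda_{ij}}{PM_{ij}} = u + u_i^A + u_j^B + u_{ij}^{AB}$$

with A = AGE [early (before age 20) = 0; late (at age 20 or later) = 1] and B = SEX [female = 0; male = 1]

$$\ln \lambda_{ij} = \ln PM_{ij} + u + u_i^A + u_j^B + u_{ij}^{AB}$$

$$\ln \lambda_{ij} = m_{ij}^{o} + u + u_i^A + u_j^B + u_{ij}^{AB} \qquad (m_{ij}^{o}\text{: offset})$$

$$\ln \frac{\text{occurrences}}{\text{exposure}} = \ln \frac{\text{counts}}{\text{exposures}}$$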
The log-rate model is a log-linear model with OFFSET (constant term)
$\lambda_{ij} = E[N_{ij}]$; $PM_{ij}$ fixed
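A minimal sketch of the saturated log-rate model for the leaving-home data is given below. It assumes the 0/1 coding stated above (early = 0, late = 1; female = 0, male = 1), so the parameters follow in closed form from the logs of the observed rates. The variable names are assumptions, and the log of the exposure enters as the offset.

```python
import math

# Cells keyed by (age, sex): age 0 = <20, 1 = >=20; sex 0 = female, 1 = male.
n = {(0, 0): 135, (0, 1): 74, (1, 0): 143, (1, 1): 178}          # occurrences
pm = {(0, 0): 15113, (0, 1): 16202, (1, 0): 4876, (1, 1): 9114}  # exposures

log_rate = {cell: math.log(n[cell] / pm[cell]) for cell in n}

# Saturated model with reference cell (early, female).
u = log_rate[(0, 0)]                      # log rate in the reference cell
u_A = log_rate[(1, 0)] - u                # age effect (late vs early)
u_B = log_rate[(0, 1)] - u                # sex effect (male vs female)
u_AB = (log_rate[(1, 1)] - log_rate[(1, 0)]
        - log_rate[(0, 1)] + log_rate[(0, 0)])   # interaction effect

def expected_count(age, sex):
    """lambda_ij = exp(offset + linear predictor), offset = ln(PM_ij)."""
    offset = math.log(pm[(age, sex)])
    linear = u + u_A * age + u_B * sex + u_AB * age * sex
    return math.exp(offset + linear)

print(u, u_A, u_B, u_AB)
print({cell: round(expected_count(*cell), 6) for cell in n})
```

Because the model is saturated, the expected counts equal the observed occurrences; an unsaturated model (for instance without the interaction term) would need an iterative fit.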
Log-rate model: rate = events/exposure
$$\frac{E[N_{ij}]}{m_{ij}} = \frac{\lambda_{ij}}{m_{ij}} = \exp[u + u_i^A + u_j^B + u_{ij}^{AB}]$$

with $m_{ij}$ the exposure
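Gravity model:

$$\lambda_{ij} = a_i b_j c_k m_{ij}$$

with $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal)

$$\Pr\{N_{ij} = n_{ij}\} = \frac{\lambda_{ij}^{n_{ij}} \exp[-\lambda_{ij}]}{n_{ij}!}$$

$$l(a, b, c; n) = n_{ij} \ln(a_i b_j c_k m_{ij}) - a_i b_j c_k m_{ij} - \ln n_{ij}!$$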
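The gravity model and its Poisson log-likelihood can be sketched as follows. All parameter values, exposures and counts are hypothetical, the symbols a, b and c for the row, column and diagonal parameters are an assumption, and the log-likelihood is summed over all cells of the table.

```python
import math

def gravity_expected(a, b, c, m):
    """Expected counts lambda_ij = a_i * b_j * c_k * m_ij.

    c_k equals c on the main diagonal (i = j) and 1 elsewhere.
    """
    return [[a[i] * b[j] * (c if i == j else 1.0) * m[i][j]
             for j in range(len(b))] for i in range(len(a))]

def poisson_loglik(n, lam):
    """Sum over cells of n_ij ln(lambda_ij) - lambda_ij - ln(n_ij!)."""
    total = 0.0
    for n_row, lam_row in zip(n, lam):
        for n_ij, lam_ij in zip(n_row, lam_row):
            total += n_ij * math.log(lam_ij) - lam_ij - math.lgamma(n_ij + 1)
    return total

# Hypothetical parameters, exposures and observed counts.
a = [0.5, 1.0]
b = [2.0, 4.0]
c = 3.0
m = [[10.0, 20.0], [30.0, 40.0]]
n = [[30, 40], [60, 480]]

lam = gravity_expected(a, b, c, m)
print(lam)
print(poisson_loglik(n, lam))

# A different diagonal factor fits these counts less well.
print(poisson_loglik(n, gravity_expected(a, b, 2.0, m)))
```

The counts were chosen to equal the expected values under c = 3.0, so any other value of c gives a lower log-likelihood in this example.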
Logit model and log-linear model
A comparison
Log-linear model:

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
Select one variable as the dependent (response) variable, e.g. does voting behaviour differ by sex?
Are females more likely to vote conservative than males?
Logit model:

$$\ln \frac{\lambda_{1j}}{\lambda_{2j}} = \gamma + \gamma_j^B$$
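Males voting conservative rather than labour:

$$\ln \frac{\lambda_{11}}{\lambda_{21}} = \left(\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}\right) - \left(\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}\right)$$

Females voting conservative rather than labour:

$$\ln \frac{\lambda_{12}}{\lambda_{22}} = \left(\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}\right) - \left(\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB}\right)$$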
Are females more likely to vote conservative than males?
Log-odds = logit
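Males:

$$\ln \frac{\lambda_{11}}{\lambda_{21}} = \mu_1^A - \mu_2^A + \mu_{11}^{AB} - \mu_{21}^{AB} = 2\mu_1^A + 2\mu_{11}^{AB}$$

Females:

$$\ln \frac{\lambda_{12}}{\lambda_{22}} = \mu_1^A - \mu_2^A + \mu_{12}^{AB} - \mu_{22}^{AB} = 2\mu_1^A + 2\mu_{12}^{AB}$$

The last step uses effect coding (parameters sum to zero): $\mu_2^A = -\mu_1^A$, $\mu_{21}^{AB} = -\mu_{11}^{AB}$ and $\mu_{22}^{AB} = -\mu_{12}^{AB}$.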
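The relation between the log odds and the effect-coded log-linear parameters can be verified numerically. The sketch below assumes a hypothetical 2 x 2 table (rows: party, columns: sex) and computes the effect-coded parameters of the saturated model from the logs of the cell counts; the variable names are assumptions.

```python
import math

# Hypothetical counts: rows A = party (conservative, labour),
# columns B = sex (male, female).
lam = [[60.0, 80.0],
       [40.0, 20.0]]

log_lam = [[math.log(v) for v in row] for row in lam]

# Effect coding: each set of parameters sums to zero.
mu = sum(sum(row) for row in log_lam) / 4
mu_A = [sum(log_lam[i]) / 2 - mu for i in range(2)]
mu_B = [(log_lam[0][j] + log_lam[1][j]) / 2 - mu for j in range(2)]
mu_AB = [[log_lam[i][j] - mu - mu_A[i] - mu_B[j] for j in range(2)]
         for i in range(2)]

# Log odds of voting conservative rather than labour, by sex.
logit_males = math.log(lam[0][0] / lam[1][0])
logit_females = math.log(lam[0][1] / lam[1][1])

print(logit_males, 2 * mu_A[0] + 2 * mu_AB[0][0])
print(logit_females, 2 * mu_A[0] + 2 * mu_AB[0][1])
```

In both lines the two printed numbers agree up to rounding, which illustrates that the logit model is implied by the log-linear model once one variable is treated as the response.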
Effect coding (1)
$$\ln \theta_1^B = \gamma + \gamma_1^B$$

$$\ln \theta_2^B = \gamma + \gamma_2^B$$
A = Party; B = Sex
Are women more conservative than men? Do women vote more conservative than men? The odds ratio.
$$\ln \frac{\theta_2^B}{\theta_1^B} = \left(\gamma + \gamma_2^B\right) - \left(\gamma + \gamma_1^B\right) = \gamma_2^B - \gamma_1^B$$
If the log odds ratio is positive (i.e. the odds ratio is greater than one), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women are more likely than men to vote conservative.
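$$\ln \theta_1^B = \gamma + \gamma_1^B + \left(\gamma_2^B - \gamma_1^B\right) \cdot 0$$

$$\ln \theta_2^B = \gamma + \gamma_1^B + \left(\gamma_2^B - \gamma_1^B\right) \cdot 1$$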
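Logit model:

$$\eta = \operatorname{logit}(p) = \ln \frac{p_1}{p_2} = \ln \frac{p}{1-p} = a + bx$$

with $a = \gamma + \gamma_1^B$: log odds of reference category (males)
and $b = \gamma_2^B - \gamma_1^B$: log odds ratio (odds females / odds males)
with x = 0, 1 (x = 0 for males, x = 1 for females)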
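As a closing illustration, the logit model with a single 0/1 covariate can be computed directly from a 2 x 2 table. The vote counts below are hypothetical, and the names a, b and prob_conservative are used only for this sketch. The intercept a is the log odds for males (x = 0) and the slope b is the log odds ratio of females relative to males.

```python
import math

# Hypothetical vote counts by (party, sex).
votes = {
    ("conservative", "male"): 60, ("labour", "male"): 40,
    ("conservative", "female"): 80, ("labour", "female"): 20,
}

# Odds of voting conservative rather than labour, by sex.
odds_males = votes[("conservative", "male")] / votes[("labour", "male")]
odds_females = votes[("conservative", "female")] / votes[("labour", "female")]

a = math.log(odds_males)                   # log odds, reference category
b = math.log(odds_females / odds_males)    # log odds ratio

def prob_conservative(x):
    """Probability of voting conservative; x = 0 (male) or 1 (female)."""
    eta = a + b * x
    return 1.0 / (1.0 + math.exp(-eta))

print(a, b, math.exp(b))
print(prob_conservative(0), prob_conservative(1))
```

With these counts b is positive, so the odds of voting conservative are larger for women than for men, and the predicted probabilities reproduce the observed proportions in each group.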