Political Science Association, Methodology Initiative
British Academy, November 25, 2015
Statistical Modeling to Understand Terrorism: An Overview of New Tools
JEFF GILL
Washington University
Political Science Association, PolMeth Initiative [1]
Motivation
◮ The safety of millions of people depends on the understanding of the workings of covert networks,
especially of terrorist networks.
◮ To protect people, governments and nongovernmental organizations invest enormous amounts of
time and energy to detect covert networks and to thwart terrorist events and other kinds of attacks.
◮ Terrorism is an important political and public health problem because it affects:
⊲ government stability,
⊲ personal safety,
⊲ immediate epidemiological concerns,
⊲ internal government policies,
⊲ public perception and panic,
⊲ and possibly widespread health effects.
◮ Academic work on terrorism has increased dramatically in recent
decades for obvious reasons, but remains under-developed.
Representative Historical and Descriptive Approaches
◮ Terrorists play to the media. Wilkinson, “The Media and Terror: A
Reassessment.” Terrorism and Political Violence 9(2), 1997, 51-64.
◮ Terrorists become more militant following concessions. Ethan
Bueno de Mesquita, “Conciliation, Counterterrorism, and Patterns of Terrorist Violence: A
Comparative Study of Four Cases.” IO 59(1), 2003, 145-176.
◮ Terrorism works better against democracies than tyrannies.
Dershowitz, Why Terrorism Works: Understanding the Threat, Responding to the
Challenge, 2002, Yale University Press.
◮ Predicting terrorism is hard but sociological understanding is
easier. Boyns and Ballard, “Developing a Sociological Theory for the Empirical
Understanding of Terrorism.” The American Sociologist 35(2), 2008, 5-25.
◮ Planes are special kinds of weapons. Einav, “Understanding Aviation
Terrorism.” Interavia: Business & Technology 58(670), 2003, 34-37.
Representative Formal or Game Theoretic Approaches
◮ Normal-form games can distinguish proactive from defensive policies.
Sandler & Arce, “Terrorism: A Game-Theoretic Approach.” In Handbook of Defense
Economics, Sandler & Hartley (eds.). Volume 2, 775-813, Elsevier.
◮ Most terrorists are non-suicidal rational actors attacking soft targets.
Atkinson, Sandler & Tschirhart, “Terrorism in a Bargaining Framework.” JLEO 30,
1987, 1-21.
◮ Probabilistic risk analysis shows vulnerabilities. Harris, “Mathematical
Methods in Combatting Terrorism.” Risk Analysis 24(2), 2004, 985-988.
◮ Government should provide incentives for former terrorists to exert
counterterrorism efforts. Ethan Bueno de Mesquita, “The Terrorist Endgame: A Model
with Moral Hazard and Learning.” JCR 49(2), 2005, 237-258.
◮ Those with low ability or little education are most likely to join.
Ethan Bueno de Mesquita, “The Quality of Terror.” AJPS 49(3), 2005, 515-530.
Representative Economic Approaches
◮ Terrorism constitutes transnational externalities and market failures.
Todd Sandler and Walter Enders, “An Economic Perspective on Transnational
Terrorism.” In The Economic Analysis of Terrorism, Tilman Bruck (ed.). 11-28, 2007, Routledge.
◮ Trade and FDI reduce terrorism. Li & Schaub, “Economic Globalization and
Transnational Terrorism: A Pooled Time-Series Analysis.” JCR 48(2), 2004, 230-258.
◮ Economic centers are at risk. Rosoff & von Winterfeldt, “A Risk and Economic
Analysis of Dirty Bomb Attacks on the Ports of Los Angeles and Long Beach.” Risk Analysis,
27(3) 2007, 1539-6924.
◮ Terrorism is bad for tourism. Sloboda, “Assessing the Effects of Terrorism on
Tourism by Use of Time Series Methods.” Tourism Economics 9(2), 2003, 179-190.
◮ There exist links between the national economy and homegrown
terrorism. Blomberg, Hess & Weerapana, “Economic Conditions and Terrorism.” EJPE,
20(2), 2004, 463-478.
Data Problems with Individual/Events Level Approaches
◮ Micro-level empirical work in this area has not produced many revealing insights.
◮ There are some major deficiencies in direct data-analytic micro-studies of terrorism:
⊲ the data consist of either publicly observed events or classified data at government agencies,
⊲ government actions are typically censored to scholars,
⊲ targets are strategic, actions are dynamic: the subjects are deliberately trying to deny observers
information,
⊲ existing tools for filling in missing information are inappropriate,
⊲ qualitative and technical experts have not traditionally coordinated,
⊲ and it can even be physically dangerous.
◮ Can we use standard data-analytic regression techniques despite these problems?
An Example of Basic Data Analysis for Terrorism Data
◮ Violent events within the state of Israel.
◮ Subsetted to give 103 suicide attacks with explosives over a three-year
period from November 6, 2000 to November 3, 2003, when there
was a steep drop (the early period of the second “Intifada”).
◮ Information provided: date and place of the attack, attack type,
the type of target and device employed, organizational affiliation
of the attacker, and the number of casualties, along with a written
description of the attack.
◮ Casualties are given personal attributes such as name, age, sex,
nationality, and religion.
◮ These data are subsetted by Mark Harrison (2006).
Terrorism Data
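# Read the Harrison suicide-attack data (assumed to hold one row per attack)
harr <- read.table("http://jgill.wustl.edu/data/harrison4.txt", header=TRUE)
# Tabulate the distinct values of every column except the first
apply(harr[,-1], 2, table)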
$NumberKilled
0 1 2 3 5 6 7 8 9 11 15 17 19 21 23 24 30
44 13 9 8 3 2 3 2 2 3 4 3 1 3 1 1 1
$NumberInjured
0 1 2 3 4 5 6 8 9 11 13 14 16 17 20 21 22 26 27 30
28 1 5 4 4 3 1 2 2 2 1 1 1 1 3 1 1 1 1 5
40 42 47 50 52 57 58 59 60 65 69 86 90 100 102 120 130 150 188
3 1 1 7 1 1 1 2 5 1 1 1 1 3 1 1 1 2 1
$TotalCasualties
0 1 2 3 4 5 6 8 9 10 12 13 15 17 20 21 26 27 29 30
22 5 6 4 3 3 2 2 1 1 2 2 1 1 2 1 1 2 1 1
31 32 35 38 45 49 50 51 52 53 57 58 59 61 62 63 65 67 71 75
1 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 2 2 3 2
81 91 93 105 106 123 126 141 145 151 180 199
1 1 1 1 1 1 1 1 1 1 1 1
Terrorism Data
$ResponsibleHamas $ResponsibleisMartyrs
0 1 0 1
59 44 78 25
$ResponsibleisPIJ $ResponsibleisOther
0 1 0 1
79 24 99 4
$TargetisMilitary $TargetisCivilian
0 1 0 1
76 10 10 76
$TargetisBus $TargetisCafe
0 1 0 1
89 14 89 14
$TargetisCheckpoint $TargetisResidence
0 1 0 1
87 16 102 1
Terrorism Data
$TargetisOffshore $TargetisStore
0 1 0 1
101 2 96 7
$TargetisStreet $TargetisTravelstop
0 1 0 1
71 32 88 15
$DeviceisCar $DeviceisBoat
0 1 0 1
89 14 101 2
$AttackisPrevented $AttackerisChallenged
0 1 0 1
101 2 63 40
$FirstAttackerisMale $FirstAttackerisFemale
0 1 0 1
7 92 92 7
Terrorism Data
$AgeofFirstAttacker
16 17 18 19 20 21 22 23 24 25 26 27 29 31 43 45 48
1 8 7 10 15 11 10 12 2 3 2 1 3 1 1 1 1
◮ Data Notes:
⊲ measurement here is very “nongranular,”
⊲ some dichotomous variables are also very lopsided,
⊲ information filtered through a government reporting source,
⊲ and the real data generating process is never observed: motivations, planning, and training.
◮ An additional challenge is grouping or clustering in the data.
Terrorism Data Analysis
[Figure: four conditioning plots of NumberKilled against log(AgeofFirstAttacker), each split into No/Yes panels. Panel titles: Attacker is Challenged (given as.factor(AttackerisChallenged)), Device is Car (given as.factor(DeviceisCar)), Target is Military (given as.factor(TargetisMilitary)), Hamas Responsible (given as.factor(ResponsibleHamas)).]
Terrorism Data Analysis
◮ One useful approach is to fit a log-linear form (generalized additive model) where the outcome
variable is the number killed, mixing estimated and smoothed fits simultaneously:
Parametric coefficients:
                        Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                1.198       0.121     9.89   < 2e-16
AttackerisChallenged      -1.406       0.154    -9.14   < 2e-16
FirstAttackerisFemale      0.217       0.231     0.94      0.35
DeviceisCar                0.332       0.251     1.33      0.18
TargetisCafe               0.466       0.118     3.96   7.5e-05
TargetisMilitary          -3.286       0.505    -6.50   7.9e-11
ResponsibleHamas           0.877       0.125     7.02   2.2e-12
Approximate significance of smooth terms:
                                         edf  Ref.df  Chi.sq  p-value
te(log(AgeofFirstAttacker),log(Date))   4.81    4.97      94   <2e-16
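A fit of this general shape can be sketched in R with the mgcv package. The sketch below runs on simulated data, not the Harrison data. The Poisson family is an assumption suggested by the z statistics in the output above, the numeric coding of Date (days since the first attack) is hypothetical, and only two of the parametric terms are included to keep it short.

```r
# Minimal sketch on SIMULATED data; requires the mgcv package.
library(mgcv)

set.seed(1)
n <- 103
sim <- data.frame(
  AttackerisChallenged = rbinom(n, 1, 0.4),
  ResponsibleHamas     = rbinom(n, 1, 0.4),
  AgeofFirstAttacker   = sample(16:48, n, replace = TRUE),
  Date                 = sort(sample(1:1093, n))  # hypothetical coding: day index
)

# Simulated counts from a log-linear (Poisson) mean; coefficients are arbitrary
eta <- 1 - 1.2 * sim$AttackerisChallenged + 0.8 * sim$ResponsibleHamas
sim$NumberKilled <- rpois(n, exp(eta))

# Parametric terms enter linearly; te() adds a tensor-product smooth of age and date
fit <- gam(NumberKilled ~ AttackerisChallenged + ResponsibleHamas +
             te(log(AgeofFirstAttacker), log(Date)),
           family = poisson, data = sim)
summary(fit)
```

The summary prints the same two blocks as the slide: parametric coefficients, then the approximate significance of the smooth term.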
Viewing the Nonparametric Results
[Figure: plot of the fitted smooth te(log(AgeofFirstAttacker),log(Date),5.6), with axes log(AgeofFirstAttacker) and log(Date).]
Viewing the Nonparametric Results
[Figure: second view of the fitted smooth te(log(AgeofFirstAttacker),log(Date),5.6), with axes log(AgeofFirstAttacker) and log(Date).]
Hidden Effects
◮ This past example is considered an “easy case” since it is confined to a single nation, with a
well-identified problem.
◮ Most collections of terrorism data contain heterogeneous hidden, possibly clustered, effects from:
⊲ actors who are trying to hide important information,
⊲ who are also trying to purposely mislead observers,
⊲ the presence of strong network effects, even if the whole network is not observable,
⊲ measurement on groups that are highly imitative,
all suggesting latent structures in the data that are not directly measured by the explanatory
variables.
◮ So how do we account for such unobserved heterogeneity?
A New Modeling Enhancement
◮ Joint work with George Casella (JASA 2009, Annals of Stats 2010, etc.).
◮ Let’s add a “random effect” term that accounts for heterogeneity:
\[ Y = \beta_0 + X_1\beta_1 + \cdots + X_k\beta_k + \Psi + \epsilon \]
where the new term adds some differences by group to each case: Ψ = [ψ1, ψ2, . . . , ψ103] (with
mean zero, and not unique) just so that the model fits better.
◮ The problem with this is that it does not account for any information yet, and we have to know
grouping information.
◮ A more useful version is the Dirichlet Process Random Effects Model, which pulls out subtle
information in the X matrix “non-parametrically” so these ψi values are assigned accounting for
latent information:
\[ Y = \beta_0 + X_1\beta_1 + \cdots + X_k\beta_k + \mathrm{DP}(m, G_0) + \epsilon, \]
which is a computationally intensive process that iteratively fits many different binning
assignments as a Gibbs Sampler runs, and summarizes the results in the final model.
Dirichlet Process Priors, Some Background Definitions
◮ Y is a random variable taking values on the measurable space (Y ,B), defined by the support of
Y and an arbitrary (for now) abstract space B.
◮ The “parameter” of interest here is P , the associated, but unknown, probability measure taking
values in P , the collection of all probability measures on (Y ,B).
◮ Define S as the smallest σ-field (closed under countable unions) generated by sets of the form:
{P : P (A) < r}, where: A ∈ B, r ∈ [0 : 1]
◮ Now define ν as a probability measure on (P ,S), which can be used as a prior distribution for the
unknown P .
◮ We are interested in computing ν∗, the posterior distribution of P |Y .
◮ ν is called a Dirichlet Measure if for every measurable partition {B1, . . . , BK} (and finite K) of
the parameter space B, the distribution of P (B1), . . . , P (BK) under ν is Dirichlet:
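\[ f(y \mid \alpha_1, \ldots, \alpha_K) \propto y_1^{\alpha_1 - 1} \cdots y_K^{\alpha_K - 1}, \quad 0 \le y_i \le 1, \quad \sum_{i=1}^{K} y_i = 1, \quad 0 < \alpha_i, \ \forall i \in [1, 2, \ldots, K]. \]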
The Distributional Structure
◮ Ferguson (1973, 1974, 1983) and Antoniak (1974) introduced the Dirichlet process prior for
nonparametric G, which is this random probability measure on the space of all measures.
◮ We notate this distribution conventionally over the space of distributions by:
⊲ G0, a base distribution (finite non-null measure) which is analogous to an “expected value” of
the distributions,
⊲ λ > 0, a concentration/precision parameter (finite and positive scalar) giving the spread
of distributions around G0,
⊲ therefore φ0 = λG0 is a base measure,
⊲ leading to the prior specification G ∼ DP(λ,G0) ∈ P .
◮ For any finite partition of the parameter space, {B1, . . . , BK}, the joint distribution of these
probabilities has the Dirichlet distribution, now according to:
\[ (G(B_1), \ldots, G(B_K)) \sim \mathcal{D}(\lambda G_0(B_1), \ldots, \lambda G_0(B_K)), \]
where for some observed partition, these are just multinomial probabilities.
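The finite-partition property can be illustrated with a short R sketch that draws the probabilities (G(B1), ..., G(BK)) once. Everything specific here is an assumption made for illustration and does not come from the slides: G0 is taken to be standard normal, λ = 5, and the real line is cut into four intervals. The Dirichlet draw uses the standard construction from normalized independent gamma variates.

```r
# Sketch: one draw of (G(B_1), ..., G(B_K)) for a finite partition, G ~ DP(lambda, G0).
# Assumed for illustration: G0 = N(0, 1), lambda = 5, and the cut points below.
set.seed(42)
lambda  <- 5
cuts    <- c(-Inf, -1, 0, 1, Inf)   # partition of the real line into K = 4 intervals
g0_mass <- diff(pnorm(cuts))        # G0(B_k) for each interval
alpha   <- lambda * g0_mass         # Dirichlet parameters lambda * G0(B_k)

# Dirichlet draw via normalized independent gamma variates
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

G_B <- rdirichlet1(alpha)   # random probabilities of the four intervals
G_B
```

Larger values of λ concentrate these draws more tightly around the base probabilities G0(Bk), which is the sense in which λ acts as a precision parameter.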
Setting Up the Estimation Process
◮ Since realizations of the DP select a discrete distribution with probability one (even though the
generating mechanism is continuous), the model for the random effect ψ is a countably infinite
mixture (some key papers: Ferguson 1973, Antoniak 1974, Berry & Christensen 1979, Lo 1984,
Escobar & West 1995, MacEachern & Muller 1998).
◮ Blackwell and MacQueen (1973) noted the following (generally, not random effects):
⊲ If G is a DP , where ψ1, . . . , ψn iid from G,
⊲ then the marginal distribution of ψ1, . . . , ψn (marginalized over any prior parameters) is equal
in distribution to the first n steps of a Polya process.
◮ Blackwell and MacQueen then proved that the joint distribution of ψ is a product of successive
conditional distributions of the form:
\[ \psi_i \mid \psi_1, \ldots, \psi_{i-1} \sim \frac{\lambda}{i - 1 + \lambda}\,\phi_0(\psi_i) + \frac{1}{i - 1 + \lambda} \sum_{l=1}^{i-1} \delta(\psi_i = \psi_l), \]
where δ denotes the Dirac delta function.
◮ Therefore reference can be made to finite rather than infinite dimensions, and Dirichlet process
posterior calculations involve a single parameter over this space (Ferguson’s Theorem 1, 1973).
Review of the Polya Process
◮ The Polya Process for sampling ψ is equivalent to the following permutation scheme:
⊲ a restaurant has many large circular tables.
⊲ n diners enter one-at-a-time to be seated, where the first person sits at the first table.
⊲ For a given weight, λ, the ith person sits at the unoccupied ith table with probability
λ/(i− 1 + λ).
⊲ Otherwise this diner selects the jth (j < i) previously occupied table with probability
nj/(i− 1 + λ), where nj is the number seated at that table already.
◮ Now the table locations of the seated diners, ξ1, . . . , ξn, form a dependent exchangeable sequence.
◮ ξ∗ = (ξ1, . . . , ξk) with k ≤ n, the set of non-empty tables, is a sample from G.
◮ This process can be iterated many times to numerically integrate over this space.
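The seating scheme above can be simulated directly. The following is a minimal R sketch, with the function name `crp` and the values n = 103 and λ = 2 chosen only for illustration.

```r
# Sketch of the restaurant seating scheme; crp() is a hypothetical helper name.
crp <- function(n, lambda) {
  tables <- integer(n)   # table index chosen by each diner
  counts <- integer(0)   # number seated at each occupied table
  for (i in seq_len(n)) {
    # occupied table j with prob n_j / (i - 1 + lambda),
    # a new table with prob lambda / (i - 1 + lambda)
    probs  <- c(counts, lambda) / (i - 1 + lambda)
    choice <- sample.int(length(probs), size = 1, prob = probs)
    if (choice > length(counts)) {
      counts <- c(counts, 1L)                 # open a new table
    } else {
      counts[choice] <- counts[choice] + 1L   # join an occupied table
    }
    tables[i] <- choice
  }
  list(tables = tables, counts = counts)
}

set.seed(2015)
seating <- crp(n = 103, lambda = 2)
seating$counts   # occupancy of the non-empty tables
```

Rerunning with a larger λ tends to open more tables, so λ controls how finely the observations are split into subclusters.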
Models and Likelihood
◮ A general random effects Dirichlet Process model can now be written definitionally as:
\[ (Y_1, \ldots, Y_n) \sim f(y_1, \ldots, y_n \mid \theta, \psi_1, \ldots, \psi_n) = \prod_i f(y_i \mid \theta, \psi_i), \quad \psi_i \sim \mathrm{DP}(\lambda, \phi_0), \quad i = 1, \ldots, n \]
(the vector θ here is a placeholder for all of the other estimated parameters, X assumed).
◮ Applying the successive conditional distributions of Blackwell and MacQueen, we integrate over the
random effects to get the likelihood function:
\[
\begin{aligned}
L(\theta \mid y) &= \int f(y_1, \ldots, y_n \mid \theta, \psi_1, \ldots, \psi_n)\,\pi(\psi_1, \ldots, \psi_n)\,d\psi_1 \cdots d\psi_n \\
&= \frac{\Gamma(\lambda)}{\Gamma(\lambda + n)} \sum_{k=1}^{n} \lambda^k \sum_{C : |C| = k} \prod_{j=1}^{k} \Gamma(n_j) \int_{\Psi} f(y_{(j)} \mid \theta, \psi_j)\,\phi_0(\psi_j)\,d\psi_j
\end{aligned}
\]
where the second form is derived in Lo (1984 Annals) Lemma 2 and Liu (1996 Annals), and:
⊲ C is a partition of the sample of size n into k groups, k = 1, . . . n− 1
⊲ y(j) is the vector of yis in subcluster j
⊲ ψj is the common random effects parameter applied to that subcluster.
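As a small worked case (derived here from the likelihood formula above rather than taken from the slides), set n = 2. There are two partitions: both observations share one subcluster (k = 1, n1 = 2), or each sits alone (k = 2, n1 = n2 = 1). Using Γ(λ)/Γ(λ + 2) = 1/(λ(λ + 1)) and Γ(2) = Γ(1) = 1, and assuming the observations in a subcluster are conditionally independent given their shared ψ, the sum collapses to two terms:

```latex
\[
L(\theta \mid y)
  = \frac{1}{\lambda + 1} \int_{\Psi} f(y_1 \mid \theta, \psi)\, f(y_2 \mid \theta, \psi)\, \phi_0(\psi)\, d\psi
  + \frac{\lambda}{\lambda + 1} \prod_{j=1}^{2} \int_{\Psi} f(y_j \mid \theta, \psi_j)\, \phi_0(\psi_j)\, d\psi_j
\]
```

The weights 1/(λ + 1) and λ/(λ + 1) are the Polya-process probabilities that the second draw ties with the first or is drawn fresh from the base distribution.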
Matrix Representation of Partitions
◮ Since every “diner” at a given table gets the same random effects value, we want an efficient way
to keep track of assignments on each cycle of the sampler.
◮ Associate a binary matrix An×k with a given partition C, for example:
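\[
C = \{S_1, S_2, S_3\} = \{\{1, 2\}, \{3, 4, 6\}, \{5\}\}
\;\leftrightarrow\;
A = \begin{pmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0
\end{pmatrix}
\]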
◮ Rows: ai is a 1× k vector of all zeros except for a 1 in its subcluster
◮ Columns: The column sums of A are the number of observations in the groups
◮ Variables: thus ψi ∈ Sj ⇒ ψi = ηj (constant in subclusters)
◮ This is similar to (but different from) the matrix approach in McCullagh and Yang (2006).
Mapping Partitions to the Underlying Random Effects
◮ Continuing with the contrived example:
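\[
C = \{S_1, S_2, S_3\} = \{\{1, 2\}, \{3, 4, 6\}, \{5\}\}
\;\leftrightarrow\;
A = \begin{pmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0
\end{pmatrix}
\]
◮ This leads to the matrix representation:
\[
\psi = A\eta
\quad \text{where} \quad
A = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\quad \text{so} \quad
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \vdots \\ \psi_6 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} \eta_1 \\ \eta_2 \\ \eta_3 \end{pmatrix}.
\]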
◮ So we only need to generate three random variables in the sampler.
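The bookkeeping in this example is easy to reproduce. The R sketch below builds A from the subcluster memberships of the six observations and maps three subcluster values to the six random effects; the numeric values of η are arbitrary placeholders.

```r
# Partition from the example: S1 = {1, 2}, S2 = {3, 4, 6}, S3 = {5}
membership <- c(1, 1, 2, 2, 3, 2)   # subcluster of each observation
n <- length(membership)
k <- max(membership)

# Binary n x k matrix: row i has a single 1 in the column of its subcluster
A <- matrix(0, nrow = n, ncol = k)
A[cbind(seq_len(n), membership)] <- 1

colSums(A)                 # subcluster sizes

eta <- c(-0.5, 0.3, 1.2)   # illustrative subcluster values (not estimates)
psi <- drop(A %*% eta)     # psi_i = eta_j for every i in S_j
psi
```

Only the k entries of η are sampled at each cycle; A expands them to one random effect per observation.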
Incorporating the A Matrix
◮ Return to:
\[ Y \mid \psi \sim N(X\beta + \psi, \sigma^2 I), \quad \text{where } \psi_i \sim \mathrm{DP}(\lambda, N(0, \tau^2)), \quad i = 1, \ldots, n \]
where we are explicitly averaging over all normals with mean zero as our DPP choice.
◮ Introduce the A matrices to get
\[ Y \mid A, \eta \sim N(X\beta + A\eta, \sigma^2 I), \quad \eta \sim N_k(0, \tau^2 I), \]
meaning that η is now the focus of the Bayesian nonparametric process.
◮ Now marginalizing over these η, we find that:
\[ Y \mid A \sim N(X\beta, \Sigma^*), \quad \Sigma^* = \sigma^2\left(I + \frac{\tau^2}{\sigma^2} AA'\right) \]
since the DPP is applied to the random effects only.
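The marginal covariance can be checked in one line. This is a sketch that assumes η and the error term are independent given A, as the two-stage specification implies:

```latex
\[
\operatorname{Var}(Y \mid A)
  = A \operatorname{Var}(\eta) A' + \sigma^2 I
  = \tau^2 AA' + \sigma^2 I
  = \sigma^2 \left( I + \frac{\tau^2}{\sigma^2} AA' \right).
\]
```

Entry (i, l) of AA′ equals 1 when observations i and l share a subcluster and 0 otherwise, so the random effect adds covariance τ² only within subclusters.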
Does Democracy Invite Terrorism?
◮ Looking at terrorist activity in 22 Asian countries over 8 years (1990-1997).
◮ Is there a relationship between levels of democracy and the number of terrorist attacks?
◮ Data problems restrict the number of cases to 150, and require us to use the Dirichlet Process
Random Effects Model to handle latent heterogeneity.
◮ The outcome of interest is dichotomous indicating whether or not there was at least one major
violent terrorist act in a country/year pair:
0 1
83 67
◮ We also include 4 explanatory variables in the model. . .
Does Democracy Invite Terrorism?
◮ DEM: measures democracy from the Polity IV 21-point democracy scale ranging from -10 indicating
a hereditary monarchy to +10 indicating a fully consolidated democracy:
-8 -7 -2 -1 0 1 3 4 5 6 7 8 9 10
1 31 7 2 4 4 3 4 19 7 5 18 10 35
◮ FED: assigned 0 if sub-national governments do not have substantial taxing, spending, and
regulatory authority, and 1 otherwise:
0 1
122 28
◮ SYS: coded as 0 for direct presidential elections, 1 for strong president elected by assembly
(including sham assemblies), and 2 for dominant parliamentary government:
0 1 2
37 27 86
◮ AUT is a dichotomous variable indicating whether or not there are autonomous regions not directly
controlled by central government:
0 1
143 7
Does Democracy Invite Terrorism?
◮ So now our model looks like this:
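\[ \operatorname{logit}(Y) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \mathrm{DEM}\,\beta_1 + \mathrm{FED}\,\beta_2 + \mathrm{SYS}\,\beta_3 + \mathrm{AUT}\,\beta_4 + \mathrm{DP}(m, G_0) + \epsilon, \]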
where p = p(Y = 1) is the probability of a “success,” given levels of the X variables.
◮ Expressed in this way, the specification features the “log-odds” model interpretation since log()
denotes the natural log function and p/(1− p) transforms probability to odds.
◮ Notice the use of the Dirichlet Process Random Effect here.
Does Democracy Invite Terrorism?
Dirichlet Process Model
Explanatory Variable     COEF      SE     95% CI              Odds-Ratio
Intercept               0.127   0.188    [-0.241,  0.495]     1.135
DEM (-10:10)            0.058   0.019    [ 0.020,  0.095]     1.060
FED (0,1)               0.258   0.254    [-0.241,  0.756]     1.294
SYS (0,1,2)            -0.420   0.137    [-0.690, -0.151]     0.657
AUT (0,1)               0.450   0.371    [-0.277,  1.176]     1.568
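The odds-ratio column is the exponentiated coefficient, which the following R sketch recomputes from the reported values. The normal-approximation interval (coefficient ± 1.96 standard errors) is an assumption about how the intervals were formed; it matches the table only up to rounding of the inputs.

```r
# Coefficients and standard errors copied from the table above
est <- c(Intercept = 0.127, DEM = 0.058, FED = 0.258, SYS = -0.420, AUT = 0.450)
se  <- c(0.188, 0.019, 0.254, 0.137, 0.371)

# Odds ratio: multiplicative change in the odds for a one-unit increase
odds_ratio <- exp(est)

# Assumed normal-approximation 95% interval on the log-odds scale
z  <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)

round(cbind(odds_ratio, ci), 3)
```

Read this way, each one-point increase on the democracy scale multiplies the odds of at least one major attack in a country-year by about 1.06, holding the other variables fixed.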
What Causes Terrorist Groups To Use Suicide Attacks, Background
◮ Suicide attacks pose a substantially higher challenge for governments since the assailant has great
control over placement and timing and also does not need to plan his or her escape.
◮ The data we use here come from the Global Terrorism Database II (LaFree & Dugan 2008),
restricted here to events in the Middle East and Northern Africa from 1998 to 2004.
◮ There were 273 terrorist attacks worldwide in 1998 with a (then) recorded high of 741 killed along
with 5952 injured.
◮ This starting year was also notable for the incredibly destructive simultaneous August truck
bombings of U.S. Embassies in Nairobi, Kenya (212 killed and roughly 5000 injured), and Dar es Salaam,
Tanzania (11 killed and roughly 85 injured).
◮ After removing almost totally incomplete cases, this provides 1041 violent attacks by terrorist
groups, 154 (15%) of which were suicide attacks where at least one of the individual assailants was
killed by design.
◮ Our outcome variable of interest is therefore the dichotomous observation of a suicide attack or
not.
◮ We again use the Dirichlet Process Random Effects Model.
What Causes Terrorist Groups To Use Suicide Attacks, Explanatory Variables
◮ MULT.INCIDENT: whether the attack is part of a coordinated multi-site event (13.1%).
◮ MULT.PARTY: multiple groups claiming credit, 136 out of 1041 cases coded as one.
◮ SUSP.UNCONFIRM is coded as one (209/1041) if government officials express notable doubt about
attributing responsibility.
◮ SUCCESSFUL: coded one if the attack caused some damage (966 of the events); it asks, given
that the attack is successful, how likely it is that a suicide assailant was used.
◮ WEAPON.TYPE: coded one for the use of: explosives, dynamite, or general bombs (558/1041).
◮ NUM.FATAL (3424 total).
◮ NUM.INJUR (8123 total).
◮ PSYCHOSOCIAL with ascending levels: none (18), minor (946), moderate (66), and major (11).
◮ PROPERTY.DAMAGE: no (480), minor (1), yes (560).
What Causes Terrorist Groups To Use Suicide Attacks, Model Results
Dirichlet Process Model
Explanatory Variable     COEF      SE     95% CI
Intercept              -4.105   0.559    [-5.276, -3.079]
YEAR - 1998             0.195   0.039    [ 0.121,  0.273]
MULT.INCIDENT          -0.585   0.221    [-1.028, -0.162]
MULTI.PARTY            -0.626   0.229    [-1.088, -0.189]
SUSP.UNCONFIRM         -0.061   0.198    [-0.455,  0.331]
SUCCESSFUL             -0.695   0.245    [-1.172, -0.210]
WEAPON.TYPE             1.725   0.320    [ 1.162,  2.422]
TARGET.TYPE            -0.038   0.185    [-0.434,  0.323]
NUM.FATAL              -0.013   0.012    [-0.036,  0.009]
NUM.INJUR               0.017   0.004    [ 0.008,  0.025]
PSYCHOSOCIAL            0.555   0.192    [ 0.188,  0.944]
PROPERTY.DAMAGE         0.297   0.094    [ 0.114,  0.483]
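One quick way to read this table is to flag the terms whose 95% interval excludes zero. The R sketch below does only that, using the interval endpoints copied from the table; it is a reading aid and not part of the estimation.

```r
# Interval endpoints copied from the table above
ci <- data.frame(
  term  = c("Intercept", "YEAR - 1998", "MULT.INCIDENT", "MULTI.PARTY",
            "SUSP.UNCONFIRM", "SUCCESSFUL", "WEAPON.TYPE", "TARGET.TYPE",
            "NUM.FATAL", "NUM.INJUR", "PSYCHOSOCIAL", "PROPERTY.DAMAGE"),
  lower = c(-5.276, 0.121, -1.028, -1.088, -0.455, -1.172,
            1.162, -0.434, -0.036, 0.008, 0.188, 0.114),
  upper = c(-3.079, 0.273, -0.162, -0.189, 0.331, -0.210,
            2.422, 0.323, 0.009, 0.025, 0.944, 0.483),
  stringsAsFactors = FALSE
)

# An interval excludes zero when both endpoints share a sign
ci$excludes_zero <- ci$lower > 0 | ci$upper < 0

ci$term[!ci$excludes_zero]   # terms whose interval still covers zero
```

Three terms have intervals covering zero (SUSP.UNCONFIRM, TARGET.TYPE, and NUM.FATAL); the remaining effects are bounded away from zero at this level.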
Substantive Clustering Strategy
◮ In addition to the DPP component for random effects we search for partitions of Y into clusters
Cℓ, ℓ = 1, . . . ,m, where m (the number of clusters) is an unknown parameter.
◮ Let Yℓ be a vector of length nℓ containing the Yi in cluster Cℓ, then:
\[ Y_\ell = X_\ell \beta_\ell + A_\ell \eta_\ell + \epsilon_\ell \]
where:
⊲ Xℓ and Aℓ are composed of the rows corresponding to the Yi in cluster Cℓ,
⊲ unknown βℓ and σ²ℓ (where $\epsilon_\ell \sim N(0, \sigma^2_\ell I_{n_\ell})$) are specific to cluster Cℓ.
◮ Given a partition C of Nn := {1, 2, . . . , n} that has m < n clusters denoted by C1, . . . , Cm, the data
are a realization from a density of the form:
\[ f(y \mid \beta_C, \sigma^2_C, C) = \prod_{\ell=1}^{m} \prod_{i \in C_\ell} f(y_i \mid \beta_\ell, \sigma^2_\ell). \]
◮ So unlike the mixture model, this model recognizes a parameter, C, that is directly connected to
the basic clustering problem.
Substantive Clustering Strategy
◮ This model incorporates clustering in the data in two distinct ways:
⊲ it utilizes DP random effects to model unobserved heterogeneity in the data via subclusters,
⊲ the product partition model, using C, provides substantive clusters that give insight into
how the data can be broken into groups with different behavior.
◮ Note that these groupings do not nest, and so observations in the same cluster Cℓ can belong to
different subclusters defined by the columns of A (unlike Hartigan and Barry 1992).
BAAD Data
◮ Big Allied and Dangerous (BAAD) Database 1 (Asal, Rethemeyer & Anderson 2008).
◮ Assembled from several established databases: Memorial Institute for the Prevention of Terrorism’s
(MIPT) Terrorism Knowledge Base (TKB), Correlates of War (COW), Polity, and Polity2.
◮ This aggregates 395 worldwide lethal attacks from 1998-2005 by terrorist organizations.
◮ We use the version of their dataset that excludes Al Qaeda since its scope, profile, and effectiveness
place it in a unique category during this period.
◮ The variable fatalities (total number) is used as the outcome variable to focus on the primary
purpose of these attacks.
BAAD Explanatory Variables Used
◮ statespond indicates whether the group is financially or logistically supported by one or more
recognized governments (coded 1, n1 = 32), or not (coded 0, n0 = 363).
◮ masterccode denotes the COW CCODE value of the country/region where the attack took place.
◮ ordsize codes group size: 0 for fewer than 100 members (n0 = 261), 1 for 101-1,000 members
(n1 = 77), 2 for 1,001-10,000 members (n2 = 45), and 3 for more than 10,000 members (n3 = 12).
◮ terrStrong is coded 1 (n1 = 43) if they possess territory and 0 if they do not (n0 = 352).
◮ degree gives a count of alliance connections in the network sense.
More BAAD Explanatory Variables Used
◮ LeftNoReligEthno, where a 1 indicates that the group’s ideology is leftist and it is not
compounded with another ideological orientation (n1 = 94), and a 0 indicates that the group’s ideology is
either not leftist or is a mix of leftist and other ideological dimensions (n0 = 301).
◮ PureRelig indicates with a 1 whether the group’s ideology is purely religious and not associated
with other political or social factors (n1 = 50), and 0 otherwise (n0 = 345).
◮ PureEthno indicates with a 1 whether the group is ethnonationalist (nationalist causes tied to
ethnic identity) and not associated with other ideological factors (n1 = 26), and 0 otherwise
(n0 = 369).
◮ Islam where a 1 is assigned to groups inspired by some form of Islam (n1 = 287) and 0 otherwise
(n0 = 108).
BAAD Model Results
◮ We estimate the DPP/Product Partition model using the sampler described.
◮ The Gibbs Sampler is run for 10,000 iterations disposing of the first 5,000 as burn-in.
◮ Convergence is assessed with superdiag, an R package (Tsai and Gill 2012) that provides a
diagnostic suite calling all of the conventional convergence diagnostics typically used (Gelman &
Rubin, Geweke, Heidelberger & Welch, Raftery & Lewis).
◮ We also found no evidence of non-convergence with standard graphical tools (traceplots, cumsum
diagrams, etc.).
◮ The highest posterior probability cluster arrangement across these iterations (0.65191):
1 2 3 4
272 7 52 64
◮ Now we run a regular linear model (diffuse proper priors) with a single shared random effect and a
true multilevel linear model (diffuse proper priors) with the estimated clusters as group definitions.
                    Standard Linear Model                  Multilevel Linear Model
                    Mean     Std.Err.  95% HPD             Mean     Std.Err.  95% HPD
α                  -0.290    1.287     [-2.811: 2.232]
α1                                                         -3.835   0.843     [-5.486:-2.184]
α2                                                          0.383   1.480     [-2.517: 3.283]
α3                                                         -1.905   1.040     [-3.942: 0.133]
α4                                                         19.235   1.139     [17.002:21.468]
statespond          0.514    1.193     [-1.824: 2.851]      3.590   0.840     [ 1.945: 5.235]
masterccode         0.006    0.032     [-0.057: 0.069]     -0.054   0.019     [-0.092:-0.016]
ordsize             4.749    0.719     [ 3.339: 6.159]      3.163   0.452     [ 2.277: 4.049]
terrStrong          3.849    1.355     [ 1.193: 6.504]      1.886   0.974     [-0.022: 3.795]
degree              2.307    0.298     [ 1.723: 2.890]      1.169   0.179     [ 0.818: 1.520]
LeftNoReligEthno    0.290    1.070     [-1.808: 2.388]      0.838   0.707     [-0.548: 2.224]
PureRelig           1.131    1.307     [-1.431: 3.694]      1.669   0.955     [-0.202: 3.540]
PureEthno          -0.948    1.410     [-3.713: 1.816]     -1.378   1.045     [-3.427: 0.670]
Islam               2.851    1.203     [ 0.492: 5.210]      3.179   0.857     [ 1.499: 4.858]
τ                   0.009    0.001     [ 0.007: 0.020]      0.027   0.002     [ 0.023: 0.031]
Summed Deviance     3002                                    2553

                    Variance  Std.Dev.
σα                  113.44    10.65
σy                    1.31     1.15
Statistical Social Network Analysis Approaches to Understanding Terrorist Groups
◮ Mapping the social network around the 19 9/11 hijackers revealed some of the outer
organization. Krebs, “Mapping Networks of Terrorist Cells.” Connections 24(3), 2002, 43-52.
◮ Cohesive subgroups and the number of hubs (central points) in a network have an influence
on the network’s effectiveness. Pedahzur & Perliger, “The Changing Nature of Suicide
Attacks.” Social Forces 84(4), 2006, 1987-2008.
◮ Self-learning network analyses are better describers with covert targets. Carley &
Breiger (eds.), Dynamic Network Analysis in the Summary of the NRC workshop on Social
Network Modeling and Analysis. National Research Council.
◮ Data mining combined with SNA can reveal hidden structural patterns in large networks.
Xu & Chen, “Criminal Network Analysis and Visualization.” Communications of the ACM
48(6), 2005, 100-107.
[Figure: example social network graph with nodes labeled ROMUL, BONAVEN, AMBROSE, BERTH, PETER, LOUIS, VICTOR, WINF, JOHN, GREG, HUGH, BONI, MARK, ALBERT, AMAND, BASIL, ELIAS, SIMP.]
Covert/Terror Network Analysis
◮ This is classic Social Network Analysis, except with unwilling and surreptitious targets.
◮ The central goal is to determine which actors, nodes, are important and how they communicate
with other actors, edges.
◮ Governments also want to understand the effects of removing nodes or edges.
◮ However, the defining characteristic of these networks is that a large amount of data is missing.
◮ And missing data is known to be deleterious in network analysis.
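The node and edge vocabulary, and the effect of removing a node, can be sketched with a plain adjacency matrix in base R. The five actors and their ties below are invented for illustration and are not drawn from any data set in this talk; degree is used as the simplest possible measure of importance.

```r
# Hypothetical undirected network; actor names and ties are invented.
actors <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(actors, actors))
edges <- rbind(c("A", "B"), c("A", "C"), c("A", "D"), c("D", "E"))
for (r in seq_len(nrow(edges))) {
  adj[edges[r, 1], edges[r, 2]] <- 1   # record the tie in both directions
  adj[edges[r, 2], edges[r, 1]] <- 1
}

degree <- rowSums(adj)             # number of ties per actor
hub <- names(which.max(degree))    # most connected actor

# Remove the hub and recompute degrees on the remaining actors
keep <- actors != hub
adj_removed <- adj[keep, keep]
rowSums(adj_removed)
```

In this toy network, removing the hub leaves two actors with no ties at all, which is the kind of effect governments want to anticipate. With missing edges the computed degrees are biased downward, so the apparent hub may not be the real one.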
So What Are Elicited Prior Distributions?
◮ So one idea is to draw (elicit) qualitative information that helps fill in missingness.
◮ Joint work with John Freeman (Network Science 2013, etc.).
◮ An elicited prior is a form of prior information produced from previous knowledge through
structured interviews with subject-area experts who have little or no concern for the statistical
aspects of the project.
◮ Some potential targets for elicitation:
⊲ Policy-makers/elites
⊲ diplomats
⊲ military or intelligence experts
⊲ political professionals
⊲ previous study participants
⊲ theoretical economists
⊲ historians
⊲ jurists
⊲ regulators
⊲ community leaders
◮ The actual elicitation target in this application is a set of qualitative intelligence analysts.
A New Statistical Approach To Dealing with Network Missingness
◮ For missingness:
⊲ Elicit from analysts prior distributions for attributes that describe certainty and uncertainty.
⊲ Update these prior densities regularly to account for covert network dynamics.
⊲ Aggregate elicited prior densities to obtain still better information about edge attributes.
⊲ Incorporate elicited, aggregated information about attributes into network estimation
algorithms to increase their power to predict covert network links.
◮ Byproduct:
⊲ Use of attribute prior densities is a new way to evaluate source validity, if attribute priors
are elicited from different units within a single research group and, eventually, from other
government agencies.
Analyst Elicitation Stage, General
◮ Suppose elicitations are on attribute strength: xij ∈ [0 : 1] between actor i and actor j,
or just information on either individually.
◮ Example from a real data set: xij = 0 indicates certainty that actor i and actor j are not
from the same country, and xij = 1 indicates certainty that actor i and actor j are from the
same country.
◮ In the absence of certainty we will replace 0 and 1 with a beta distribution, which is
conveniently bounded [0 : 1] and can take on a wide variety of shapes.
Analyst Elicitation Stage, General
◮ Challenges that we deal with here:
⊲ obtaining elicited prior distributions must be done without technical jargon,
⊲ many elicitees should be involved,
⊲ the quality of elicitations will differ across analysts,
⊲ elicitations should be at the convenience of the elicitees.
◮ These challenges are met by providing qualitative experts with an intuitive elicitation engine, and
keeping the detailed statistical analysis away from the elicitation process.
Analyst Elicitation Stage, Query Steps
1. The analyst at a supported location logs onto the system and picks a network edge, i, j.
2. The analyst then picks an attribute, xij to assess.
3. For the selected attribute the analyst is asked for a mean value:
“On a scale of zero to one-hundred, what is your best estimate of the strength of this
attribute?”
which gives a beta distribution mean.
4. For the variance, we could follow the PERT (Program Evaluation and Review Technique) approach
and use σx ≈ 1/6, but instead we use a more conservative σx ≈ 1/4 as our starting point (from
asymptotic normal distribution theory).
5. Thus we have 25% of the maximum unimodality-preserving variance as just a starting point for our
software “slide.”
6. The analyst is then shown graphically on the terminal the beta distribution that results from these
statements and is allowed to modify it in terms of central location and width.
Elicited Prior Specification: One Elicitation
Elicited Prior Specification: Another Elicitation
Elicited Prior Specification: And Another Elicitation
Elicited Prior Specification: Confirmation Screen
Analyst Elicitation Stage, Parametric Principles
◮ The aggregated multi-step elicited prior actually uses the general beta distribution:
\[
f(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \, \frac{(y-a)^{\alpha-1}(b-y)^{\beta-1}}{(b-a)^{\alpha+\beta-1}},
\]
where: a < y < b, α, β > 0.
◮ Here b = 100 and a = 0 for operator convenience.
◮ The general form easily reduces to the standard form with the change of variable:
\[
x = \frac{y-a}{b-a}, \qquad f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \, x^{\alpha-1}(1-x)^{\beta-1},
\]
so that 0 < x < 1, but α and β are unchanged.
◮ So in this way our mean and variance are related directly to beta distribution parameters:
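\[
\mu_y = a + \mu_x(b-a), \qquad \mu_x = \frac{\alpha}{\alpha+\beta},
\]
\[
\sigma^2_y = (b-a)^2\sigma^2_x, \qquad \sigma^2_x = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.
\]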
Analyst Elicitation Stage, Parametric Principles
◮ Solving these equations gives:
\[
\alpha = \left[\frac{\mu_x(1-\mu_x)}{\sigma^2_x} - 1\right]\mu_x, \qquad
\beta = \left[\frac{\mu_x(1-\mu_x)}{\sigma^2_x} - 1\right](1-\mu_x).
\]
◮ So if an elicitee provides estimates of both the mean and the variance, we can easily produce α
and β and thus fully describe the beta distribution of interest.
◮ Finally, if we restrict α ≥ 1 and β ≥ 1, then the beta distribution is guaranteed to be unimodal,
which is more intuitive and more supportable from a psychological point of view.
◮ Actually using α = 1 and β = 1 as an initial state before any elicitations is useful.
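The moment-matching step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the defaults a = 0, b = 100 and σx = 1/4 are taken from the slides, and the actual elicitation software is not shown in the talk.

```python
def beta_from_elicitation(mean_y, sd_x=0.25, a=0.0, b=100.0):
    """Map an elicited mean on the [a, b] scale to standard beta parameters.

    sd_x is the standard deviation on the rescaled [0, 1] axis.
    """
    mu_x = (mean_y - a) / (b - a)      # change of variable x = (y - a) / (b - a)
    var_x = sd_x ** 2
    if not 0.0 < mu_x < 1.0:
        raise ValueError("the elicited mean must lie strictly between a and b")
    if var_x >= mu_x * (1.0 - mu_x):
        raise ValueError("variance too large for a beta distribution with this mean")
    common = mu_x * (1.0 - mu_x) / var_x - 1.0
    return common * mu_x, common * (1.0 - mu_x)


def beta_moments(alpha, beta):
    """Mean and variance of a standard beta distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1.0))


def satisfies_unimodal_restriction(alpha, beta):
    """The restriction alpha >= 1 and beta >= 1 stated on the slide."""
    return alpha >= 1.0 and beta >= 1.0


alpha, beta = beta_from_elicitation(50)            # analyst answers "50"
print(alpha, beta)                                  # 1.5 1.5
print(satisfies_unimodal_restriction(alpha, beta))  # True
```

In this sketch a mean of 80 with σx = 1/4 yields β below 1, so the restriction fails at the default width; the analyst-facing adjustment of location and width in step 6 is where such a case would presumably be narrowed.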
Aggregation Stage, Data Structures
◮ After a set of these elicitations we have:
\[
\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_n], \qquad \boldsymbol{\beta} = [\beta_1, \beta_2, \ldots, \beta_n]
\]
for n elicitees for each attribute of each edge.
◮ These can be organized as:
\[
[\alpha_{ijk}, \beta_{ijk}], \qquad i = 1{:}n, \quad j = 1{:}J, \quad k = 1{:}K
\]
for i = 1:n elicitees, j = 1:J possible relationships, and k = 1:K attributes.
◮ Here K contains both individual attribute information and information on relationship attributes
for both targets designated by edge j.
Aggregation Stage, Bayesian Updating
◮ The system is designed to be dynamic in that any authorized analyst can contribute at any time.
◮ Start with the original “day zero” assessment:
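\[
p_1(x) \propto x^{\alpha_1-1}(1-x)^{\beta_1-1},
\]
which can be left as deliberately vague as desired.
◮ The distribution from the first analyst’s update is:
\[
\pi_1(x) \propto p_1(x)\,p_2(x) = x^{\alpha_1+\alpha_2-2}(1-x)^{\beta_1+\beta_2-2}.
\]
◮ So the nth update is given by:
\[
\pi_n(x) \propto x^{\sum_{i=1}^{n}\alpha_i - n}\,(1-x)^{\sum_{i=1}^{n}\beta_i - n},
\]
which is to say that x after update n is distributed as:
\[
x \mid \boldsymbol{\alpha}, \boldsymbol{\beta} \sim \mathcal{BE}\left(\sum_{i=1}^{n}\alpha_i - n + 1,\; \sum_{i=1}^{n}\beta_i - n + 1\right).
\]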
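The updating rule above amounts to multiplying beta kernels, which a short Python sketch can make concrete. The function name and the example parameter values are hypothetical; only the formula itself comes from the slide.

```python
def aggregate_betas(params):
    """Combine n beta kernels by multiplication.

    params is a list of (alpha_i, beta_i) pairs, starting with the
    "day zero" assessment. The product of the kernels is again a beta
    kernel with parameters (sum(alpha_i) - n + 1, sum(beta_i) - n + 1).
    """
    n = len(params)
    if n == 0:
        raise ValueError("need at least the day-zero assessment")
    alpha = sum(a for a, _ in params) - n + 1
    beta = sum(b for _, b in params) - n + 1
    return alpha, beta


day_zero = (1.0, 1.0)                      # deliberately vague uniform start
analysts = [(3.0, 2.0), (4.0, 2.5)]        # made-up elicitations
print(aggregate_betas([day_zero] + analysts))   # (6.0, 3.5)
```

Because the uniform BE(1, 1) kernel is constant, including it as the day-zero assessment leaves the aggregate unchanged, and the result does not depend on the order in which analysts contribute.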
Link Elicitation Experiment
◮ Subjects: 63 university student participants at the University of Minnesota, who are given a
tutorial first.
◮ Edge elicitation: a social network in EastEnders.
◮ Procedure: show a short clip from EastEnders with interacting characters.
◮ Use DVD technology to present this on the same screen with headphones.
◮ First Elicitation: a question about the likelihood that two actors in the clip will take a certain action.
◮ Second Elicitation: show an additional clip (in sequence from the original) with the same interacting
characters.
◮ Elicit an assessment again of the likelihood that the two characters will engage in social activity; the
phrasing falsely implies sisterhood.
Data Structures
◮ Define first the n×n symmetric matrix Y giving a mapping of links between n named (terrorist)
individuals.
◮ Here, yij = 1 indicates a known link between node i and node j, yij = 0 indicates the absence of
evidence for a link, and numbers in between come from network predictions.
◮ Now define the n × n × K array X where for each n × n relationship between individual i
and individual j, there is a K-length vector of covariate information containing: attributes for
i, attributes for j, and natural relationship attributes (CoO, training camp, sect, skills, joint
operations, relatives, etc.) between i and j.
◮ These X values could be known, or they could be unknown but possess elicited priors, in which
case the array value is a placeholder for the distribution.
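One way to picture the "placeholder" idea is a container whose entries are either known numbers or beta priors. The sketch below is an assumption about how this might be stored, not the author's implementation; the class name, the dictionary layout, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class BetaPrior:
    """Placeholder for an attribute known only through an elicited prior."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# An entry of X is either a known value or a BetaPrior placeholder.
Entry = Union[float, BetaPrior]

# Sparse stand-in for the n x n x K array, keyed by (i, j, k).
X: Dict[Tuple[int, int, int], Entry] = {
    (0, 1, 0): 1.0,                    # known: same-country indicator is certain
    (0, 1, 1): BetaPrior(6.0, 1.2),    # unknown: elicited prior stands in
}


def point_value(entry: Entry) -> float:
    """Known values pass through; priors are summarized by their mean."""
    return entry.mean if isinstance(entry, BetaPrior) else float(entry)


for key, entry in X.items():
    print(key, round(point_value(entry), 3))
```

Summarizing a prior by its mean is purely for display here; in the model described later, the uncertain entries are treated as unknowns and estimated alongside the other parameters.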
Exponential Random Graph Model
◮ An appealing model that relates X and Y is the random effects logistic regression specification:
\[
p(\mathbf{Y} \mid \theta_{ij}) = \prod_{i \neq j} \frac{\exp(\theta_{ij})}{1 + \exp(\theta_{ij})}
\]
\[
\theta_{ij} = \boldsymbol{\beta}'\mathbf{X}_{ij} + z_{ij}
\]
\[
z_{ij} = \mathbf{u}_i'\boldsymbol{\gamma}\mathbf{v}_j + \epsilon_{ij}
\]
where β is a K-length vector of coefficients to estimate, and zij is a random effects term to account
for dependencies between attribute relationships.
◮ The random effects term is broken up into components: a u′i vector of sender-specific latent or
known factors, a vj vector of receiver-specific latent or known factors, a γ diagonal matrix of
unknown coefficients, plus an εij scalar error specific to the edge.
◮ This last component allows for asymmetric relationships, for example Abu-Mohammed al-Maqdisi
was the mentor to Abu Musab al-Zarqawi.
◮ So we have in this model log-odds(yij = 1) = θij where the parameters of interest are β and γ,
giving the relative importance of covariates or latent factors respectively.
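The linear predictor and the resulting link probability can be traced by hand with a small Python sketch. All numbers below are made up for illustration, and the function name is hypothetical; the code only evaluates the formula for a single ordered pair and does no estimation.

```python
import math


def link_probability(beta, x_ij, u_i, gamma_diag, v_j, eps_ij=0.0):
    """Evaluate exp(theta_ij) / (1 + exp(theta_ij)) for one ordered pair.

    theta_ij = beta'x_ij + u_i' diag(gamma_diag) v_j + eps_ij
    """
    fixed = sum(b * x for b, x in zip(beta, x_ij))
    latent = sum(u * g * v for u, g, v in zip(u_i, gamma_diag, v_j))
    theta = fixed + latent + eps_ij
    return 1.0 / (1.0 + math.exp(-theta))   # same as exp(theta) / (1 + exp(theta))


beta = [0.8, -0.3]                 # K = 2 covariate coefficients (illustrative)
x_ij = [1.0, 0.5]                  # covariates for the pair, taken as symmetric
gamma_diag = [1.5, 0.5]            # diagonal of the gamma matrix

u_i, v_i = [1.0, 0.2], [0.2, 0.4]  # sender and receiver factors for actor i
u_j, v_j = [0.1, 0.3], [0.9, 0.1]  # sender and receiver factors for actor j

p_ij = link_probability(beta, x_ij, u_i, gamma_diag, v_j)
p_ji = link_probability(beta, x_ij, u_j, gamma_diag, v_i)
print(round(p_ij, 3), round(p_ji, 3))
```

With these illustrative values the two directions receive different probabilities because the sender factors of i are paired with the receiver factors of j in one case and the reverse in the other.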
Full Model Specification
◮ X∗ represents the X values that are not known with certainty and are given (weighted) beta priors
from our elicitation procedure.
◮ U and V are both n×K matrices that collect the u′i and vj terms.
◮ Then, given prior distributions on the model parameters and our elicited priors, we obtain their
posterior distribution with:
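\[
\underbrace{p(\Theta, \boldsymbol{\beta}, \mathbf{X}^{*}, \mathbf{U}, \boldsymbol{\gamma}, \mathbf{V} \mid \mathbf{X}, \mathbf{Y})}_{\text{posterior distribution}}
\propto
\underbrace{p(\mathbf{Y}, \mathbf{X} \mid \Theta, \boldsymbol{\beta}, \mathbf{X}^{*}, \mathbf{U}, \boldsymbol{\gamma}, \mathbf{V})}_{\text{joint data distribution}}
\times
\underbrace{p(\Theta, \boldsymbol{\beta}, \mathbf{X}^{*}, \mathbf{U}, \boldsymbol{\gamma}, \mathbf{V})}_{\text{prior distributions}}.
\]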
◮ And this model is also estimated with Gibbs Sampling.
◮ Taking the estimated parameters we can get predictions, Y, and graph the model. . .
Updating EastEnders Network with Experimental Priors
Estimated Edge Changes (“going out later”) Between Kat Slater and Mo Harris
Another Elicitation Example: the Northern Irish “Troubles”
◮ Consider 60 well-known figures of the Provisional Irish Republican Army:
henderson app.bricklayer macbrdaigh <NA> campbell breadserver kelly van-driver
Mcdermott electrician black-dnnly various.jobs mccrudden barman fox appr.wlder.unem
forsythe wkd.at.foundry ryan sell.ap.rn.etc clarke <NA> bailey <NA>
jordan various.jobs mcparland cabinet.maker quigley student mcgrillen self-emp.lrydrvr
finucane at.flower.mill steele bakers.roundsmn mcareavey chef tolan <NA>
hall steel.erector blake van.driver donaghy <NA> carson docl.laborer
fennell appr.engineer mcgoldrick app.plumber mckinney swyrls.cstle.st delaney <NA>
rooney various.jobs hughes <NA> mcguire barman olneil <NA>
mcdermott none.listed simpson fitter.Omackies carberry heatng.enginr hannaway <NA>
kane wk.scrap.merch olneil insurance.clerk liggett <NA> burns <NA>
lennon none.listed kavanagh app.compositor olrawe docker campbell <NA>
o’callaghan lorry.driver johnston Market.Short.H mulvenna appr.jointer dempsey <NA>
Turley <NA> crossan bricklayer bryson appr.bricklyer mckenna <NA>
mckernan <NA> mccann txtle.scn.prnter0l skillen bricklayer kane furniture.bus.
mccracken <NA> lewis <NA> stone car.sprayer saunders time-mtion.clrk
◮ Other covariates include: first initial, year born, year died, year joined, age died, where from,
battalion, how died, where died, career trajectory, rank at death, married, children, partner pregnant,
Republican family, been in jail, sex.
◮ Starting with basic knowledge, obtain elicitations from journalism students at City University
London (thanks to Prof. Richard Collins).
Updating the PIRA Network from Journalist Elicitations
THANK YOU!
Likelihood and Estimation
◮ A general random effects Dirichlet Process model can be written
\[
(Y_1, \ldots, Y_n) \sim f(y_1, \ldots, y_n \mid \theta, \psi_1, \ldots, \psi_n) = \prod_{i} f(y_i \mid \theta, \psi_i), \qquad \psi_i \sim \mathcal{DP}(m, \phi_0), \quad i = 1, \ldots, n-1
\]
(the vector θ here is a placeholder for all of the other estimated parameters, including the β).
◮ Applying the successive conditional distributions, we can integrate over the random effects to get
the joint distribution of the data:
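\[
L(\theta \mid \mathbf{y}) = \int f(y_1, \ldots, y_n \mid \theta, \psi_1, \ldots, \psi_n)\, \pi(\psi_1, \ldots, \psi_n)\, d\psi_1 \cdots d\psi_n
\]
\[
= \frac{\Gamma(m)}{\Gamma(m+n)} \sum_{k=1}^{n} m^{k} \sum_{C:|C|=k} \prod_{j=1}^{k} \Gamma(n_j) \int f(\mathbf{y}_{(j)} \mid \theta, \psi_j)\, \phi_0(\psi_j)\, d\psi_j
\]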
which gives estimates of all of the desired regression parameters, and
⊲ C is a partition of the sample of size n into k groups, k = 1, . . . n− 1
⊲ y(j) is the vector of yis in subcluster j
⊲ ψj is the common parameter applied to that subcluster.
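The combinatorial factor in front of each integral, Γ(m)/Γ(m+n) · m^k · ∏ Γ(nj), is the prior probability that the Dirichlet process assigns to a given partition, so these weights sum to one over all partitions of the sample. A brute-force Python check for a tiny n is sketched below; the helper names are hypothetical, and enumeration like this is only feasible for very small samples.

```python
import math


def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # first element forms its own block ...
        yield [[first]] + partition
        # ... or joins each existing block in turn
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]


def dp_partition_weight(partition, m):
    """Gamma(m)/Gamma(m+n) * m^k * prod_j Gamma(n_j) for one partition."""
    n = sum(len(block) for block in partition)
    k = len(partition)
    weight = math.gamma(m) / math.gamma(m + n) * m ** k
    for block in partition:
        weight *= math.gamma(len(block))
    return weight


m = 1.5                                   # illustrative precision parameter
partitions = list(set_partitions([0, 1, 2, 3]))
total = sum(dp_partition_weight(p, m) for p in partitions)
print(len(partitions), round(total, 12))  # 15 partitions of 4 items
```

The number of partitions grows very quickly with n, which is one reason the talk relies on sampling rather than direct summation.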
Aggregation Stage, Single Node Updates
◮ In cases where xi and xj distributions (attributes on individuals only) are given, the prior on xij
is calculated by “differencing” beta distributions according to:
\[
\alpha_{x_{ij}} = k_{\alpha} \min(\alpha_i, \alpha_j), \qquad \beta_{x_{ij}} = k_{\beta} \max(\beta_i, \beta_j)
\]
where the individual parameter values come from the most updated priors for individuals i and
j.
◮ If the α or the β parameter pairs differ by a large amount, then the relationship attribute tends
towards a beta distribution that reflects a low relationship probability.
◮ Conversely, if there is substantial agreement in parameter values, the minimum and the maximum
will be very close together and aggregation will change the prior little.
◮ Here kα and kβ are tuning parameters that reflect management uncertainty in the
node-to-relationship process just described.
◮ Thus we always provide a relationship assessment, xij, as input to the network model.
Reconciling Divergent Views
◮ Suppose we have beta priors for some attribute of nodes xi and xj according to xi ∼ BE(1.2, 6)
and xj ∼ BE(6, 1.2), giving obviously divergent assessments.
◮ Reflecting some uncertainty, management assigns kα = 0.8 and kβ = 1.2, which is symmetric
around 1.0 but need not be.
◮ The relationship prior reflects significant skepticism about a relationship based on this resulting
beta specification as shown below
[Figure: three beta density panels plotted on [0, 1]. Node Assessment for xi: BE(α = 1.2, β = 6).
Node Assessment for xj: BE(α = 6, β = 1.2). Edge Assessment for xixj: BE(α = 0.96, β = 7.2).]
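The edge parameters in this example can be reproduced directly from the "differencing" rule. The Python below is a minimal sketch with a hypothetical function name; the inputs are the node priors and tuning values stated on the slide.

```python
def edge_prior_from_nodes(node_i, node_j, k_alpha=1.0, k_beta=1.0):
    """Apply alpha_xij = k_alpha * min(alpha_i, alpha_j) and
    beta_xij = k_beta * max(beta_i, beta_j) to two node-level beta priors."""
    alpha_i, beta_i = node_i
    alpha_j, beta_j = node_j
    return k_alpha * min(alpha_i, alpha_j), k_beta * max(beta_i, beta_j)


alpha_e, beta_e = edge_prior_from_nodes((1.2, 6.0), (6.0, 1.2), k_alpha=0.8, k_beta=1.2)
edge_mean = alpha_e / (alpha_e + beta_e)
print(round(alpha_e, 2), round(beta_e, 2), round(edge_mean, 3))
```

The result matches BE(0.96, 7.2) up to floating-point rounding, and its mean of roughly 0.12 reflects the skepticism about a relationship described above.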
Dirichlet Process Prior Clusters Are Not Clusters
◮ A typical strategy is to use DPP models to generate a very large number of candidate “clusters,”
which are actually subclusters, then choose the best of these by a post-hoc scheme that processes
the MCMC output through some objective function to find the best grouping.
◮ This is wrong.
◮ The supposed-clusters produced by the MCMC process in repeated realizations of the Dirichlet
process are:
⊲ not substantive in any way,
⊲ not able to reflect any real cluster structure driven by the covariates,
⊲ temporary random effect assignments to make the model fit better in the context of the sampler.
◮ Since there is no over-fitting penalty in the Dirichlet process, we can expect there to always be
more subclusters than actual substantive clusters in the data.
◮ Therefore we seek to complement the modeling approach just described with a feature that leads
to the simultaneous estimation of real clustering in the data with a product partition model.
Mixture and Product Partition Models
◮ The standard mixture model begins with the assumption that Y1, . . . , Yn are realizations of n
random variables that are independent and identically distributed (iid) within their m-component
mixtures, giving the density:
\[
f(y \mid \beta, \omega) = \sum_{\ell=1}^{m} \omega_\ell \, f(y_\ell \mid \beta_\ell),
\]
where m < n is a fixed positive integer, 0 ≤ ωℓ ≤ 1, and \(\sum_{\ell=1}^{m} \omega_\ell = 1\).
◮ An alternative, the product partition model, starts by conditioning on a given partition, and then
determines the posterior probabilities of these.
◮ Given a partition C of Nn := {1, 2, . . . , n} that has m < n clusters denoted by C1, . . . , Cm, the data
are a realization from a density of the form:
\[
f(y \mid \beta_C, \sigma^2_C, C) = \prod_{\ell=1}^{m} \prod_{i \in C_\ell} f(y_i \mid \beta_\ell, \sigma^2_\ell).
\]
◮ So unlike the mixture model, this model recognizes a parameter, C, that is directly connected to
the basic clustering problem and is part of the estimation process.
◮ This model was developed by Hartigan (1990) (see also Barry & Hartigan 1992, Crowley 1997).
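The contrast between the two densities can be made concrete with normal components. The sketch below is a simplification under stated assumptions: each cluster has a plain mean and standard deviation instead of regression coefficients βℓ, the data and parameter values are invented, and the function names are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(y, weights, means, sds):
    """Mixture model: a weighted sum over components, with no partition parameter."""
    return sum(w * normal_pdf(y, mu, sd) for w, mu, sd in zip(weights, means, sds))


def product_partition_loglik(y, partition, means, sds):
    """Product partition model: the partition C is an explicit argument.

    partition is a list of clusters, each a list of observation indices.
    """
    total = 0.0
    for cluster, mu, sd in zip(partition, means, sds):
        for i in cluster:
            total += math.log(normal_pdf(y[i], mu, sd))
    return total


y = [0.1, -0.2, 5.1, 4.9]              # made-up data with two visible groups
means, sds = [0.0, 5.0], [1.0, 1.0]

good = [[0, 1], [2, 3]]                # groups the nearby observations together
bad = [[0, 2], [1, 3]]                 # mixes the two groups

print(product_partition_loglik(y, good, means, sds) >
      product_partition_loglik(y, bad, means, sds))   # True
print(round(mixture_density(0.0, [0.5, 0.5], means, sds), 4))
```

The mixture density never refers to which observation belongs where, whereas the product partition likelihood changes as soon as the partition argument changes, which is what allows C to be treated as a parameter in estimation.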
Reasons Not to Prefer the Mixture Model for Clustering
◮ Parameterization: the mixture model lacks a model parameter that defines the clusters, which
can confound standard estimation processes (McCullagh & Yang 2008, Booth, Casella and Hobert
2008).
◮ Cluster Identification: even if the parameters of the mixture model are known, there needs
to be some way of generating a latent variable to identify clusters (McLachlan & Peel 2004).
◮ Ad Hoc Selection: the final model needs to be run with a fixed m, with the typical strategy
running a user-defined selection of m values and choosing the one with the best BIC, or similar
criteria (Si and Reiter 2013).
◮ Applications: in applied settings the data “seldom contain much information about parameters
such as the number of clusters in the population” (McCullagh & Yang 2008).
◮ Label Switching: the mixture model is prone to the label switching problem (invariance of the
likelihood under relabeling of the mixture components), particularly in Bayesian settings (Jasra,
Holmes & Stephens 2005, Stephens 2000, Celeux 1998).
Reasons To Prefer the Product Partition Model for Clustering
◮ Computation: the partition process of the product partition model can be predictor-dependent and
computationally efficient (Park and Dunson 2010).
◮ Model Dimensionality: a stochastic search algorithm can be set up to move between partitions of
different sizes at each iteration of a sampler (Booth, Casella and Hobert 2008).
◮ Cluster Identification: contrary to the mixture model, the product partition model clearly identifies
the parameter that determines the cluster, and has no restriction on m, the number of clusters,
other than m < n (Crowley 1997).
◮ Label Switching: since the product partition model is label-free (the clusters are all defined by
unique partitions of Nn = {1, 2, . . . , n}), we can easily identify mappings of cases to clusters
(Hartigan 1990, Barry & Hartigan 1992).
Substantive Clustering Strategy
◮ In addition to the DPP component for random effects we search for partitions of Y into clusters
Cℓ, ℓ = 1, . . . ,m, where m (the number of clusters) is an unknown parameter.
◮ Let Yℓ be a vector of length nℓ containing the Yi in cluster Cℓ, then:
\[
Y_\ell = X_\ell \beta_\ell + A_\ell \eta_\ell + \epsilon_\ell
\]
where:
⊲ Xℓ and Aℓ are composed of the rows corresponding to the Yi in cluster Cℓ,
⊲ unknown βℓ and \(\sigma^2_\ell\) (where \(\epsilon_\ell \sim N(0, \sigma^2_\ell I_{n_\ell})\)) are specific to cluster Cℓ.
◮ This model incorporates clustering in the data in two distinct ways:
⊲ it utilizes DP random effects to model unobserved heterogeneity in the data via subclusters,
⊲ the product partition model, using C, provides substantive clusters to the data that serve to
provide insights into how that data can be broken into groups that have different behavior.
◮ Note that these groupings do not nest, and so observations in the same cluster Cℓ can belong to
different subclusters defined by the columns of A (unlike Hartigan and Barry 1992).
Substantive Clustering Strategy
◮ Our goal is to find the best partition C = (C1, . . . , Cm), but the A matrix defining k subclusters
cannot be ignored.
◮ Using the DPP we want to find the posterior probability of C, marginalized over the coefficients
and random effects, which requires both integration over η and summation over the A matrices.
◮ Note that use of the DP random effects produces a correlation between individuals both within
the same cluster and in different clusters, a non-nested hierarchical specification.
Cluster Prior Probabilities
◮ Each βℓ is given a multilevel model structure with common underlying mean β0 and locally scaled
precision matrix S:
\[
\beta_\ell \sim N(\beta_0, \sigma^2_\ell S^{-1}).
\]
◮ Each cluster-specific variance parameter σ2ℓ is assigned an inverse-gamma prior with common
assigned hyperparameters:
\[
\sigma^2_\ell \sim IG\left(\frac{a_{\sigma^2}}{2}, \frac{b_{\sigma^2}}{2}\right).
\]
◮ The remaining assigned priors have the forms:
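\[
\text{DP:} \quad \phi_0 \sim N(0, \tau^2), \qquad \tau^2 \sim IG\left(\frac{a_{\tau^2}}{2}, \frac{b_{\tau^2}}{2}\right), \qquad \lambda \sim G\left(\frac{a_\lambda}{2}, \frac{b_\lambda}{2}\right)
\]
\[
\text{PP:} \quad \beta_0 \sim N(0, \sigma^2_\beta S^{-1}), \qquad \sigma^2_\beta \sim IG\left(\frac{a_{\sigma^2_\beta}}{2}, \frac{b_{\sigma^2_\beta}}{2}\right), \qquad S \sim W(V^{-1}, a_S)
\]
\[
V = \mathrm{Diag}(v_1, \ldots, v_p), \qquad v_i \sim G\left(\frac{a_v}{2}, \frac{b_v}{2}\right), \qquad C \sim\ ???
\]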