Variational Algorithms for Marginal MAP
Qiang Liu, Alexander Ihler
Department of Computer Science, University of California, Irvine
Abstract
Marginal MAP tasks seek an optimal configuration of the marginal distribution over a subset of variables. Marginal MAP can be computationally much harder than more common inference tasks.
We show
• a general variational framework for marginal MAP problems
• analogues to Bethe, tree-reweighted, and mean-field approximations
• novel upper bounds via the tree-reweighted free energy
• “mixed” message passing and CCCP-based solvers
• conditions for global or local optimality of the solutions
• close connections to EM and variational EM approaches
Variational Form
Graphical Models
• Factors and exponential family form
• Factors are associated with cliques of a graph G=(V,E)
Tasks: [Figure: legend max (B), sum (A); inference tasks ordered along an arrow labeled “Harder”]
Mixed inference problems can be hard even in trees, since the max and sum operators do not commute in general.
• A-B trees extend the notion of efficient structure to mixed inference
• Ensure graph structure remains a tree during inference
• Two example sub-types:
[Figure: two example graphs with nodes labeled sum and max]
Example from D. Koller and N. Friedman (2009)
Mixed-Inference (marginal MAP, MAP)
Sum-Inference (partition function, probability of evidence)
Max-Inference (MAP, MPE)
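The three tasks can be compared by brute force on a tiny model. The sketch below is only an illustration: the two-variable table `THETA` and all function names are made up here, with one binary sum variable a (set A) and one binary max variable b (set B). On this table the b component of the joint MAP differs from the marginal MAP solution, which is one way to see that the tasks are genuinely different.

```python
import math

# Hypothetical log-potentials theta[(a, b)]; values chosen for illustration only.
THETA = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.5, (1, 1): -2.0}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sum_inference(theta):
    """Log partition function: log sum_{a,b} exp(theta[a, b])."""
    return log_sum_exp(list(theta.values()))


def max_inference(theta):
    """Joint MAP configuration: argmax_{a,b} theta[a, b]."""
    return max(theta, key=theta.get)


def mixed_inference(theta):
    """Marginal MAP: argmax_b log sum_a exp(theta[a, b]), with its value."""
    b_values = sorted({b for (_, b) in theta})
    scores = {
        b: log_sum_exp([v for (_, bb), v in theta.items() if bb == b])
        for b in b_values
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    print("log Z        :", sum_inference(THETA))
    print("joint MAP    :", max_inference(THETA))
    print("marginal MAP :", mixed_inference(THETA))
```

Here the joint MAP picks b = 1 because of the single large entry theta[(0, 1)], while marginal MAP picks b = 0 because both entries for b = 0 contribute to the sum over a.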
Variational Algorithms
Mixed message types (sum nodes A, max nodes B):
• Sum-product: A → A ∪ B
• Max-product: B → B
• Match max and sum: B → A
Mixed-product message passing
• Start with “standard” weighted message passing
• Generalize zero-temperature limit results of Weiss et al. (2007)
• Apply limit directly to messages ( for Bethe, for TRW)
• Match updates interpretable as a “local” marginal MAP problem
• Mixed marginals satisfy a reparameterization property
• Fixed points are locally optimal (similar to max-product results)
• Convergence can be an issue
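The zero-temperature idea can be illustrated at the level of the objective rather than the messages. The sketch below assumes a temperature eps applied only to the max variables b, giving eps * log sum_b exp((1/eps) * log sum_a exp(theta[a, b])); this equals the log partition function at eps = 1 and approaches the marginal MAP value as eps goes to 0. The table `THETA`, the function names, and this particular parameterization are assumptions made for the example and are not the message updates themselves.

```python
import math

# Hypothetical log-potentials theta[(a, b)]; a is a sum variable, b a max variable.
THETA = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.5, (1, 1): -2.0}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def inner_scores(theta):
    """log sum_a exp(theta[a, b]) for each value of b."""
    b_values = sorted({b for (_, b) in theta})
    return [
        log_sum_exp([v for (_, bb), v in theta.items() if bb == b])
        for b in b_values
    ]


def annealed_objective(theta, eps):
    """eps * log sum_b exp(score_b / eps): a soft maximum over b at temperature eps."""
    return eps * log_sum_exp([s / eps for s in inner_scores(theta)])


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps, annealed_objective(THETA, eps))
    print("marginal MAP value:", max(inner_scores(THETA)))
```

The soft maximum always lies between the true maximum and the maximum plus eps * log(number of values of b), so the gap shrinks linearly in eps.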
Double-loop algorithms
• Decompose H into two parts, H = H+ - H-, and iteratively linearize H-
• CCCP algorithm: take H+, H- to be convex
• Can also take H+ to be the Bethe approximation (non-convex)
• Iteratively solve sum-product and apply truncation correction
[Figure: panels “Type 1” and “Type 2”]
Connections to EM
• Restrict to the mean-field-like product subspace
• Coordinate-wise updates = in the primal:
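The EM connection can be sketched on a tiny model, assuming the product form q(a, b) = q(a) * 1(b = b_t): the E-step sets q(a) = p(a | b_t), and the M-step picks the b that maximizes the expected log-potential under q. The table `THETA` and the function names are hypothetical. Standard EM arguments guarantee that the marginal score log sum_a exp(theta[a, b]) never decreases, but the iteration can stop at a local optimum, as the second run below shows.

```python
import math

# Hypothetical log-potentials theta[(a, b)]; a is a sum variable, b a max variable.
THETA = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.5, (1, 1): -2.0}


def marginal_score(theta, b):
    """log sum_a exp(theta[a, b]) for a fixed b."""
    vals = [v for (_, bb), v in theta.items() if bb == b]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def posterior_a(theta, b):
    """E-step: q(a) = p(a | b), proportional to exp(theta[a, b])."""
    score = marginal_score(theta, b)
    return {a: math.exp(v - score) for (a, bb), v in theta.items() if bb == b}


def em_marginal_map(theta, b_init, max_iters=20):
    """Alternate E- and M-steps; return the final b and the score trace."""
    b_values = sorted({b for (_, b) in theta})
    b = b_init
    trace = [marginal_score(theta, b)]
    for _ in range(max_iters):
        q = posterior_a(theta, b)
        # M-step: maximize the expected log-potential over b.
        b_new = max(b_values, key=lambda bb: sum(q[a] * theta[(a, bb)] for a in q))
        if b_new == b:
            break  # fixed point reached
        b = b_new
        trace.append(marginal_score(theta, b))
    return b, trace


if __name__ == "__main__":
    print(em_marginal_map(THETA, b_init=0))  # stays at b = 0, the marginal MAP solution
    print(em_marginal_map(THETA, b_init=1))  # stays at b = 1, a local optimum
```

Starting from b = 1, the posterior over a concentrates on a = 0, where theta[(0, 1)] is large, so the M-step keeps b = 1 even though b = 0 has the higher marginal score.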
• Reformulate inference as a distributional optimization problem
• Define and
[Table: variational forms for Sum-Inference, Max-Inference, and Mixed-Inference; the Mixed-Inference form is marked “This work”]
Sum-inference: (with equality when q = p)
Mixed-inference: (with equality when q = p(A|B) 1(B = B*) or similar)
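The equality condition for mixed inference can be checked numerically. The sketch below assumes the bound has the form E_q[theta] + H_q(A | B) <= max_B log sum_A exp(theta(A, B)), where H_q(A | B) is the conditional entropy of A given B under q; this form, the table `THETA`, and the function names are assumptions made for the example. It evaluates the left-hand side for random joint distributions q and for q = p(A | B*) 1(B = B*).

```python
import math
import random

# Hypothetical log-potentials theta[(a, b)]; a is a sum variable, b a max variable.
THETA = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.5, (1, 1): -2.0}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_map_value(theta):
    """Return (b*, max_b log sum_a exp(theta[a, b]))."""
    b_values = sorted({b for (_, b) in theta})
    scores = {
        b: log_sum_exp([v for (_, bb), v in theta.items() if bb == b])
        for b in b_values
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def mixed_free_energy(theta, q):
    """E_q[theta] + H_q(A | B) for a joint table q[(a, b)]."""
    q_b = {}
    for (_, b), p in q.items():
        q_b[b] = q_b.get(b, 0.0) + p
    energy = sum(p * theta[key] for key, p in q.items())
    cond_entropy = -sum(
        p * math.log(p / q_b[b]) for (_, b), p in q.items() if p > 0.0
    )
    return energy + cond_entropy


def optimal_q(theta):
    """q(a, b) = p(a | b*) * 1(b = b*), the claimed maximizer."""
    b_star, value = marginal_map_value(theta)
    return {
        (a, b): (math.exp(v - value) if b == b_star else 0.0)
        for (a, b), v in theta.items()
    }


def random_joint(theta, rng):
    """A random strictly positive joint distribution over the keys of theta."""
    weights = {key: rng.random() + 1e-6 for key in theta}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    _, value = marginal_map_value(THETA)
    print("marginal MAP value :", value)
    print("bound at random q  :", mixed_free_energy(THETA, random_joint(THETA, rng)))
    print("bound at optimal q :", mixed_free_energy(THETA, optimal_q(THETA)))
```

Replacing the conditional entropy with the full joint entropy H_q(A, B) in `mixed_free_energy` would give the corresponding sum-inference bound on the log partition function instead.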
Variational Approximations
Bethe approximation (exact on A-B tree)
• “Truncated” free energy
Tree-reweighted approximation (convex combination of A-B trees)
• Dual in terms of edge appearances
Experiments
Chain graphs
• G_A is a tree
• TRW1: type-1 only
• TRW2: ½ type-1, ½ type-2
• Bethe: most accurate
• EM: stuck quickly (2-3 iterations)
Grid graphs
• Attractive or mixed potentials
• G_A has cycles
• Similar trends
[Figure: panels “Attractive” and “Mixed”; axes: % correct solutions, energy relative error]