Brandon Rothrock and Song-Chun Zhu
UCLA Dept. of Computer Science
Human Parsing using Stochastic And-Or Grammars and Rich Appearances
Thursday, November 17, 11
MOTIVATIONS
• Rich appearance representation
• Semantic parts
• Part sharing
• Reconfigurable parts
BODY REPRESENTATION
[Figure: layer 1, layer 2, layer 3, layer 4]
COMPOSITION RULES
Part form: and-node
Part types: or-nodes
AND-OR GRAPH GRAMMAR
[Figure: and-or graph. Legend: or-node group, and-node]
[Figure panels: Grammar, Derivation tree]
[Figure panels: Grammar, Parse Graph]
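The relation between the grammar and a parse graph can be illustrated with a minimal sketch. It assumes a toy grammar in which each or-node (part type) lists alternative and-nodes (part forms) with a branching probability, and each and-node lists the child part types it composes. The part names, forms, and probabilities in `GRAMMAR` are hypothetical and are not taken from the slides.

```python
import random

# Hypothetical toy grammar (illustrative only, not the grammar from the slides).
# Key: or-node (part type). Value: alternative and-nodes (part forms), each given
# as (branching probability, form name, child part types).
GRAMMAR = {
    "body": [
        (0.7, "body_full", ["upper_body", "lower_body"]),
        (0.3, "body_upper_only", ["upper_body"]),
    ],
    "upper_body": [(1.0, "ubody", ["arm", "arm"])],
    "lower_body": [(1.0, "lbody", [])],
    "arm": [
        (0.6, "arm_with_hand", ["upper_arm", "lower_arm", "hand"]),
        (0.4, "arm_no_hand", ["upper_arm", "lower_arm"]),
    ],
    "upper_arm": [(1.0, "ua", [])],
    "lower_arm": [(1.0, "la", [])],
    "hand": [(1.0, "hand", [])],
}


def sample_parse(symbol, rng):
    """Sample a parse graph (here a tree) by choosing one form at each or-node."""
    alternatives = GRAMMAR[symbol]
    weights = [prob for prob, _, _ in alternatives]
    _, form, children = rng.choices(alternatives, weights=weights, k=1)[0]
    return {
        "symbol": symbol,
        "form": form,
        "children": [sample_parse(child, rng) for child in children],
    }


def leaves(node):
    """Collect the forms of the terminal parts of a parse, left to right."""
    if not node["children"]:
        return [node["form"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


parse = sample_parse("body", random.Random(0))
print(parse["form"], leaves(parse))
```

Each call resolves every or-node to a single form, so the result contains only the selected and-nodes, which is what distinguishes a parse graph from the full grammar.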
PROBABILITY MODEL
$$p(pg \mid I) = \frac{1}{Z(\Theta)} \exp\{-E_a(pg \mid I) - E_d(pg) - E_g(pg)\}$$
$E_a$: appearance; $E_d$: derivation (part forms); $E_g$: geometry
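As a small worked example of the energy form above, the sketch below scores two candidate parse graphs of the same image by their unnormalized log-probability. The energy values are hypothetical; the point is only that the partition function Z cancels when two candidates are compared.

```python
import math


def log_score(e_appearance, e_derivation, e_geometry):
    """Unnormalized log-probability: log p(pg | I) + log Z = -(E_a + E_d + E_g)."""
    return -(e_appearance + e_derivation + e_geometry)


# Hypothetical energies for two candidate parse graphs of one image.
score_a = log_score(2.0, 0.5, 1.0)
score_b = log_score(1.5, 1.0, 2.0)

# Z is shared by both candidates, so it drops out of the probability ratio.
ratio = math.exp(score_a - score_b)
print(score_a, score_b, ratio)
```

Under these made-up numbers the first candidate has the lower total energy and is therefore the more probable parse.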
APPEARANCE MODEL
• Adapted from Active Basis model
(IJCV 2010) Learning Active Basis Model for Object Detection and Recognition (Wu, Si, Gong, Zhu)
(CVPR 2009) Learning Mixed Image Templates for Object Recognition (Si, Gong, Wu, Zhu)
Model derived by minimizing KL divergence
Resulting in the following form:
Template learned by pursuit
APPEARANCE MODEL
[Figure panels: leg, upper body, upper arm, lower body]
DERIVATION MODEL
SCFG case is indifferent to neighboring forms
$$p(pg) = \prod_{v_i \in V_{pg}} p(\omega(v_i))$$
where $\omega(v_i)$ is the form index of node $v_i$.
DERIVATION MODEL
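Allow child forms to depend on parent form:
$$p_d(pg) = \prod_{v_i \in V_{pg}} p(\omega(C(v_i)) \mid \omega(v_i)) \approx \prod_{v_i \in V_{pg}} \prod_{(j,k) \in E(v_i)} p(\omega(v_j), \omega(v_k) \mid \omega(v_i))$$
where $C(v_i)$ denotes the children of $v_i$.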
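The difference between the independent (SCFG) factorization and the parent-conditioned pairwise approximation can be sketched on a single parse-graph fragment. Everything below is an assumption made for illustration: the part names, the form indices, the choice of sibling edges standing in for E(v_i), and all probability values.

```python
import math

# Hypothetical fragment: an arm node with three children, each (part_type, form_index).
parent = ("arm", 0)
children = [("ua", 1), ("la", 0), ("hand", 0)]
sibling_edges = [(0, 1), (1, 2)]  # assumed E(v_i): ua-la and la-hand

# SCFG-style independent form probabilities p(omega(v)); values are made up.
p_form = {("arm", 0): 0.6, ("ua", 1): 0.3, ("la", 0): 0.5, ("hand", 0): 0.8}

# Pairwise tables p(omega(v_j), omega(v_k) | omega(v_i)), keyed by
# (parent node, type_j, type_k) and then by the pair of child form indices.
p_pair = {
    (("arm", 0), "ua", "la"): {(1, 0): 0.4},
    (("arm", 0), "la", "hand"): {(0, 0): 0.7},
}


def log_p_scfg(nodes):
    """Sum of independent log form probabilities over all nodes."""
    return sum(math.log(p_form[node]) for node in nodes)


def log_p_pairwise(parent_node, child_nodes, edges):
    """Sum of log pairwise child-form probabilities conditioned on the parent form."""
    total = 0.0
    for j, k in edges:
        (type_j, form_j), (type_k, form_k) = child_nodes[j], child_nodes[k]
        table = p_pair[(parent_node, type_j, type_k)]
        total += math.log(table[(form_j, form_k)])
    return total


scfg = log_p_scfg([parent] + children)
pairwise = log_p_pairwise(parent, children, sibling_edges)
print(scfg, pairwise)
```

In the pairwise version the probability of a child form changes with the forms chosen for its parent and its neighbor, which the independent product cannot express.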
GEOMETRY MODEL
$$p(v_j, v_k \mid \omega(v_i)) = \mathcal{N}(T_{kj}(x_k) - T_{jk}(x_j),\ 0,\ \Sigma_{ij})$$
$$p_g(pg) \propto \prod_{v_i \in V_{pg}} \prod_{(j,k) \in E(v_i)} p(v_j, v_k \mid \omega(v_i))$$
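A minimal sketch of one pairwise geometry term follows. It assumes 2-D part positions and treats each transform T as a pure translation by an anchor offset; the slides do not specify the form of T, so this is only one possible reading. The positions, offsets, and covariance are hypothetical.

```python
import math


def gaussian_logpdf_2d(d, cov):
    """Log-density of a zero-mean 2-D Gaussian at displacement d = (dx, dy)."""
    (a, b), (c, e) = cov
    det = a * e - b * c
    dx, dy = d
    # Quadratic form d^T cov^{-1} d, using the closed-form 2x2 inverse.
    quad = (e * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def articulation_gap(x_j, anchor_j, x_k, anchor_k):
    """T_kj(x_k) - T_jk(x_j), with each T assumed to be a translation."""
    t_j = (x_j[0] + anchor_j[0], x_j[1] + anchor_j[1])
    t_k = (x_k[0] + anchor_k[0], x_k[1] + anchor_k[1])
    return (t_k[0] - t_j[0], t_k[1] - t_j[1])


# Hypothetical positions of two neighboring parts and their joint anchors.
gap = articulation_gap((10.0, 20.0), (0.0, 5.0), (11.0, 31.0), (0.0, -5.0))
log_p = gaussian_logpdf_2d(gap, ((4.0, 0.0), (0.0, 4.0)))
print(gap, log_p)
```

The term is largest when both parts predict the same joint location, so the gap is zero, and it falls off at a rate set by the covariance.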
SAMPLING
[Figure panels: constrained samples, unconstrained samples]
INFERENCE
Recursive scoring function
INFERENCE
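$$\rho_S^{MAP} = \max_{x,\, \omega(\rho_S)} s^*(\rho_S, I)$$
$$s^*(\rho, I) = s(\rho, I) + \max_{x_i,\, \omega(\rho_i)\ \forall \rho_i \in C(\rho)} \left( \sum_{\rho_i \in C(\rho)} s^*(\rho_i, I) \right)$$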
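INFERENCE
$$\max_{x_i,\, \omega(\rho_i)\ \forall \rho_i \in \rho_C} \left( \sum_{\rho_i \in \rho_C} s^*(\rho_i, I) \right) =$$
$$B_{\rho_j}(\rho_i) = \max_{\omega(\rho_j)} \left( \max_{x_j} \left( s^*(\rho_j) + \log \frac{p(\omega(\rho_i), \omega(\rho_j))}{p(\omega(\rho_j))^{d(p_j) - 1}} + \log p(\rho_i, \rho_j) + \sum_{\rho_k \in C(\rho_j)} B_{\rho_k}(\rho_j) \right) \right)$$
[Figure: nodes $\rho_i$, $\rho_j$]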
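The recursive score and the child-to-parent messages can be sketched as a max-sum dynamic program over a small part tree. This is a simplified illustration, not the implementation behind the slides: each part has a few discrete candidate states standing in for a (position, form) pair, and a single assumed `pair` function stands in for the log-probability terms inside the message. All names and scores are hypothetical.

```python
# Hypothetical part tree and candidate states; every score here is invented.
CHILDREN = {"arm": ["ua", "la"], "ua": [], "la": []}

# Local score s(rho, I) for each candidate state of each part.
LOCAL = {
    "arm": {"a0": 1.0, "a1": 0.5},
    "ua": {"u0": 0.2, "u1": 0.9},
    "la": {"l0": 0.4, "l1": 0.1},
}


def pair(parent_state, child_state):
    """Assumed compatibility term: no penalty when the state indices agree."""
    return 0.0 if parent_state[-1] == child_state[-1] else -1.0


def message(child, parent_state):
    """Best child contribution for a fixed parent state (plays the role of B)."""
    return max(
        best_score(child, child_state) + pair(parent_state, child_state)
        for child_state in LOCAL[child]
    )


def best_score(part, state):
    """Recursive score: local score plus the best message from each child."""
    return LOCAL[part][state] + sum(
        message(child, state) for child in CHILDREN[part]
    )


def map_score(root):
    """Maximize the recursive score over the candidate states of the root."""
    return max(best_score(root, state) for state in LOCAL[root])


print(map_score("arm"))
```

Because each child is maximized separately for a fixed parent state, the cost grows with the number of parent-child state pairs rather than with the number of joint configurations of the whole tree.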
[Figure panels: hand, ua (upper arm), la (lower arm), arm]
[Figure panels: arm, arm+children, ua, la, hand]
[Figure panels: lbody+children, ubody+children, fullbody+children, fullbody, ubody]
PERFORMANCE
[Figure: PCP. Detection rate vs. detection threshold; curves for UL, UA, Torso, LL, LA, Head, Hand, Foot]
Method                               head   torso  u.leg  l.leg  u.arm  l.arm  hand   foot   avg
Method of Yang et al. (CVPR11)       1.000  1.000  0.975  0.839  0.951  0.577  -      -      0.869
Our method                           1.000  1.000  0.933  0.857  0.915  0.719  0.420  0.339  0.884
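The slides do not define PCP. The sketch below assumes the commonly used definition (percentage of correct parts), under which a part counts as detected when both predicted endpoints lie within a given fraction of the ground-truth part length from their true positions. That fraction is taken here to correspond to the detection threshold on the plot axis. The segments are hypothetical.

```python
import math


def pcp_correct(pred, truth, threshold=0.5):
    """pred, truth: ((x1, y1), (x2, y2)) endpoints of one part segment."""
    length = math.dist(truth[0], truth[1])
    return all(
        math.dist(p, t) <= threshold * length for p, t in zip(pred, truth)
    )


def detection_rate(preds, truths, threshold=0.5):
    """Fraction of parts counted as correct at the given threshold."""
    hits = sum(pcp_correct(p, t, threshold) for p, t in zip(preds, truths))
    return hits / len(truths)


# Hypothetical ground truth and predictions for the same part in two images.
truths = [((0.0, 0.0), (0.0, 10.0)), ((0.0, 0.0), (0.0, 10.0))]
preds = [((1.0, 0.0), (0.0, 12.0)), ((0.0, 6.0), (0.0, 16.0))]

for threshold in (0.5, 0.7):
    print(threshold, detection_rate(preds, truths, threshold))
```

Raising the threshold accepts looser localizations, so the detection rate can only stay the same or increase as the threshold grows.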
CONCLUSIONS
• Generative model for representing rich appearance of highly deformable objects
• Capture semantic relationships between neighboring part productions
• DP framework for computing exact inference
THANKS!