Human Parsing using Stochastic And-Or Grammars and Rich … · 2019. 3. 31. · APPEARANCE MODEL...

Brandon Rothrock and Song-Chun Zhu, UCLA Dept. of Computer Science

Human Parsing using Stochastic And-Or Grammars and Rich Appearances

Thursday, November 17, 2011

MOTIVATIONS

• Rich appearance representation

• Semantic parts

• Part sharing

• Reconfigurable parts


BODY REPRESENTATION

[Figure panels: layer 1, layer 2, layer 3, layer 4]


COMPOSITION RULES

[Figure: composition rule; labels: part form (and-node), part types (or-nodes)]
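The composition rule can be written down as a small data structure. In this minimal sketch (the classes `OrNode`/`AndNode` and the toy `arm` grammar are hypothetical, not the authors' code), an or-node is a part type whose alternatives are part forms, and an and-node is a part form composed of all its child part types. Counting derivations shows how a few reusable rules generate many configurations; reusing the same `ua`, `la` and `hand` objects under both arm forms mirrors part sharing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrNode:
    """A part type: chooses exactly one of its alternative part forms."""
    name: str
    forms: List["AndNode"] = field(default_factory=list)


@dataclass
class AndNode:
    """A part form: expands into all of its child part types (none for a terminal)."""
    name: str
    children: List[OrNode] = field(default_factory=list)


def count_derivations(node) -> int:
    """Number of distinct derivation trees rooted at `node`."""
    if isinstance(node, OrNode):
        # or-node: alternatives are mutually exclusive, so counts add
        return sum(count_derivations(f) for f in node.forms)
    # and-node: every child is expanded, so counts multiply
    total = 1
    for child in node.children:
        total *= count_derivations(child)
    return total


# Hypothetical toy grammar: an arm has 2 forms, each composed of an
# upper arm (3 forms), a lower arm (3 forms) and a hand (2 forms).
ua = OrNode("ua", [AndNode(f"ua{i}") for i in range(3)])
la = OrNode("la", [AndNode(f"la{i}") for i in range(3)])
hand = OrNode("hand", [AndNode(f"hand{i}") for i in range(2)])
arm = OrNode("arm", [AndNode("arm0", [ua, la, hand]), AndNode("arm1", [ua, la, hand])])

print(count_derivations(arm))  # 2 * (3 * 3 * 2) = 36
```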


AND-OR GRAPH GRAMMAR

[Figure panels: the And-Or graph grammar (legend: or-node group, and-node); the grammar with a derivation tree; the grammar with a parse graph]
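A derivation tree selects one form at every or-node it visits; the parse graph additionally links sibling parts with horizontal edges (the sets E(v_i) used by the derivation and geometry models). The sketch below assumes a derivation stored as nested `(form, [children])` tuples, and, as a hypothetical choice the slides do not confirm, links each pair of consecutive siblings.

```python
from typing import List, Tuple

# A derivation node: (form name, list of child derivation nodes).
Derivation = Tuple[str, list]


def parse_graph_edges(node: Derivation) -> List[Tuple[str, str, str]]:
    """Collect (parent, child_j, child_k) sibling edges for every and-node.

    Hypothetical choice: consecutive children are linked as a chain."""
    form, children = node
    names = [child[0] for child in children]
    edges = [(form, j, k) for j, k in zip(names, names[1:])]
    for child in children:
        edges.extend(parse_graph_edges(child))  # recurse into sub-derivations
    return edges


arm = ("arm0", [("ua2", []), ("la1", []), ("hand0", [])])
print(parse_graph_edges(arm))
# [('arm0', 'ua2', 'la1'), ('arm0', 'la1', 'hand0')]
```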


PROBABILITY MODEL

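$$p(pg \mid I) = \frac{1}{Z(\Theta)} \exp\{-E_a(pg \mid I) - E_d(pg) - E_g(pg)\}$$

where $E_a$ is the appearance energy, $E_d$ the derivation energy over part forms, and $E_g$ the geometry energy.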
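Because $Z(\Theta)$ does not depend on the parse graph, comparing candidate parses only needs the summed energies. The minimal sketch below uses made-up energy values for three hypothetical candidates; it finds the lowest-energy (MAP) candidate and, for this finite candidate set only, normalizes with a softmax, which stands in for the true partition function over all parse graphs.

```python
import math


def posterior_over_candidates(energies):
    """energies: list of (E_a, E_d, E_g) tuples, one per candidate parse graph.
    Returns p(pg | I) normalized over this finite candidate set only."""
    scores = [-(ea + ed + eg) for ea, ed, eg in energies]  # log of unnormalized p
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)  # plays the role of Z(Theta), restricted to the candidates
    return [w / z for w in weights]


# Hypothetical energies for three candidate parse graphs
candidates = [(2.0, 1.0, 0.5), (1.5, 1.5, 1.5), (3.0, 0.5, 0.5)]
probs = posterior_over_candidates(candidates)
best = max(range(len(probs)), key=probs.__getitem__)  # MAP = lowest total energy
print(best, [round(p, 3) for p in probs])
```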


APPEARANCE MODEL

• Adapted from the Active Basis model

Wu, Si, Gong, Zhu. Learning Active Basis Model for Object Detection and Recognition. IJCV 2010.
Si, Gong, Wu, Zhu. Learning Mixed Image Templates for Object Recognition. CVPR 2009.

Model derived by minimizing the KL divergence

Resulting in the following form:

Template learned by pursuit
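As a rough illustration of template learning by pursuit, the sketch below follows the spirit of the shared sketch algorithm from the Active Basis papers cited above, heavily simplified. It assumes precomputed nonnegative filter responses (e.g., Gabor magnitudes) for N aligned training images, lets each element shift by up to `shift` pixels independently in each image, greedily selects the element with the largest total response, and then suppresses nearby candidates. It omits the sigmoid transformation, orientation perturbation and per-element parameter estimation of the real model; the function names and the `shift`/`inhibit` parameters are hypothetical.

```python
import numpy as np


def local_max(resp, radius):
    """Max of `resp` over a (2r+1)x(2r+1) window around each pixel."""
    h, w = resp.shape
    padded = np.pad(resp.astype(float), radius, mode="constant", constant_values=-np.inf)
    out = np.full((h, w), -np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


def shared_sketch_pursuit(responses, n_elements, shift=1, inhibit=2):
    """Greedy selection of template elements shared across aligned images.

    responses: array (N images, K orientations, H, W) of nonnegative responses.
    Returns up to `n_elements` (orientation, y, x) tuples."""
    n, k, h, w = responses.shape
    score = np.zeros((k, h, w))
    for i in range(n):
        for o in range(k):
            # each image may perturb the element's location by up to `shift` pixels
            score[o] += local_max(responses[i, o], shift)
    selected = []
    for _ in range(n_elements):
        o, y, x = np.unravel_index(np.argmax(score), score.shape)
        if not np.isfinite(score[o, y, x]):
            break  # every candidate has been suppressed
        selected.append((int(o), int(y), int(x)))
        # inhibition: remove nearby candidates at all orientations
        score[:, max(0, y - inhibit):y + inhibit + 1, max(0, x - inhibit):x + inhibit + 1] = -np.inf
    return selected
```

With `shift=0` the element must appear at exactly the same location in every image; larger values make the template "active", tolerating small misalignments.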


[Figure panels: appearance model examples for leg, upper body, upper arm, lower body]


DERIVATION MODEL

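In the SCFG (stochastic context-free grammar) case, each node's form is chosen independently of its neighbors' forms:

$$p(pg) = \prod_{v_i \in V_{pg}} p(\omega(v_i))$$

where $\omega(v_i)$ is the form index of node $v_i$ and $V_{pg}$ is the set of nodes in the parse graph.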
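Under the SCFG assumption the derivation probability is a plain product of per-node form probabilities, so its logarithm is a sum. In this minimal sketch the table `form_probs[part][form]` is hypothetical and stands in for $p(\omega(v_i))$.

```python
import math

# Hypothetical per-part form distributions p(omega(v)); each row sums to 1.
form_probs = {
    "arm": {"straight": 0.6, "bent": 0.4},
    "ua": {"ua0": 0.5, "ua1": 0.3, "ua2": 0.2},
    "la": {"la0": 0.7, "la1": 0.3},
    "hand": {"open": 0.5, "fist": 0.5},
}


def scfg_log_prob(parse_graph):
    """parse_graph: dict mapping each node (part) to its chosen form.
    Returns log p(pg) = sum_i log p(omega(v_i)); neighbors' forms are ignored."""
    return sum(math.log(form_probs[part][form]) for part, form in parse_graph.items())


pg = {"arm": "bent", "ua": "ua1", "la": "la0", "hand": "fist"}
print(math.exp(scfg_log_prob(pg)))  # 0.4 * 0.3 * 0.7 * 0.5 ≈ 0.042
```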


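Allow child forms to depend on the parent form:

$$p_d(pg) = \prod_{v_i \in V_{pg}} p(\omega(C(v_i)) \mid \omega(v_i)) \approx \prod_{v_i \in V_{pg}} \prod_{(jk) \in E(v_i)} p(\omega(v_j), \omega(v_k) \mid \omega(v_i))$$

where $C(v_i)$ denotes the children of $v_i$ and $E(v_i)$ the edges $(jk)$ between them.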
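With child forms conditioned on the parent form, the log-probability becomes a sum of pairwise terms over the sibling edges $E(v_i)$ of each and-node. In the sketch below, the tables `pair_probs[(parent_form, j, k)]` and the edge layout are hypothetical; each table holds $p(\omega(v_j), \omega(v_k) \mid \omega(v_i))$ for one parent form and one sibling edge.

```python
import math

# Hypothetical joint tables p(omega(v_j), omega(v_k) | omega(v_i)), one per
# (parent form, edge); each table sums to 1.
pair_probs = {
    ("bent", "ua", "la"): {("ua0", "la0"): 0.1, ("ua0", "la1"): 0.4,
                           ("ua1", "la0"): 0.3, ("ua1", "la1"): 0.2},
    ("bent", "la", "hand"): {("la0", "open"): 0.25, ("la0", "fist"): 0.25,
                             ("la1", "open"): 0.1, ("la1", "fist"): 0.4},
}


def derivation_log_prob(forms, and_nodes):
    """forms: dict part -> chosen form.
    and_nodes: dict parent part -> list of sibling edges (j, k) among its children.
    Returns the sum over and-nodes and edges of log p(omega_j, omega_k | omega_i)."""
    total = 0.0
    for parent, edges in and_nodes.items():
        for j, k in edges:
            table = pair_probs[(forms[parent], j, k)]
            total += math.log(table[(forms[j], forms[k])])
    return total


forms = {"arm": "bent", "ua": "ua0", "la": "la1", "hand": "fist"}
edges = {"arm": [("ua", "la"), ("la", "hand")]}
print(math.exp(derivation_log_prob(forms, edges)))  # 0.4 * 0.4 = 0.16
```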


GEOMETRY MODEL

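$$p(v_j, v_k \mid \omega(v_i)) = \mathcal{N}\big(T_{kj}(x_k) - T_{jk}(x_j);\, 0, \Sigma_{ij}\big)$$

$$p_g(pg) \propto \prod_{v_i \in V_{pg}} \prod_{(jk) \in E(v_i)} p(v_j, v_k \mid \omega(v_i))$$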
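To make the geometry term concrete, assume (hypothetically; the slides do not define $T$) that each part state is $x = (u, v, \theta)$ and that $T_{jk}(x_j)$ maps part $j$'s state to the image position of the joint it shares with part $k$, by rotating a fixed offset by $\theta_j$. The pairwise term is then a zero-mean Gaussian on the mismatch between the two predicted joint positions; $\log p_g$ is the sum of such terms over all sibling edges.

```python
import numpy as np


def joint_position(x, offset):
    """Hypothetical T: part state x = (u, v, theta) -> image position of a joint
    located at `offset` in the part's own coordinate frame."""
    u, v, theta = x
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return np.array([u, v]) + rot @ np.asarray(offset, dtype=float)


def log_gaussian_zero_mean(d, cov):
    """log N(d; 0, cov) for a vector d."""
    k = d.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)


def pair_log_prob(x_j, x_k, off_jk, off_kj, cov):
    """log p(v_j, v_k | omega(v_i)) = log N(T_kj(x_k) - T_jk(x_j); 0, Sigma_ij)."""
    d = joint_position(x_k, off_kj) - joint_position(x_j, off_jk)
    return log_gaussian_zero_mean(d, cov)


# Upper arm whose elbow is 10 px along its axis; lower arm starting there.
upper = (0.0, 0.0, 0.0)  # (u, v, theta)
lower = (10.0, 0.0, 0.0)
val = pair_log_prob(upper, lower, off_jk=(10.0, 0.0), off_kj=(0.0, 0.0), cov=np.eye(2))
print(val)  # the joints coincide, so this is -log(2*pi)
```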


SAMPLING

[Figure panels: constrained samples, unconstrained samples]
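Samples of part forms can be drawn top-down by ancestral sampling: draw the root's form, then draw each child's form given the root's form; deeper levels would repeat the same step. The sketch below simplifies the model by drawing each child independently from hypothetical tables `child_form_probs[(parent_form, child_part)]`, ignoring sibling coupling and geometry, so it illustrates only the form-selection step.

```python
import random

# Hypothetical grammar: the child parts each root form expands into.
children_of = {"straight": ["ua", "la"], "bent": ["ua", "la"]}
root_form_probs = {"straight": 0.6, "bent": 0.4}
# Hypothetical conditional tables p(child form | parent form, child part).
child_form_probs = {
    ("straight", "ua"): {"ua0": 0.9, "ua1": 0.1},
    ("straight", "la"): {"la0": 0.8, "la1": 0.2},
    ("bent", "ua"): {"ua0": 0.3, "ua1": 0.7},
    ("bent", "la"): {"la0": 0.1, "la1": 0.9},
}


def draw(table, rng):
    """Draw one key of `table` with probability proportional to its value."""
    forms, weights = zip(*table.items())
    return rng.choices(forms, weights=weights, k=1)[0]


def sample_derivation(rng):
    """Ancestral sampling: root form first, then each child's form given it."""
    root = draw(root_form_probs, rng)
    kids = {part: draw(child_form_probs[(root, part)], rng) for part in children_of[root]}
    return root, kids


rng = random.Random(0)
for _ in range(3):
    print(sample_derivation(rng))
```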


INFERENCE

Recursive scoring function


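$$\rho_S^{\mathrm{MAP}} = \max_{x,\, \omega(\rho_S)} s^*(\rho_S, I)$$

$$s^*(\rho, I) = s(\rho, I) + \max_{x_i,\, \omega(\rho_i)\ \forall \rho_i \in C(\rho)} \left( \sum_{\rho_i \in C(\rho)} s^*(\rho_i, I) \right)$$

where $C(\rho)$ is the set of children of part $\rho$, $x_i$ and $\omega(\rho_i)$ are the position and form of child $\rho_i$, and $\rho_S$ is the root.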


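The maximization over the children $\rho_C$ of a part,

$$\max_{x_i,\, \omega(\rho_i)\ \forall \rho_i \in \rho_C} \left( \sum_{\rho_i \in \rho_C} s^*(\rho_i, I) \right),$$

is computed by passing messages between neighboring children, where the message from $\rho_j$ to $\rho_i$ is

$$B_{\rho_j}(\rho_i) = \max_{\omega(\rho_j)} \left( \max_{x_j} \left( s^*(\rho_j) + \log \frac{p(\omega(\rho_i), \omega(\rho_j))}{p(\omega(\rho_j))^{d(\rho_j)-1}} + \log p(\rho_i, \rho_j) + \sum_{\rho_k \in C(\rho_j)} B_{\rho_k}(\rho_j) \right) \right)$$

and $d(\rho_j)$ is the degree of $\rho_j$.

[Figure: message passed from $\rho_j$ to $\rho_i$]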
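The message recursion is exact max-sum dynamic programming on a tree of children. The sketch below is a generic version under stated assumptions, not the authors' implementation: each child's position and form are flattened into one discrete state index, `unary[i][s]` plays the role of $s^*(\rho_i)$, `pairwise[(i, j)][s_i, s_j]` stands in for the log form-compatibility ratio plus $\log p(\rho_i, \rho_j)$, and $C(\rho_j)$ is read as the neighbors of $\rho_j$ other than $\rho_i$. A brute-force enumeration checks that the messages give the exact maximum.

```python
import itertools

import numpy as np


def tree_max_sum(unary, pairwise, edges, root=0):
    """Exact max over joint child states of
         sum_i unary[i][s_i] + sum_{(i, j) in edges} pairwise[(i, j)][s_i, s_j]
    for children linked by the tree `edges`."""
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pair(i, j):
        # Matrix indexed [s_i, s_j] whichever orientation was stored.
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    def message(j, i):
        # B_j(s_i): best score of the subtree hanging off j, given child i's state.
        incoming = sum((message(k, j) for k in nbrs[j] if k != i), np.zeros(len(unary[j])))
        return np.max(pair(i, j) + (unary[j] + incoming)[None, :], axis=1)

    belief = unary[root] + sum((message(k, root) for k in nbrs[root]), np.zeros(len(unary[root])))
    return float(np.max(belief))


def brute_force(unary, pairwise, edges):
    """Enumerate every joint state; exponential, used only as a check."""
    best = -np.inf
    for states in itertools.product(*(range(len(u)) for u in unary)):
        total = sum(unary[i][s] for i, s in enumerate(states))
        total += sum(pairwise[(i, j)][states[i], states[j]] for i, j in edges)
        best = max(best, total)
    return float(best)


rng = np.random.default_rng(0)
unary = [rng.normal(size=3) for _ in range(4)]
edges = [(0, 1), (1, 2), (1, 3)]
pairwise = {e: rng.normal(size=(3, 3)) for e in edges}
print(tree_max_sum(unary, pairwise, edges), brute_force(unary, pairwise, edges))
```

Each message is computed once per edge direction toward the root, so the cost grows linearly in the number of children rather than exponentially as in the enumeration.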


[Figure panels: individual parts (ua = upper arm, la = lower arm, hand, arm) and composed parts (arm+children, ubody+children, lbody+children, fullbody+children)]



PERFORMANCE

[Figure: PCP detection rate (y-axis, 0 to 1) vs. detection threshold (x-axis, 0 to 1), one curve per part: UL, UA, Torso, LL, LA, Head, Hand, Foot]

Method                          head   torso  u.leg  l.leg  u.arm  l.arm  hand   foot   avg
Method of Yang et al. (CVPR11)  1.000  1.000  0.975  0.839  0.951  0.577  -      -      0.869
Our method                      1.000  1.000  0.933  0.857  0.915  0.719  0.420  0.339  0.884
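A common definition of PCP (percentage of correctly estimated body parts) counts a segment as correct when both of its endpoints lie within a fraction t of the ground-truth segment length from their true positions; sweeping t over [0, 1] gives a detection-rate curve as a function of the threshold. The slides do not state which PCP variant was used, so the `pcp_correct` helper below is an assumption. Separately, the avg column is consistent, to within rounding (±0.001), with a mean over ten segments (head, torso, and two each of upper legs, lower legs, upper arms and lower arms) that excludes hand and foot; the last lines check this against both rows.

```python
import numpy as np


def pcp_correct(pred, gt, t):
    """pred, gt: (2, 2) arrays holding a segment's two endpoints (x, y).
    Assumed rule: correct if both endpoints are within t * ground-truth length."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    length = np.linalg.norm(gt[1] - gt[0])
    errors = np.linalg.norm(pred - gt, axis=1)  # per-endpoint distance
    return bool(np.all(errors <= t * length))


def detection_rate(preds, gts, t):
    """Fraction of segments judged correct at threshold t."""
    return float(np.mean([pcp_correct(p, g, t) for p, g in zip(preds, gts)]))


# Table check: limbs counted twice (left and right); hand and foot excluded.
weights = {"head": 1, "torso": 1, "u.leg": 2, "l.leg": 2, "u.arm": 2, "l.arm": 2}
yang = {"head": 1.000, "torso": 1.000, "u.leg": 0.975, "l.leg": 0.839, "u.arm": 0.951, "l.arm": 0.577}
ours = {"head": 1.000, "torso": 1.000, "u.leg": 0.933, "l.leg": 0.857, "u.arm": 0.915, "l.arm": 0.719}


def weighted_avg(row):
    return sum(weights[p] * row[p] for p in weights) / sum(weights.values())


print(weighted_avg(yang), weighted_avg(ours))
```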


CONCLUSIONS

• Generative model for representing the rich appearance of highly deformable objects

• Captures semantic relationships between neighboring part productions

• DP framework for exact inference


THANKS!

