THE LEARNING AND USE OF GRAPHICAL MODELS FOR IMAGE INTERPRETATION
Thesis for the degree of Master of Science
By Leonid Karlinsky
Under the supervision of Professor Shimon Ullman
Introduction – Graphical Models

(Figure: a Bayesian Network (BN) – node A with children B and C, node B with children D and E, annotated with the conditional probability tables P(B|A), P(C|A), P(D|B), P(E|B).)

The joint distribution factors into a product over the nodes X_i and their parents Π_{X_i}:

P(A, B, C, D, E) = ∏_i P(X_i | Π_{X_i})
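The factorization above can be sketched numerically. A minimal example, assuming binary variables and made-up CPT values for the network in the figure (A → B, C and B → D, E):

```python
from itertools import product

# Hypothetical CPTs; each conditional table maps a parent value to a
# distribution over {0, 1}.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_a = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
p_d_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}
p_e_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) as the product of one CPT entry per node."""
    return (p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]
            * p_d_given_b[b][d] * p_e_given_b[b][e])

# A valid factorization defines a proper distribution: it sums to 1.
total = sum(joint(*v) for v in product([0, 1], repeat=5))
```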
Introduction

(Figure: road map of the thesis – Graphical Models; Learning (loop free / loopy); Using (loop free / loopy); Tasks; Scenarios.)
Part I: MaxMI Training

(Figure: road map repeated – Graphical Models; Learning (loop free / loopy); Using (loop free / loopy); Tasks; Scenarios.)
Classification

(Figure: class node C connected to features F_1,…,F_7, modeled by P(C, F; θ).)

Goal: classify C on new examples with minimum error, using a subset F of "trained" features.

Training tasks:
• Best features F
• Best parameters θ
• Efficient model P(C, F; θ)

Best = Maximal MI
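"Best = maximal MI" can be illustrated with a plain empirical plug-in estimate of MI(C; F) from (class, feature) samples (a sketch, not the thesis code):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical MI(C; F) in bits from a list of (c, f) samples."""
    n = len(pairs)
    p_cf = Counter(pairs)                    # joint counts
    p_c = Counter(c for c, _ in pairs)       # marginal counts of C
    p_f = Counter(f for _, f in pairs)       # marginal counts of F
    return sum((k / n) * math.log2(k * n / (p_c[c] * p_f[f]))
               for (c, f), k in p_cf.items())

perfect = mutual_information([(0, 0), (0, 0), (1, 1), (1, 1)])   # F == C
useless = mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)])   # F independent of C
```

A perfectly informative feature attains MI = H(C) = 1 bit here, while an independent one scores 0.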
MaxMI Training - The Past

• Model: simple "flat" structure, NCC thresholds.
• Training: features and thresholds selected one by one:

(F_i, θ_i) = argmax_{F_i, θ_i} min_j [ MI(C; F_i, F_j; θ_i, θ_j) − MI(C; F_j; θ_j) ]

(Figure: flat model – C connected directly to features 1…6.)

Conditional independence given C was assumed; the assumption increased the MI upper bound.
MaxMI Training – Our Approach

(Figure: tree-structured model over features 1…7.)

Learn the model and all θ_i together, maximizing:

MI(C; F; θ)
MaxMI Training – Learning

MaxMI: decompose the MI over the tree, where F_{π(i)} is the parent of F_i:

MI(C; F; θ) = Σ_i MI(F_i; C | F_{π(i)}; θ_i, θ_{π(i)})

Maximize for all θ together; efficiently learn the parameters using GDL.
MaxMI Training – Assumptions

1. TAN model structure – Tree Augmented Naïve Bayes [Friedman, 97]:

P(C, F_1,…,F_n) = P(C) · ∏_i P(F_i | F_{π(i)}, C)

2. Feature Tree (FT) – C can be removed while preserving the feature tree:

P(F_1,…,F_n) = ∏_i P(F_i | F_{π(i)})

Under these assumptions the MI decomposes:

MI(C; F; θ) = Σ_i MI(F_i; C | F_{π(i)}; θ_i, θ_{π(i)})
MaxMI Training – TAN and θ

1. The TAN structure is unknown.
2. Learn θ and the TAN structure jointly such that MI(C; F; θ) is maximized, with:
   • asymptotic correctness – the FT property holds
   • efficiency

(Figure: candidate tree over features 1…7.)
MaxMI Training – MaxMI hybrid

Goal: find the Legal pair maximizing the objective,

(θ, TAN) = argmax_{(θ, TAN) Legal} MI(C; F; θ)

where (θ, TAN) is Legal when both the TAN factorization

P(C, F_1,…,F_n; θ) = P(C) · ∏_i P(F_i | F_{π(i)}, C; θ)

and the FT property

∀ F_i, F_{π(i)}: P(F_1,…,F_n; θ) = ∏_i P(F_i | F_{π(i)}; θ)

hold. For Legal (θ, TAN), MI is maximal when the tree maximizes a sum of edge weights:

TAN = argmax_TAN Σ_i w_MM(F_i, F_{π(i)}),  with  w_MM(F_i, F_{π(i)}) = MI(F_i; C | F_{π(i)}; θ_i, θ_{π(i)})

The argmax over trees of a sum of edge weights is a maximum-spanning-tree problem [Chow & Liu, 68]. The Chow-Liu weight w_FT(F_i, F_{π(i)}) = MI(F_i; F_{π(i)}) yields the best feature tree [Chow & Liu, 68], and w_TAN(F_i, F_{π(i)}) = MI(F_i; F_{π(i)} | C) yields the best TAN [Friedman, 97]. The chain rule of mutual information ties the three together:

w_MM(F_i, F_{π(i)}) = w_TAN(F_i, F_{π(i)}) − w_FT(F_i, F_{π(i)}) + MI(F_i; C; θ_i)
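The edge-weight relation w_MM = w_TAN − w_FT + MI(F_i; C) is an instance of the chain rule MI(F_i; C | F_j) = MI(F_i; F_j | C) − MI(F_i; F_j) + MI(F_i; C). A sketch that checks it on a random joint distribution over binary (C, F_i, F_j); all names below are illustrative:

```python
import itertools
import math
import random

random.seed(0)
# A random joint distribution over (C, Fi, Fj), all binary (axes 0, 1, 2).
states = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
p = {s: w / total for s, w in zip(states, weights)}

def marg(axes):
    """Marginal distribution over the listed axes (0=C, 1=Fi, 2=Fj)."""
    out = {}
    for s, v in p.items():
        key = tuple(s[i] for i in axes)
        out[key] = out.get(key, 0.0) + v
    return out

def mi(ax, bx, cond=()):
    """MI(X_ax; X_bx | X_cond) in bits for the joint p."""
    pj = marg(ax + bx + cond)
    pa, pb, pc = marg(ax + cond), marg(bx + cond), marg(cond)
    out = 0.0
    for s, v in pj.items():
        a, b = s[:len(ax)], s[len(ax):len(ax) + len(bx)]
        c = s[len(ax) + len(bx):]
        num = v * (pc[c] if cond else 1.0)
        out += v * math.log2(num / (pa[a + c] * pb[b + c]))
    return out

w_mm = mi((1,), (0,), cond=(2,))    # MI(Fi; C | Fj)
w_tan = mi((1,), (2,), cond=(0,))   # MI(Fi; Fj | C)
w_ft = mi((1,), (2,))               # MI(Fi; Fj)
mi_ic = mi((1,), (0,))              # MI(Fi; C)
```

The chain rule guarantees w_mm == w_tan − w_ft + mi_ic for any joint distribution, which relates the three structure-learning weights up to the per-feature term MI(F_i; C).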
MaxMI Training – MaxMI hybrid

Convergent algorithm: alternate parameter learning with tree restructuring, scoring edges by the combined weight

w_MM(F_i, F_{π(i)}) = w_TAN(F_i, F_{π(i)}) − w_FT(F_i, F_{π(i)}) + MI(F_i; C; θ_i)

where w_FT and w_TAN are the Chow-Liu and TAN edge weights.
MaxMI Training – empirical results

(Chart: Cow Parts Model, feature-centered, test DB of 2256 examples – misses, false alarms, and total errors for LMI vs. MaxMI.)

(Chart: Cow Parts Model, parent-centered, test DB of 2256 examples – misses, false alarms, and total errors for LMI vs. MaxMI.)

(Chart: Cow Parts Model, classification errors on the test DB of 2256 examples – misses, false alarms, and total errors for LMI, MaxMI, and MaxMI+TAN.)
MaxMI Training – Generalizations

• Train any parameters θ_i.
• Any low-TREEWIDTH structure.
• Even without the assumptions: the exact decomposition

MI(C; F; θ) = Σ_i MI(F_i; C | F_{π(i)}; θ_i, θ_{π(i)})

is replaced by a surrogate with entropy correction terms:

MI(C; F; θ) ≈ Σ_i [ H(F_i | F̂_{π(i)}; θ̂_i, θ̂_{π(i)}) − H(F_i | F_{π(i)}, C; θ_i, θ_{π(i)}) ]
Back to the Goals

(Figure: road map repeated – Graphical Models; Learning (loop free / loopy); Using (loop free / loopy); Tasks; Scenarios.)
Part II: Loopy MAP approximation

(Figure: road map repeated – Graphical Models; Learning (loop free / loopy); Using (loop free / loopy); Tasks; Scenarios.)
Loopy network example

(Figure: loopy pairwise network over x_1,…,x_5 with factors f_12(x_1, x_2), f_13(x_1, x_3), f_14(x_1, x_4), f_15(x_1, x_5), f_24(x_2, x_4), f_25(x_2, x_5), f_35(x_3, x_5).)

Want to solve MAP:

{x_1,…,x_n} = argmax_{x_1,…,x_n} ∏_{i,j} f_ij(x_i, x_j)

NP-hard in general! [Cooper 90, Shimony 94]
Our approach – opening loops

(Figure: the same network with its loops opened – in the loop-closing factors f_24, f_35, f_25, copy variables z_2, z_3, z_4 replace one occurrence of the original variables.)

Now, we can maximize (the opened network is loop free):

{x_1,…,x_n} = argmax_{x_1,…,x_n, z_k,…} ∏_{(i,j) tree} f_ij(x_i, x_j) · ∏_k f_{jk}(x_j, z_k)

The assignment is legal for the loopy problem if z_k = x_k for all k.
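A toy version of loop opening, with hypothetical factor tables on a single triangle x_1, x_2, x_3: duplicating x_1 as z_1 inside f_31 turns the graph into a tree. The unrestricted opened maximum upper-bounds the loopy MAP, and restricting to legal assignments (z_1 = x_1) recovers it exactly:

```python
import itertools

# Hypothetical pairwise factor tables over {0, 1}.
f12 = {(0, 0): 2., (0, 1): 1., (1, 0): 1., (1, 1): 3.}
f23 = {(0, 0): 1., (0, 1): 4., (1, 0): 2., (1, 1): 1.}
f31 = {(0, 0): 3., (0, 1): 1., (1, 0): 1., (1, 1): 2.}

def loopy_score(x1, x2, x3):
    return f12[x1, x2] * f23[x2, x3] * f31[x3, x1]

def opened_score(x1, x2, x3, z1):
    # The loop is cut: f31 reads the copy z1 instead of x1.
    return f12[x1, x2] * f23[x2, x3] * f31[x3, z1]

map_loopy = max(loopy_score(*s) for s in itertools.product([0, 1], repeat=3))
map_open = max(opened_score(*s) for s in itertools.product([0, 1], repeat=4))
map_legal = max(opened_score(x1, x2, x3, x1)        # legal: z1 == x1
                for x1, x2, x3 in itertools.product([0, 1], repeat=3))
```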
Our approach – opening loops

Legally maximize g(y, x, z):

(y_l, x_l, z_l) = argmax_{y, x, z : z = x} g(y, x, z)

Can maximize unrestricted (loop free):

(y_m, x_m, z_m) = argmax_{y, x, z} g(y, x, z)

Usually z_m ≠ x_m.

Our solution – slow connections.
Our approach – slow connections

Maximize (loop-free, use GDL) with z fixed to a constant Z:

(y(Z), x(Z)) = argmax_{y, x} g(y, x, Z)

Now legalize (set Z ← x(Z)) and return to step one. Iterate until convergence.
This is the Maximize-and-Legalize algorithm.

(Figure: the opened network – fast variables y_1, x_2, x_3, x_4, y_5 and slow copies Z_2, Z_3, Z_4; the maximizers x_2(Z), x_3(Z), x_4(Z), y_1(Z), y_5(Z) depend on the constants.)
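The two steps can be sketched as coordinate ascent on a small opened triangle (hypothetical factors; brute-force enumeration stands in for the GDL maximize step):

```python
import itertools

# Loop x1-x2-x3 opened by routing f31 through a slow copy z1 of x1.
f12 = {(0, 0): 2., (0, 1): 1., (1, 0): 1., (1, 1): 3.}
f23 = {(0, 0): 1., (0, 1): 4., (1, 0): 2., (1, 1): 1.}
f31 = {(0, 0): 3., (0, 1): 1., (1, 0): 1., (1, 1): 2.}

def maximize(z1):
    """Maximize step: exact MAP of the tree with the slow copy clamped."""
    return max(itertools.product([0, 1], repeat=3),
               key=lambda s: f12[s[0], s[1]] * f23[s[1], s[2]] * f31[s[2], z1])

z1, seen = 0, set()
while z1 not in seen:       # stop when the slow variable revisits a value
    seen.add(z1)
    x1, x2, x3 = maximize(z1)
    z1 = x1                 # legalize step: copy the fast value back
```

On exit the assignment is legal by construction (z1 == x1); for these particular tables it also attains the true loopy MAP score of 8.0, though in general only a local optimum is guaranteed.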
Our approach – slow connections

When will this work? The intuition: z-minors.

• Strong z-minor: for every y, x and constant Z, the clamped maximizer (y(Z), x(Z)) also dominates after legalization, i.e. g(y(Z), x(Z), x(Z)) ≥ g(y, x, z) over legal assignments. Consequence: global maximum in a single step.

• Weak z-minor: the clamped maximizer only improves locally, so each Maximize-and-Legalize step does not decrease the legal objective. Consequence: local optimum after several steps.
Making the assumptions true – selecting z-variables

The intuition: recursive z-selection.
• Recursive strong z-minor: single step, global maximum!
• Recursive weak z-minor: iterations, local maximum.
• Slow connections may run at different or the same speed.

The Remove – Contract – Split algorithm selects the z-variables.

(Figure: Remove – Contract – Split on a small graph over x_1, x_2, x_3, x_4, introducing slow connections z_1, z_3.)
Making the assumptions true – approximating the function

The intuition: recursively "chip away" small parts of the function.

(Figure: a loop over x_1, x_2, x_3 with factor f_2(x_2, x_3); successive stages chip the factor away onto the slow connection z_1.)
Existing approximation algorithms

• Clustering: triangulation [Pearl, 88]
• Loopy Belief Revision [McEliece, 98]
• Bethe-Kikuchi Free-Energy: CCCP [Yuille, 02]
• Tree Re-Parametrization (TRP) [Wainwright, 03]
Experimental Results

(Chart: average approximation and average match, 1000 samples, 31 nodes, 4 values per node, for: weak z-minor with different speeds; weak z-minor with same speed; random z-variable selection; LBR with 50 messages; LBR with 10 messages; ignore siblings.)
Experimental Results

(Chart: average approximation and average match, 1000 samples, 31 nodes, 2 values per node, for: weak z-minor with different speeds; weak z-minor with same speed; random z-variable selection; LBR with 50 messages; LBR with 10 messages; ignore siblings.)
Maximum MI vs. Minimum P_E
Classification Specifics

• How do we classify a new example? MAP:

C = argmax_C P(C | F; θ)

• What are "the best" features and parameters? Maximize MI:

(F, θ) = argmax_{F, θ} MI(C; F; θ)

• Why maximize MI? It is tightly related to P_E (more reasons – if time permits):

MI(C; F) = H(C) − H(C | F)
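The identity MI(C; F) = H(C) − H(C | F) can be checked on a small joint table (the values below are made up):

```python
import math

# Hypothetical joint P(C, F) for binary C and F.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist):
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

p_c = {c: p[c, 0] + p[c, 1] for c in (0, 1)}
p_f = {f: p[0, f] + p[1, f] for f in (0, 1)}

# H(C|F) via the chain rule H(C|F) = H(C,F) - H(F).
h_c_given_f = entropy(p) - entropy(p_f)
# MI from its definition as a KL divergence from independence.
mi = sum(v * math.log2(v / (p_c[c] * p_f[f])) for (c, f), v in p.items())
```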
MaxMI Training - The Past - Reasons

• Why did it work? Conditional independence given C was assumed! Under that assumption the pairwise selection score

min_j [ MI(C; F_i, F_j; θ_i, θ_j) − MI(C; F_j; θ_j) ]

approximates the true MI gain of adding F_i; without it, it only increases an upper bound on the MI.

• What was missing? Maximizing the "whole" MI. Learning the model structure.

(Figure: flat model – C connected directly to features 1…6.)
MaxMI Training – JT

The Junction Tree (JT) structure = the TAN structure. GDL is exponential in the TREEWIDTH; here TREEWIDTH = 2, so each term MI(F_i; C | F_{π(i)}; θ_i, θ_{π(i)}) is computed efficiently.
MaxMI Training – EM

Why not EM? The EM algorithm [Redner, Walker, 84] trains CPTs by maximizing the likelihood over the training set:

θ = argmax_θ Σ_{y ∈ Y} log p(y; θ)

EM assumes static training data – not true in our scenario!
MaxMI Training – MaxMI hybrid solution

[Chow, Liu 68] "Best" Feature Tree:
w_FT(F_i, F_{π(i)}) = MI(F_i; F_{π(i)}; θ_i, θ_{π(i)})

[Friedman, et al. 97] "Best" TAN:
w_TAN(F_i, F_{π(i)}) = MI(F_i; F_{π(i)} | C; θ_i, θ_{π(i)})

[We, 2004] Maximal MI:
w_MM(F_i, F_{π(i)}) = MI(F_i; C | F_{π(i)}; θ_i, θ_{π(i)})
MaxMI Training – MaxMI hybrid solution

By the chain rule of mutual information:

w_MM(F_i, F_j) = w_TAN(F_i, F_j) − w_FT(F_i, F_j) + MI(F_i; C; θ_i)

Summing over tree edges:

Score_MM = Score_TAN − Score_FT + Σ_i MI(F_i; C; θ_i)

so argmax_TAN Score_TAN equals argmax_TAN (Score_MM + Score_FT):
• Increase of Score_MM alone: not guaranteed in general.
• Non-decrease of Score_TAN: holds, and gives TAN asymptotic correctness.
MaxMI Training – MaxMI hybrid

Convergent variant: interpolate the edge weights with a mixing coefficient λ ∈ [0, 1],

w_MM(F_i, F_{π(i)}) = w_TAN(F_i, F_{π(i)}) − (1 − λ) · w_FT(F_i, F_{π(i)}) + MI(F_i; C; θ_i)

where λ = 0 recovers the exact chain-rule identity.
MaxMI Training – empirical results

(Figure: detection results before and after training.)
MaxMI Training – empirical results

(Chart: Face Parts Model, classification errors on the test DB of 2257 examples – misses, false alarms, and total errors for MaxMI vs. original training, MaxMI+TAN constrained, MaxMI+TAN greedy, MaxMI+TAN hybrid, MaxMI (threshold only), and MaxMI+TAN O&U soft EM.)
MaxMI Training – empirical results

Face Parts Model (test DB: 2257 examples, training DB: 767 examples, class entropy on training DB: 0.792690834)

Training method                                        | MI model to class (training DB) | Errors (test DB) | Errors (training DB)
MaxMI Training                                         | 0.758242464                     | 135              | 25
Original Training                                      | 0.722429352                     | 136              | 35
MaxMI Training with constrained TAN restructure        | 0.756855168                     | Miss=62, FA=36   | Miss=15, FA=3
MaxMI Training with greedy TAN restructure             | 0.746516913                     | Miss=30, FA=44   | Miss=16, FA=3
Alternative MaxMI Training with TAN restructure        | 0.74711484                      | Miss=33, FA=109  | N/A
Threshold-only training (without restructure)          | 0.738676981                     | Miss=84, FA=46   | Miss=30, FA=5
O&U model training (from all-observed model + soft EM) | N/A                             | 67               | N/A
MaxMI Training – empirical results

Cow Parts Model (test DB: 2256 examples, training DB: 961 examples, class entropy on training DB: 0.46535663; MI model to class: N/A)

Training method                                        | Errors (test DB) | Errors (training DB)
Original Training                                      | Miss=84, FA=64   | Miss=36, FA=16
MaxMI Training                                         | Miss=53, FA=42   | Miss=25, FA=17
MaxMI Training with constrained TAN restructure        | Miss=32, FA=48   | Miss=17, FA=12
MaxMI Training with greedy TAN restructure             | Miss=59, FA=30   | Miss=23, FA=16
O&U model training (from all-observed model + soft EM) | 89               | N/A
Remove – Contract – Split

(Figure: illustration of the Remove – Contract – Split z-selection algorithm.)
Making the assumptions true

Approximating the function:
• Strong z-minor – Challenge: selecting proper Z constants. Benefit: single-step convergence.
• Weak z-minor – Drawback: exponential in the number of "chips". Benefit: less restrictive.

Chipping scheme: a factor g(y, x) is replaced in stages by mixtures of g(y, x), g(y, Z) with the constant Z, and g(y, z) with the slow copy z, so each stage moves a small part of the function onto the slow connection.
The clique tree

(Figure: a clique tree over cliques C_1, C_2, C_3, C_4, …, C_k; cliques carry node functions f_i(v_i), f_j(v_j) and edges carry the weights w_ij = log f(v_i, v_j).)
Experimental Results

A2 (same "slow" speed):
Model Size                                 | Nodes | Values | Samples | Avg. Approximation | Avg. Mismatch | Avg. Match (%)
Depth=3, Branching=5                       | 31    | 4      | 1000    | 94.11%             | 15-16         | 50.31%
Depth=3, Branching=5                       | 31    | 3      | 1000    | 94.55%             | 11-12         | 63.70%
Depth=3, Branching=5                       | 31    | 2      | 1000    | 97.16%             | 4-5           | 84.60%
Natural feature trees, 4 cliques of size 7 | 25    | 2      | ~2000   | 98.34%             | 1-2           | 93.62%

A2 (different "slow" speed):
Model Size                                 | Nodes | Values | Samples | Avg. Approximation | Avg. Mismatch | Avg. Match (%)
Depth=3, Branching=5                       | 31    | 4      | 1000    | 98.26%             | 10-11         | 65.22%
Depth=3, Branching=5                       | 31    | 3      | 1000    | 98.08%             | 7-8           | 74.51%
Depth=3, Branching=5                       | 31    | 2      | 1000    | 98.55%             | 3-4           | 88.62%
Natural feature trees, 4 cliques of size 7 | 25    | 2      | ~2000   | 97.85%             | 3-4           | 86.14%
Experimental Results

Random slow connections:
Model Size                                 | Nodes | Values | Samples | Avg. Approximation | Avg. Mismatch | Avg. Match (%)
Depth=3, Branching=5                       | 31    | 4      | 1000    | 82.70%             | 20-21         | 34.58%
Depth=3, Branching=5                       | 31    | 3      | 1000    | 81.52%             | 16-17         | 45.48%
Depth=3, Branching=5                       | 31    | 2      | 1000    | 79.37%             | 11-12         | 62.23%
Natural feature trees, 4 cliques of size 7 | 25    | 2      | ~2000   | N/A                | N/A           | N/A

Loopy Belief Revision (50 messages per node):
Model Size                                 | Nodes | Values | Samples | Avg. Approximation | Avg. Mismatch | Avg. Match (%)
Depth=3, Branching=5                       | 31    | 4      | 1000    | N/A                | N/A           | N/A
Depth=3, Branching=5                       | 31    | 3      | 1000    | 89.17%             | 13-14         | 55.31%
Depth=3, Branching=5                       | 31    | 2      | 1000    | 88.73%             | 8-9           | 72.80%
Natural feature trees, 4 cliques of size 7 | 25    | 2      | ~2000   | 93.34%             | 3-4           | 87.73%
Experimental Results

Loopy Belief Revision (10 messages per node):
Model Size                                 | Nodes | Values | Samples | Avg. Approximation | Avg. Mismatch | Avg. Match (%)
Depth=3, Branching=5                       | 31    | 4      | 1000    | 87.65%             | 17-18         | 41.95%
Depth=3, Branching=5                       | 31    | 3      | 1000    | 86.74%             | 14-15         | 54.02%
Depth=3, Branching=5                       | 31    | 2      | 1000    | 85.78%             | 8-9           | 71.80%
Natural feature trees, 4 cliques of size 7 | 25    | 2      | ~2000   | N/A                | N/A           | N/A

Ignore sibling loopy links:
Model Size                                 | Nodes | Values | Samples | Avg. Approximation | Avg. Mismatch | Avg. Match (%)
Depth=3, Branching=5                       | 31    | 4      | 1000    | 74.04%             | 21-22         | 29.25%
Depth=3, Branching=5                       | 31    | 3      | 1000    | 71.89%             | 19-20         | 38.56%
Depth=3, Branching=5                       | 31    | 2      | 1000    | 69.38%             | 13-14         | 56.09%
Natural feature trees, 4 cliques of size 7 | 25    | 2      | ~2000   | 73.45%             | 9-10          | 63.88%
MaxMI Training – extensions

• Observed and unobserved model: MaxMI augmented to support O&U training; observed-only training + EM heuristic; complete training.
• Constrained and greedy TAN restructure.
• MaxMI vs. MinPE in the ideal scenario – characterization and comparison.
• Future research directions.
MaxMI vs. MinPE

MinPE:
(F, θ, model structure) = argmin_{F, θ, structure} P_E( argmax_C P(C | F; θ) )

MaxMI:
(F, θ, model structure) = argmax_{F, θ, structure} MI(C; F; θ)

Fano & inverse Fano (binary C):
H(C | F; θ_F) ≤ H(P_E)
P_E ≤ (1/2) · H(C | F; θ_F)
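Both bounds can be verified numerically; in the sketch below P_E is the Bayes error of the MAP classifier argmax_C P(C | F), and the joint values are hypothetical:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical joint P(C, F) for binary C and F.
p = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.40}
p_f = {f: p[0, f] + p[1, f] for f in (0, 1)}

# Bayes error of argmax_c P(c | F): for each f, the mass of the losing class.
p_err = sum(min(p[0, f], p[1, f]) for f in (0, 1))
# H(C | F) = sum_f P(f) * H(C | F=f).
h_c_given_f = sum(p_f[f] * h2(p[0, f] / p_f[f]) for f in (0, 1))

fano_holds = h_c_given_f <= h2(p_err)        # Fano
inv_fano_holds = p_err <= h_c_given_f / 2.0  # inverse Fano (binary C)
```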
MaxMI vs. MinPE – ideal scenario

Setting: n-valued C, k-valued F.

MinPE:
Arrange the class values by decreasing probability, P(C = i−1) ≥ P(C = i) for i = 2,…,n.
Select F so that each feature value predicts one of the k most probable classes:
∀ i = 1,…,k: i = argmax_j P(C = j | F = i).

MaxMI:
Divide the class values into k groups of maximal entropy:
{A_1,…,A_k} = argmax_{A_1 ∪ … ∪ A_k = {1,…,n}} H(A), with P(A_j) = Σ_{i ∈ A_j} P(C = i).
Select F to indicate the group:
∀ j = 1,…,k: P(C = i | F = j) = P(C = i) / P(A_j) for i ∈ A_j, and 0 otherwise.
MaxMI vs. MinPE – ideal scenario

In general MaxMI ≠ MinPE; in special cases MaxMI = MinPE.

Implications: with an increase in the number of allowed guesses, MaxMI → MinPE.