Date post: | 11-May-2015 |
Category: |
Technology |
Upload: | lukas-tencer |
SUPERVISED LEARNING OF SEMANTIC CLASSES FOR IMAGE ANNOTATION AND RETRIEVAL
G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos
by: Lukáš Tencer
ECSE626 2012
Outline
• Introduction
• Prior techniques
  • Supervised OVA Labeling
  • Unsupervised Labeling
• Methodology
  • Supervised Multiclass Labeling
  • Semantic Distribution Estimation
  • Density Estimation
• Algorithm
  • Learning, Annotation, Retrieval
• Results
  • Quantitative
  • Qualitative
• Conclusion
Introduction
• Task
  • Assign labels to unknown images
  • Retrieve relevant images given labels
• Supervised Learning
  • Learning from labeled training data
  • Training data consist of pairs $\{x_i, l_i\}$, $i = 1, \ldots, n$
  • Multiple instance learning
• Semantic Classes
  • Labels representing common concepts (sky, bear, snow…)
• Image Annotation and Retrieval
  • Annotation: given the image D, which labels are present in the image?
  • Retrieval: given the label, what are the top n matching images?
Introduction
Datasets:
• Corel5K: 5000 images, 272 classes
• Corel30K: 30000 images, 1120 classes
• MIRFLICKR: 25000 images, 37 classes
• (PSU): not available anymore
• ImageCLEF: the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track
  • Medical Image Retrieval
  • Photo Annotation
  • Plant Identification
  • Wikipedia Retrieval
  • Patent Image Retrieval and Classification
Introduction
[Figure: example images from Corel 5K (Bear), Corel 30K (New Zealand) and MIRFLICKR (Urban)]
Prior Techniques
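Supervised OVA (one-vs-all)
• Binary decision problem: concept present / absent
• Hidden variable $Y_i$
• Decision rule: $P_{X|Y_i}(\mathcal{X}|1)\,P_{Y_i}(1) \geq P_{X|Y_i}(\mathcal{X}|0)\,P_{Y_i}(0)$

Unsupervised Learning
• Models the dependency between text labels and image features, expressed through a hidden variable L
• Considers only positive examples, i.e. densities for $Y_i = 1$
• $P_{X,W}(x, w) = \sum_{l=1}^{D} P_{X|L}(x|l)\,P_{W|L}(w|l)\,P_L(l)$

[Figure: graphical models relating the hidden variable L, the words W (W1, W2, W3) and the features X; example with "bear", "polar, grizzly" and image features]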
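The one-vs-all decision rule can be illustrated with a toy sketch. It assumes scalar features and a single Gaussian per hypothesis, which is a simplification chosen here for brevity; the function names and the `(mean, variance)` parameterization are hypothetical and do not come from the slides.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (toy stand-in for the class densities)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def ova_concept_present(x, present, absent, prior_present):
    """Return True when P(x|Y=1) P(Y=1) >= P(x|Y=0) P(Y=0).

    present / absent are hypothetical (mean, variance) pairs for the
    'concept present' and 'concept absent' densities.
    """
    log_present = gaussian_logpdf(x, *present) + math.log(prior_present)
    log_absent = gaussian_logpdf(x, *absent) + math.log(1.0 - prior_present)
    return log_present >= log_absent
```

Comparing in the log domain avoids underflow when many feature vectors are combined. Note that the rule needs a density for the "absent" case, which is what makes one-vs-all training depend on negative examples.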
Methodology
Supervised Multiclass Labeling (SML)
• Elements of the semantic vocabulary (W) are explicitly made the semantic classes (L)!
• Random variable W: $W \in \{1, \ldots, T\}$, with $W = i$ if and only if $x$ is a sample from $w_i$, and class-conditional density $P_{X|W}(x|i)$
• Annotation and retrieval are then easy to do, using
  $P_{W|X}(i|x) = \frac{P_{X|W}(x|i)\,P_W(i)}{P_X(x)}$
• Annotation: $i^*(\mathcal{X}) = \arg\max_i P_{W|X}(i|\mathcal{X})$
• Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(\mathcal{X}_j|i)$
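A minimal sketch of the annotation and retrieval rules, assuming the class-conditional log-likelihoods have already been computed and stored in plain lists. The function names and the table layout are illustrative assumptions, not part of the original method description.

```python
import math

def posteriors(log_lik, priors):
    """Bayes rule P(i|x) = P(x|i) P(i) / P(x), evaluated in the log domain."""
    joint = [ll + math.log(p) for ll, p in zip(log_lik, priors)]
    top = max(joint)
    log_px = top + math.log(sum(math.exp(j - top) for j in joint))
    return [math.exp(j - log_px) for j in joint]

def annotate(log_lik, priors):
    """Index i* of the class with the largest posterior."""
    post = posteriors(log_lik, priors)
    return max(range(len(post)), key=post.__getitem__)

def retrieve(log_lik_per_image, class_index):
    """Index j* of the image with the largest likelihood under class i.

    log_lik_per_image[j][i] is assumed to hold log P(X_j | i).
    """
    return max(range(len(log_lik_per_image)),
               key=lambda j: log_lik_per_image[j][class_index])
```

Both operations reduce to an argmax once the class-conditional densities are available, so the remaining difficulty is estimating those densities.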
Methodology
Estimation of Semantic Class Distributions
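• Given $\mathcal{D}_i$, the training set of $D_i$ images for class $i$, estimate $P_{X|W}(x|i)$
• Assumption: Gaussian distribution
• How to estimate?
  • Direct estimation
  • Model averaging
  • Naive averaging
• Averaging over the images of the class:
  $P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} P_{X|L,W}(x|l,i)$
• GMM model:
  $P_{X|L,W}(x|l,i) = \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$
• Averaged:
  $P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$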
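The averaged mixture can be sketched as follows, assuming 1-D features and per-image mixtures stored as lists of `(weight, mean, variance)` tuples. Both the representation and the function names are assumptions made for this illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, mixture):
    """Density of a mixture given as (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)

def average_mixtures(image_mixtures):
    """Pool the per-image mixtures of one class into a single class mixture.

    Every component weight is divided by the number of images D_i, so the
    pooled density equals the average of the per-image densities.
    """
    d = len(image_mixtures)
    return [(w / d, m, v) for mixture in image_mixtures for w, m, v in mixture]
```

The pooled model keeps every component of every image, so its size grows with the number of training images of the class. This is one way to see why a more compact class model is attractive.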
Methodology
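Mixture hierarchies
• First step: get a GMM from each image with regular soft EM
  $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$
• Flow: Initialization (Euclidean distance, Mahalanobis distance) → initial parameter estimate → Expectation → Maximization
• Stopping: max. 200 iterations, or the change in likelihood is too small:
  $\frac{P(x, z|\theta^{t+1}) - P(x, z|\theta^{t})}{P(x, z|\theta^{t})}$
• Likelihood:
  $P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{I}(z_i = j)\, \tau_j\, G(x_i; \mu_j, \Sigma_j)$
• E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}\left[\log F(\theta; X, Z)\right]$
• M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$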
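The EM loop for a single image can be sketched in a few lines. This is a simplified illustration under stated assumptions: features are scalars, the initialization places the means at evenly spaced quantiles (a stand-in for the distance-based initialization on the slide), and the tolerance `tol` and variance floor are hypothetical values.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k, max_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture with soft EM."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles, shared variance.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    previous = None
    for _ in range(max_iter):
        # E-step: responsibilities of each component for each sample.
        resp, loglik = [], 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        # Stop when the relative change in log-likelihood is too small.
        if previous is not None and abs(loglik - previous) <= tol * abs(previous):
            break
        previous = loglik
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)
    return weights, means, variances
```

The sketch monitors the log-likelihood rather than the likelihood itself, which is a numerical convenience and not something the slide specifies. For the 192-dimensional features used in practice, the scalar variances would be replaced by covariance matrices.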
Methodology
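Mixture hierarchies for label
• Second step: get an HGMM for each label
  $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
• Flow: Initialization (Bhattacharyya distance) → initial parameter estimate → Expectation → Maximization
• Stopping: max. 200 iterations, or the change in likelihood is too small:
  $\frac{P(x, z|\theta^{t+1}) - P(x, z|\theta^{t})}{P(x, z|\theta^{t})}$
• Likelihood:
  $P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{I}(z_i = j)\, \tau_j\, G(x_i; \mu_j, \Sigma_j)$
• E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}\left[\log F(\theta; X, Z)\right]$
• M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$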
E and M step for HGMM
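Input: $\{\pi_j^{k}, \mu_j^{k}, \Sigma_j^{k}\}$, $j = 1, \ldots, D_i$, $k = 1, \ldots, K$
Output: $\{\pi_c^{m}, \mu_c^{m}, \Sigma_c^{m}\}$, $m = 1, \ldots, M$

E-step:
$h_{jk}^{m} = \frac{\left[G(\mu_j^{k}, \mu_c^{m}, \Sigma_c^{m})\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^{m})^{-1}\Sigma_j^{k}\}}\right]^{\pi_j^{k} N} \pi_c^{m}}{\sum_{l}\left[G(\mu_j^{k}, \mu_c^{l}, \Sigma_c^{l})\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^{l})^{-1}\Sigma_j^{k}\}}\right]^{\pi_j^{k} N} \pi_c^{l}}$

M-step:
$(\pi_c^{m})^{new} = \frac{\sum_{jk} h_{jk}^{m}}{D_i K}$

$(\mu_c^{m})^{new} = \sum_{jk} w_{jk}^{m}\, \mu_j^{k}$, where $w_{jk}^{m} = \frac{h_{jk}^{m}\, \pi_j^{k}}{\sum_{jk} h_{jk}^{m}\, \pi_j^{k}}$

$(\Sigma_c^{m})^{new} = \sum_{jk} w_{jk}^{m}\left[\Sigma_j^{k} + (\mu_j^{k} - \mu_c^{m})(\mu_j^{k} - \mu_c^{m})^{T}\right]$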
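One iteration of the hierarchical E and M steps can be sketched for the scalar case, where each covariance reduces to a variance and the trace term becomes a ratio of variances. The names `hem_step` and `n_virtual` (standing in for N in the exponent) are hypothetical, the default `n_virtual=1.0` is an arbitrary choice for the example, and using the freshly updated mean inside the variance update is an assumption of this sketch.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def hem_step(children, parents, n_virtual=1.0):
    """One hierarchical EM iteration for 1-D mixtures.

    children: one list per image j, each holding K (pi, mu, var) tuples.
    parents:  M (pi, mu, var) tuples of the class-level mixture.
    """
    flat = [comp for image in children for comp in image]  # D_i * K components
    # E-step: responsibility h[jk][m] of parent m for child component jk.
    h = []
    for pi_jk, mu_jk, var_jk in flat:
        logs = [pi_jk * n_virtual
                * (gaussian_logpdf(mu_jk, mu_m, var_m) - 0.5 * var_jk / var_m)
                + math.log(pi_m)
                for pi_m, mu_m, var_m in parents]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        h.append([u / total for u in unnorm])
    # M-step: update every parent component from the weighted children.
    updated = []
    for m in range(len(parents)):
        h_m = [row[m] for row in h]
        pi_new = sum(h_m) / len(flat)
        norm = sum(hv * comp[0] for hv, comp in zip(h_m, flat))
        w = [hv * comp[0] / norm for hv, comp in zip(h_m, flat)]
        mu_new = sum(wv * comp[1] for wv, comp in zip(w, flat))
        # Assumption: deviations are measured from the updated mean.
        var_new = sum(wv * (comp[2] + (comp[1] - mu_new) ** 2)
                      for wv, comp in zip(w, flat))
        updated.append((pi_new, mu_new, var_new))
    return updated
```

The step operates only on the parameters of the image-level mixtures, never on the raw feature vectors, so its cost depends on the number of components rather than on the number of pixels.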
Algorithm - learning
Training
• For each training set I for label w:
  • Decompose the image (192 px × 128 px) into 8×8 regions with a sliding window moving 2 pixels at a time
  • Calculate the DCT for each window (8*8*3), giving a 192-d feature vector
  • Calculate a mixture of 8 Gaussians for each image using EM:
    $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$
  • Calculate a mixture of 64 Gaussians for each label using H-EM:
    $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
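The feature extraction step can be sketched without external libraries. The sketch assumes an image stored as `image[channel][y][x]` with three channels, an orthonormal DCT-II, and windows that stay fully inside the image; these conventions, like the function names, are assumptions of the example rather than details given on the slide.

```python
import math

BLOCK = 8  # window side in pixels
STEP = 2   # sliding-window stride in pixels

# Orthonormal DCT-II basis, DCT_BASIS[u][n].
DCT_BASIS = [[(math.sqrt(1.0 / BLOCK) if u == 0 else math.sqrt(2.0 / BLOCK))
              * math.cos(math.pi * (2 * n + 1) * u / (2 * BLOCK))
              for n in range(BLOCK)] for u in range(BLOCK)]

def dct2(block):
    """2-D DCT-II of an 8x8 block: transform the rows, then the columns."""
    rows = [[sum(DCT_BASIS[u][n] * row[n] for n in range(BLOCK))
             for u in range(BLOCK)] for row in block]
    return [[sum(DCT_BASIS[v][m] * rows[m][u] for m in range(BLOCK))
             for u in range(BLOCK)] for v in range(BLOCK)]

def window_origins(width, height):
    """Top-left corners of all windows that fit inside the image."""
    return [(x, y)
            for y in range(0, height - BLOCK + 1, STEP)
            for x in range(0, width - BLOCK + 1, STEP)]

def extract_features(image):
    """Return one 8*8*3 = 192-d DCT vector per window position."""
    height, width = len(image[0]), len(image[0][0])
    features = []
    for x0, y0 in window_origins(width, height):
        vector = []
        for channel in image:
            block = [channel[y][x0:x0 + BLOCK] for y in range(y0, y0 + BLOCK)]
            vector.extend(c for row in dct2(block) for c in row)
        features.append(vector)
    return features
```

Under the fully-inside-the-image convention assumed here, a 192 × 128 image yields 93 × 61 = 5673 window positions; a different border handling would change that count.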
Algorithm – annotation, retrieval
Annotation
• Get the n (5) best labels for image I
• Get features from the image ((192*128/2)*192)
• Get the log likelihood for each label and choose the best n:
  $\log P_{X|W}(\mathcal{X}|w_i) = \sum_{x \in \mathcal{X}} \log P_{X|W}(x|w_i)$

Retrieval
• For images $I_T$ and label w: annotate $I_T$ and rank by decreasing scores of the posterior $P_{W|X}(w_i|\mathcal{X})$
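A minimal sketch of annotation and retrieval from label mixtures, assuming scalar features, label models stored as `(weight, mean, variance)` tuples, and a uniform prior over labels when forming the posterior. All names and the uniform prior are assumptions of this example.

```python
import math

def gmm_logpdf(x, mixture):
    """log of sum_k pi_k G(x; mu_k, var_k), computed with log-sum-exp."""
    terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
             for w, m, v in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def bag_loglik(features, mixture):
    """Sum of per-vector log-likelihoods over all features of one image."""
    return sum(gmm_logpdf(x, mixture) for x in features)

def annotate(features, label_models, n=5):
    """Return the n labels with the highest image log-likelihood."""
    scores = {label: bag_loglik(features, mix)
              for label, mix in label_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

def label_posterior(features, label, label_models):
    """Posterior of one label, assuming a uniform prior over labels."""
    scores = {l: bag_loglik(features, mix) for l, mix in label_models.items()}
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return math.exp(scores[label] - log_norm)

def retrieve(images, label, label_models):
    """Rank image ids by decreasing posterior of the query label."""
    return sorted(images,
                  key=lambda name: label_posterior(images[name], label, label_models),
                  reverse=True)
```

For example, `annotate(features, label_models, n=5)` returns a fixed number of labels regardless of how many concepts are actually present in the image.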
Results-quantitative
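Database: Corel 5k (4000 training, 1000 testing)
• Precision: $\frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$
• Recall: $\frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$
• Per word w: $\text{recall} = \frac{w_C}{w_H}$, $\text{precision} = \frac{w_C}{w_{auto}}$
  • $w_H$: human annotated images
  • $w_{auto}$: automatically annotated images
  • $w_C$: correctly annotated images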
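The per-word recall and precision can be computed directly from the two sets of annotations. The sketch below assumes annotations are stored as dictionaries mapping an image id to a set of labels, and it returns 0.0 when a denominator is zero; both choices are assumptions of the example.

```python
def word_scores(human, auto, word):
    """Return (recall, precision) for one word.

    human / auto map an image id to the set of labels assigned by a person
    or by the system.
    """
    w_h = sum(1 for labels in human.values() if word in labels)
    w_auto = sum(1 for labels in auto.values() if word in labels)
    w_c = sum(1 for image, labels in auto.items()
              if word in labels and word in human.get(image, set()))
    recall = w_c / w_h if w_h else 0.0          # assumed 0.0 when undefined
    precision = w_c / w_auto if w_auto else 0.0  # assumed 0.0 when undefined
    return recall, precision
```

Averaging these values over all words gives the mean per-word recall and precision; restricting the average to words with non-zero recall gives the "recall > 0" variants.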
Results-quantitative
Annotation (chart series: non-zero recall, mean recall, mean precision)

                        1     2     3     4     5     6
w with Recall > 0       140   121   110   125   90    131
Mean Recall per w       0.27  0.25  0.25  0.26  0.23  0.27
Mean Precision per w    0.25  0.24  0.23  0.23  0.2   0.23
Results-quantitative
Retrieval (chart series: Recall > 0 precision, all precision)

                        1     2     3     4     5     6
Mean Recall all w       0.23  0.21  0.20  0.21  0.19  0.24
Mean Recall per w R>0   0.45  0.40  0.40  0.41  0.37  0.41
Results-qualitative
Annotation examples (human annotation, then the five automatically assigned labels):
• plane jet f-14 sky / sky plane clouds smoke snow
• coast waves water hills / water sky ocean mountain clouds
• polar bear bars cage / bear snow texture sunrise closeup
• people cheese market street / people wall sand flower bird
Results-qualitative
[Figure: retrieval results for the queries Blooms, Mountain, Pool, Smoke, Woman]
Conclusions
Pros
• Nice segmentation as a byproduct of annotation
• Great for general concepts with lots of samples
• Only weakly annotated data is required (multi-instance learning)
• Allows hierarchical representation (adding images, speed)

Cons
• Fixed number of labels per image
• Learning is time consuming
• Parameter tuning is time consuming
• Weakly represented classes could be associated with wrong concepts
Resources
Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 29, 394–410 (2007).
Gudivada, V.N., Raghavan, V.V.: Content based image retrieval systems. Computer. 28, 18–22 (1995).
Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color-and texture-based image segmentation using EM and its application to content-based image retrieval. Computer Vision, 1998. Sixth International Conference on. pp. 675–682. IEEE (1998).
Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 71, 593–613 (2009).
Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys. 40, 1-60 (2008).
Thank you for your attention
Questions?
tencer.hustej.net
@lukastencer
accuratelyrandom.blogspot.com
facebook.com/lukas.tencer
Google labeling game