Post on 09-Jun-2015
Semantics in Digital Photos: A Contextual Analysis
Author / Pinaki Sinha, Ramesh Jain
Conference / The IEEE International Conference on Semantic Computing, 2008, pp. 58–65
Presenter / Meng-Lun, Wu
1
Outline
Introduction
Related Work
The Optical Context Layer
Photo Clustering
Photo Classification
Annotation in Digital Photos
Results
Conclusion
2
Introduction
Most research is concerned with extracting semantics using content information only.
All search engines rely on the text associated with images when searching for images.
The authors fuse the content of photos with two types of context using a probabilistic model.
3
Introduction (cont.)
4
Introduction (cont.)
This paper classifies photos into mutually exclusive classes and automatically tags new photos.
The authors collected the photo dataset from Flickr, which publishes popular tags.
5
Related Work
Most research uses content-based pixel features, either global or local.
Image search using an example input image, or querying with low-level features, can be difficult and unintuitive for most people.
Correlations among image features and human tags or labels have been studied.
The semantic gap in image retrieval cannot be overcome using pixel features alone.
6
Related Work (cont.)
Recent research has used the optical context layer to classify photos.
Boutell and Luo [3] use pixel values and optical metadata for classification.
The machine-translation model of [6] is extended by fusing an ontology.
[3] M. Boutell and J. Luo. Bayesian fusion of camera metadata cues in semantic scene classification. In Proc. IEEE CVPR, 2004.
[6] P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proc. ECCV, 2002.
7
The Optical Context Layer
The Exchangeable Image File Format (EXIF) standard specifies which camera parameters are recorded.
Fundamental parameters: Exposure Time, Focal Length, F-number, Flash, Metering Mode, and ISO.
8
Photo Clustering
The LogLight metric has a small value when the ambient light is high, and a large value when the ambient light is low.

LogLightMetric = K · lg(ET · AA · ISO / FL²)

where ET is the exposure time, AA the aperture area, ISO the film speed, FL the focal length, and K a constant.
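The reconstructed metric can be sketched in code. The choice of log base 2 for "lg" and the derivation of the aperture area from focal length and f-number are our assumptions, not stated on the slide:

```python
import math

def aperture_area(focal_length, f_number):
    """Aperture area from focal length and f-number (diameter = FL / N)."""
    diameter = focal_length / f_number
    return math.pi * (diameter / 2.0) ** 2

def log_light(exposure_time, iso, focal_length, f_number, K=1.0):
    """LogLight metric: large for dim scenes (long exposure, high ISO,
    wide aperture), small for bright scenes."""
    aa = aperture_area(focal_length, f_number)
    return K * math.log2(exposure_time * aa * iso / focal_length ** 2)
```

For example, a night shot (1/8 s, ISO 1600, f/2.8) yields a much larger value than a sunny outdoor shot (1/500 s, ISO 100, f/8) at the same focal length.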
9
Photo Clustering (cont.)
The Log-Light distribution of photos shot with and without flash is modeled as a mixture of Gaussians.
Bayesian model selection finds the optimal model, and the Expectation-Maximization (EM) algorithm fits the model parameters.
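A minimal 1-D EM fit for a two-component mixture (e.g. flash vs. no flash) can be sketched as follows. This is an illustrative implementation, not the authors' code, and it omits the Bayesian model-selection step that chooses the number of components:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture with EM.
    Returns (weights, means, variances)."""
    # Crude initialisation: split the sorted data in half.
    xs_sorted = sorted(xs)
    half = len(xs) // 2
    mu = [sum(xs_sorted[:half]) / half,
          sum(xs_sorted[half:]) / (len(xs) - half)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return w, mu, var
```

On well-separated data the fitted means land near the two cluster centres.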
10
Photo Clustering (cont.)
According to the above method, we generated 8 clusters.
We choose 3,500 tagged photos, find the probability of each photo under each cluster, assign the photo to the cluster having maximum probability, and assign all tags of the photo to that cluster.
11
Photo Clustering (cont.)
Cluster with High Exposure Time Shots
Cluster with No Flash
12
Photo Clustering (cont.)
Cluster with Indoor Shots
13
Photo Classification
The intent of the photographer is somehow hidden in the optical data.
These classes are outdoor day, outdoor night, and indoors.
The classes should represent different lighting conditions in the LogLight metric.
14
Photo Classification (cont.)
The classification problem is solved using optical context only, and also using optical context combined with thumbnail pixel features.
The classification algorithm is decision trees.
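As a toy stand-in for the learned decision tree, a fixed-threshold classifier over the LogLight metric illustrates the idea. The thresholds and the ordering of classes by darkness are assumptions for illustration only, not values from the paper:

```python
def classify_lighting(log_light_value, night_cut=3.0, indoor_cut=-2.0):
    """Toy three-way lighting classifier over the LogLight metric.
    Thresholds here are made-up illustrative values; the paper learns
    the splits with a decision tree instead.
    Larger LogLight means less ambient light."""
    if log_light_value >= night_cut:
        return "outdoor night"
    if log_light_value >= indoor_cut:
        return "indoor"
    return "outdoor day"
```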
15
Photo Classification (cont.)
16
Annotation in Digital Photos
The goal of automatic annotation is to predict words for tagging untagged photos.
The relevance-model approach has become quite popular for automatic annotation and retrieval of images.
Automatic annotation is modeled as a language translation problem.
The baseline is the continuous relevance model (CRM).
17
Annotation in Digital Photos (cont.)
We divide the whole image into rectangular blocks.
For each block, we compute color, texture and shape features.
Each feature vector has 42 dimensions.
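A sketch of the block partition follows. The real features are 42-dimensional colour/texture/shape vectors; as a simplification, this code computes only the mean colour per block:

```python
def block_features(pixels, rows=3, cols=3):
    """Split an image (2-D grid of (r, g, b) tuples) into rows x cols
    rectangular blocks and return the mean colour of each block.
    Mean colour stands in for the paper's 42-dimensional features."""
    h, w = len(pixels), len(pixels[0])
    feats = []
    for br in range(rows):
        for bc in range(cols):
            r0, r1 = br * h // rows, (br + 1) * h // rows
            c0, c1 = bc * w // cols, (bc + 1) * w // cols
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            n = len(block)
            feats.append(tuple(sum(px[ch] for px in block) / n for ch in range(3)))
    return feats
```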
18
Annotation in Digital Photos (cont.)
The goal is to predict the words W associated with an untagged image based on its blocks B.
B is the observed variable; we compute the conditional probability of a word given a set of blocks.
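In the standard continuous relevance model this conditional probability marginalizes over the training images J (the notation below is our gloss of CRM; the slide does not show the formula):

```latex
P(w \mid b_1, \dots, b_n) \;\propto\; \sum_{J \in \mathcal{T}} P(J)\, P(w \mid J) \prod_{i=1}^{n} P(b_i \mid J)
```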
19
Annotation in Digital Photos (cont.)
During the clustering process, we learn the optical clusters from untagged images.
Whenever a new image X comes, we assign it to the cluster Oj having the maximum value of P(X|Oj).
We then compute the probability of a word given the pixel-feature blocks and the optical context information.
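The maximum-likelihood cluster assignment can be sketched as follows, assuming 1-D Gaussian clusters over the LogLight value (a simplification of the paper's optical clusters):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def assign_cluster(x, clusters):
    """Assign value x to the cluster O_j with maximum P(x | O_j).
    `clusters` is a list of (mean, variance) pairs, one per cluster."""
    return max(range(len(clusters)),
               key=lambda j: normal_pdf(x, clusters[j][0], clusters[j][1]))
```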
20
Results
Experiment datasets (Flickr): train, evaluation, and test sets.
Performance evaluation: precision and recall.
Precision = the number of correctly predicted tags / the number of predicted tags.
Recall = the number of correctly predicted tags / the number of photos annotated with that tag in the real data.
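A minimal sketch of per-tag precision and recall from these counts (the data layout, mapping photo id to a tag set, is hypothetical):

```python
def tag_precision_recall(predicted, actual, tag):
    """Per-tag precision and recall over a photo collection.
    `predicted` and `actual` map photo id -> set of tags."""
    pred_with = {p for p, tags in predicted.items() if tag in tags}
    real_with = {p for p, tags in actual.items() if tag in tags}
    correct = len(pred_with & real_with)
    precision = correct / len(pred_with) if pred_with else 0.0
    recall = correct / len(real_with) if real_with else 0.0
    return precision, recall
```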
21
Results ( cont. )
Prediction tag – wildlife
Optical Context: 0.71
Image Features (CRM): 0.16
Thumbnail-Context: 0.44
22
Using Ontology to Improve Tagging
CIDE word similarity ontology.
Wu-Palmer distance between two tags:

Sim(x, y) = 2 · d(p) / (d(x) + d(y))

where p is the least common ancestor of tags x and y, and d(·) is depth in the ontology.
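A sketch of Wu-Palmer similarity on a toy child-to-parent taxonomy (the taxonomy used below is invented for illustration, not the CIDE ontology):

```python
def depth(node, parent):
    """Depth of a node: root has depth 1."""
    d = 1
    while node in parent:
        node = parent[node]
        d += 1
    return d

def ancestors(node, parent):
    """Node itself plus all its ancestors, nearest first."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def wu_palmer(x, y, parent):
    """Sim(x, y) = 2*d(p) / (d(x) + d(y)), p = least common ancestor."""
    anc_y = set(ancestors(y, parent))
    p = next(a for a in ancestors(x, parent) if a in anc_y)
    return 2 * depth(p, parent) / (depth(x, parent) + depth(y, parent))
```

For example, with "tiger" and "lion" both under "animal", the similarity is 2·2/(3+3) = 2/3, while "tiger" and "car" share only the root.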
23
Using Ontology to Improve Tagging (cont.)
Shrink this estimate using semantic similarity:

P(W|I) = λ · P_MLE(W|I) + (1 − λ) · Sim(W, ·)

where λ is a mixing weight and Sim is the ontology similarity above.
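A one-line sketch of the interpolation; the default mixing weight `lam` is an arbitrary illustrative value, not one from the paper:

```python
def shrink(p_mle, sim, lam=0.7):
    """Shrunk tag score: lam * P_MLE(W|I) + (1 - lam) * Sim(W, .).
    `lam` here is an assumed illustrative mixing weight."""
    return lam * p_mle + (1 - lam) * sim
```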
24
Results (cont.)
25
Conclusion
Optical context data is only a small fraction of a photo's data, yet it carries invaluable information about the photo-shooting environment.
Fusing ontological models of photo semantics also improves precision.
Future work: fuse other types of context with the content and optical-context features.
26