Semantics in Digital Photos: A Contextual Analysis


Description

Interpreting the semantics of an image is a hard problem. However, for storing and indexing large multimedia collections, it is essential to build systems that can automatically extract semantics from images. In this research we show how we can fuse content and context to extract semantics from digital photographs. Our experiments show that if we properly model the context associated with media, we can interpret semantics using only a fraction of the high-dimensional content data.

Transcript

Semantics in Digital Photos: A Contextual Analysis
Author / Pinaki Sinha, Ramesh Jain
Conference / The IEEE International Conference on Semantic Computing, 2008, pp. 58-65
Presenter / Meng-Lun Wu


Outline

Introduction
Related Work
The Optical Context Layer
Photo Clustering
Photo Classification
Annotation in Digital Photos
Results
Conclusion


Introduction

Most research is concerned with extracting semantics using content information only.

Image search engines rely mainly on the text associated with images to search for them.

The authors fuse the content of photos with two types of context using a probabilistic model.


Introduction (cont.)


Introduction (cont.)

This paper classifies photos into mutually exclusive classes and automatically tags new photos.

The authors collected the photo dataset from Flickr, which publishes popular tags.


Related Work

Most research uses content-based pixel features, either global or local.

Image search using an example input image, or querying with low-level features, can be difficult and unintuitive for most people.

Correlations among image features and human tags or labels have been studied.

The semantic gap in image retrieval can’t be overcome using pixel features alone.


Related Work (cont.)

Recent research has used the Optical Context Layer to classify photos.

Boutell and Luo [3] use pixel values and optical metadata for classification.
[3] M. Boutell and J. Luo. Bayesian fusion of camera metadata cues in semantic scene classification. In Proc. IEEE CVPR, 2004.

The model of [6] has been extended by fusing an ontology.
[6] P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proc. ECCV, 2002.


The Optical Context Layer

The Exchangeable Image File Format (EXIF) standard specifies the camera parameters recorded with each photo.

Fundamental parameters: Exposure Time, Focal Length, F-number, Flash, Metering Mode, and ISO.
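As a rough illustration (not from the paper), the sketch below reads these fundamental parameters from a photo's EXIF data using Pillow; the file name, helper name, and the exact tag names handled are assumptions.

```python
# Sketch only: pull the optical-context parameters named above from a JPEG's
# EXIF data with Pillow. The merge of the Exif sub-IFD and the tag names are
# assumptions; real files may store these fields differently.
from PIL import Image, ExifTags

WANTED = {"ExposureTime", "FocalLength", "FNumber", "Flash",
          "MeteringMode", "ISOSpeedRatings", "PhotographicSensitivity"}

def optical_context(path):
    """Return a dict of the fundamental EXIF parameters found in the photo."""
    img = Image.open(path)
    exif = img.getexif()
    merged = dict(exif)
    try:
        # Camera parameters usually live in the Exif sub-IFD (Pillow >= 9.4).
        merged.update(exif.get_ifd(ExifTags.IFD.Exif))
    except (AttributeError, KeyError):
        pass  # older Pillow versions or missing sub-IFD
    named = {ExifTags.TAGS.get(k, k): v for k, v in merged.items()}
    return {k: v for k, v in named.items() if k in WANTED}

# Example (hypothetical file name):
# print(optical_context("photo.jpg"))
```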


Photo Clustering

The LogLight metric will have a small value when the ambient light is high.

Similarly, it will have a large value when the ambient light is low.

$$\mathrm{LogLightMetric} = K \cdot \lg\!\left(\frac{ET \times AA \times ISO}{FL^{2}}\right)$$
where ET is the exposure time, AA the aperture area, ISO the film speed, FL the focal length, and K a constant.
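A small computational sketch, assuming the reconstructed form of the metric above, with lg read as log base 2, K = 1, and the aperture area approximated from the focal length and F-number (none of these choices are confirmed by the slides):

```python
import math

def log_light_metric(exposure_time, f_number, focal_length, iso, k=1.0):
    """LogLight metric per the reconstruction above: K * lg(ET * AA * ISO / FL^2),
    with the aperture area AA approximated as pi * (FL / (2 * F-number))^2."""
    aperture_area = math.pi * (focal_length / (2.0 * f_number)) ** 2
    return k * math.log2(exposure_time * aperture_area * iso / focal_length ** 2)

# Illustrative values only: a bright outdoor shot scores lower than a dim indoor one.
# log_light_metric(1/500, 8.0, 35.0, 100)   -> smaller value
# log_light_metric(1/30, 2.8, 35.0, 800)    -> larger value
```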


Photo Clustering (cont.)

The Log-Light distribution of photos shot with flash and without flash is modeled as a mixture of Gaussians.

Bayesian model selection is used to find the optimal model, and the Expectation-Maximization (EM) algorithm fits the model parameters.
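A minimal sketch of this step with scikit-learn (BIC stands in for the paper's Bayesian model selection, and EM is what GaussianMixture.fit runs internally; the data layout is assumed):

```python
# Sketch only: pick the number of mixture components for the LogLight values
# by BIC, fitting each candidate model with EM.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_loglight_mixture(loglight_values, max_components=10):
    X = np.asarray(loglight_values, dtype=float).reshape(-1, 1)
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)  # EM fit
        bic = gmm.bic(X)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model
```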


(Figure: image content and its optical metadata)

Photo Clustering (cont.)

Using the above method, we generated 8 clusters.

We chose 3,500 tagged photos, computed the probability of each photo under each cluster, assigned each photo to the cluster with the maximum probability, and assigned all of the photo's tags to that cluster.
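Continuing the sketch above (assuming `mixture` is the Gaussian mixture fitted earlier and each photo is given as a (LogLight value, tag list) pair; these names are illustrative, not the authors'):

```python
# Sketch only: assign each tagged photo to its most probable optical cluster
# and collect all of its tags under that cluster.
from collections import defaultdict
import numpy as np

def assign_tags_to_clusters(mixture, photos):
    cluster_tags = defaultdict(list)
    for loglight, tags in photos:
        x = np.array([[loglight]])
        cluster = int(np.argmax(mixture.predict_proba(x)))  # max-probability cluster
        cluster_tags[cluster].extend(tags)
    return cluster_tags
```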


Photo Clustering (cont.)

Cluster with High Exposure Time Shots

Cluster with No Flash


Photo Clustering (cont.)

Cluster with Indoor Shots


Photo Classification

The intent of the photographer is somehow hidden in the optical data.

These classes are outdoor day, outdoor night, and indoor.

The classes should correspond to different lighting conditions, and hence to different ranges of the LogLight metric.


Photo Classification (cont.)

The classification problem is studied using optical context only, and also using optical context together with thumbnail pixel features.

The classification algorithm is a decision tree.
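A sketch of such a decision-tree classifier with scikit-learn (the feature matrix, label vector, and train/test split are assumptions, not the paper's exact setup):

```python
# Sketch only: train a decision tree to predict {outdoor day, outdoor night,
# indoor} from optical-context features (e.g., LogLight, flash, exposure time).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def train_optical_classifier(features, labels):
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=0)
    clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf
```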


Photo Classification (cont.)


Annotation in Digital Photos

The goal for automatic annotation is to predict words for tagging untagged photos.

The relevance-model approach has become quite popular for automatic annotation and retrieval of images.

Automatic annotation is modeled as a language translation problem.

The baseline is the Continuous Relevance Model (CRM).


Annotation in Digital Photos (cont.)

We divide the whole image into rectangular blocks.

For each block, we compute color, texture, and shape features.

Each feature vector has 42 dimensions.
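A sketch of the block decomposition (the grid size and the simple per-block color statistics are placeholders; the paper's actual 42-dimensional color, texture, and shape features are not reproduced here):

```python
# Sketch only: split an image into a grid of rectangular blocks and compute a
# simple per-block color feature (mean and standard deviation per RGB channel).
import numpy as np
from PIL import Image

def block_features(path, grid=(4, 6)):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=float)
    rows, cols = grid
    h, w = img.shape[0] // rows, img.shape[1] // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            block = img[r * h:(r + 1) * h, c * w:(c + 1) * w]
            feats.append(np.concatenate([block.mean(axis=(0, 1)),
                                         block.std(axis=(0, 1))]))
    return np.array(feats)  # one feature vector per block
```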


Annotation in Digital Photos (cont.)

The goal is to predict the word set W associated with an untagged image based on its block features B.

B is the observed variable; we estimate the conditional probability of a word given a set of blocks.
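For reference, the conditional probability in the CRM takes the following form (from the original CRM work, quoted here as background rather than from these slides):

$$P(w \mid b_1, \dots, b_n) \propto \sum_{J \in \mathcal{T}} P(J)\, P(w \mid J) \prod_{i=1}^{n} P(b_i \mid J)$$

where $\mathcal{T}$ is the training set, $J$ ranges over training images, and $b_1, \dots, b_n$ are the block feature vectors of the untagged image.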


Annotation in Digital Photos (cont.)

During the clustering process, we learn the optical clusters from untagged images.

Whenever a new image X arrives, we assign it to the cluster Oj with the maximum value of P(X|Oj).

We then estimate the probability of a word given the pixel-feature blocks and the optical-context information.
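In symbols, the cluster assignment stated above is $j^{*} = \arg\max_{j} P(X \mid O_j)$; within that cluster, Bayes' rule gives the fused quantity (a sketch of the idea, not necessarily the authors' exact estimator):

$$P(w \mid B, O_{j^{*}}) \propto P(w \mid O_{j^{*}})\, P(B \mid w, O_{j^{*}})$$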


Results

Experimental dataset – Flickr, split into train, evaluation, and test sets.

Performance evaluation – precision and recall per tag, based on:
the number of correctly predicted tags,
the number of photos annotated with that tag in the real data, and
the number of predicted tags.
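Written out per tag, consistent with the counts listed above:

$$\mathrm{precision}(w) = \frac{\#\,\text{correctly predicted with } w}{\#\,\text{predicted with } w}, \qquad \mathrm{recall}(w) = \frac{\#\,\text{correctly predicted with } w}{\#\,\text{annotated with } w \text{ in the real data}}$$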


Results (cont.)

Predicted tag – wildlife:
Optical Context: 0.71
Image Features (CRM): 0.16
Thumbnail-Context: 0.44


Using Ontology to Improve Tagging

CIDE word-similarity ontology; the Wu-Palmer distance between two tags:

$$\mathrm{Sim}(x, y) = \frac{2 \times d(p)}{d(x) + d(y)}$$
where p is the deepest common ancestor of tags x and y in the ontology and d(·) denotes depth.
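The slides cite a CIDE ontology; purely as a stand-in illustration, the same Wu-Palmer measure can be computed over WordNet with NLTK:

```python
# Illustration only: Wu-Palmer similarity over WordNet (not the CIDE ontology
# used in the slides). Requires nltk and the 'wordnet' corpus to be downloaded.
from nltk.corpus import wordnet as wn

def wu_palmer(tag_a, tag_b):
    """Maximum Wu-Palmer similarity over the noun senses of two tags."""
    synsets_a = wn.synsets(tag_a, pos=wn.NOUN)
    synsets_b = wn.synsets(tag_b, pos=wn.NOUN)
    scores = [a.wup_similarity(b) for a in synsets_a for b in synsets_b]
    return max((s for s in scores if s is not None), default=0.0)

# e.g. wu_palmer("tiger", "wildlife") should exceed wu_palmer("tiger", "building")
```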


Using Ontology to Improve Tagging (cont.)

Shrink this estimate using semantic similarity:

$$P(W \mid I) = \lambda \, P_{\mathrm{MLE}}(W \mid I) + (1 - \lambda)\, \mathrm{Sim}(W, \cdot)$$
where P_MLE is the maximum-likelihood estimate, λ is a mixing weight, and Sim(W, ·) is the semantic similarity of W to the other tags.


Results (cont.)


Conclusion

Optical context data is only a small fraction of the photo data, yet it carries invaluable information about the shooting environment.

Fusing ontological models of photo semantics also improves precision.

Future work: fuse other types of context with the content and the optical-context features.
