80 million tiny images: a large dataset for non-parametric object and scene recognition
CS 4763 Multimedia Systems
Spring 2008
Motivation
Billions of images are available online, forming a dense sampling of the visual world. Can we use them effectively?
Existing datasets have 10^2 to 10^4 images spread over a few different classes.
Questions to address
How large must the dataset be to perform recognition robustly?
What is the lowest image resolution that still gives reliable classification performance?
Low dimensional image representation
32 × 32 color images contain enough information for scene recognition, object detection and segmentation.
Low dimensional image representation (Cont.)
Scene recognition
Low dimensional image representation (Cont.)
Segmentation of 32 × 32 images
Low dimensional image representation (Cont.)
The objects below cannot be recognized without knowledge of their context.
Low dimensional image representation (Cont.)
Conclusion for low resolution representation:
A 32 × 32 color image contains enough information for scene recognition, object detection and segmentation.
Working with millions of low-resolution images is practical in terms of both storage capacity and image processing cost during retrieval.
Example:
256 × 256 × 3 bytes = 192 KB / image, so 1 million images take about 192 GB.
32 × 32 × 3 bytes = 3 KB / image, so 1 million images take about 3 GB.
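The storage arithmetic above can be checked with a short sketch. It assumes uncompressed images at 1 byte per channel per pixel, which the slide implies but does not state; the function name `raw_size_bytes` is made up for illustration.

```python
def raw_size_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed size of one image, assuming 1 byte per channel per pixel."""
    return width * height * channels


NUM_IMAGES = 1_000_000

for side in (256, 32):
    per_image = raw_size_bytes(side, side)
    total = per_image * NUM_IMAGES
    # KB here means 1024 bytes; GB means 10^9 bytes.
    print(f"{side} x {side}: {per_image / 1024:.0f} KB per image, "
          f"{total / 1e9:.1f} GB for {NUM_IMAGES:,} images")
```

With these unit conventions the totals come out near 196.6 GB and 3.1 GB. The slide's round figures of 192 GB and 3 GB follow from treating 1 KB × 1 million as 1 GB. Either way, the low-resolution dataset is 64 times smaller.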
A large dataset of 32 × 32 images
Collection procedure [Russell et al. 2008]
Where -- the internet, collecting images from 7 independent image search engines.
What -- the images returned by the search engines when queried with non-abstract nouns.
How --
A large dataset of 32 × 32 images (Cont.)
Statistics of the tiny images in the database
Statistics of very low resolution images (Cont.)
Impact on performance: logarithmic
Similarity metric: Dshift
Experiments – person detection
Person detection: does the image contain a person or not?
Existing detection approaches: face detection, head and shoulders, profile faces
Experiments (Cont.) – person detection
Experiments (Cont.) – Person localization
Similarity measure: Dshift
Number of nearest neighbors: 80
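The nearest-neighbor voting idea behind these experiments can be sketched roughly as follows. The slides name Dshift but do not define it, so plain sum of squared differences (SSD) stands in for it here. The names `ssd` and `knn_vote`, the toy data, and the label strings are all hypothetical.

```python
from collections import Counter


def ssd(a, b):
    """Sum of squared differences between two equal-length pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_vote(query, dataset, k=80):
    """Count the labels among the k dataset images closest to the query.

    `dataset` is a list of (pixel_vector, label) pairs, a stand-in for the
    tiny-image database. SSD replaces Dshift, which the slides do not define.
    """
    nearest = sorted(dataset, key=lambda item: ssd(query, item[0]))[:k]
    return Counter(label for _, label in nearest)


# Toy data: 4-value "images" instead of 32 x 32 x 3 = 3072 values each.
dataset = [
    ([10, 10, 10, 10], "person"),
    ([12, 9, 11, 10], "person"),
    ([11, 11, 9, 12], "person"),
    ([200, 210, 190, 205], "location"),
    ([198, 202, 207, 199], "location"),
]

votes = knn_vote([11, 10, 10, 11], dataset, k=3)
print(votes.most_common(1)[0][0])
```

With the 80 neighbors used on the slide, the vote counts would be taken over a much larger neighborhood. Sorting the whole dataset is only reasonable for a toy example; at tens of millions of images an approximate or indexed search would be needed.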
Experiments – Scene recognition
Scene recognition: retrieving the images whose semantic meaning is “location”
Experiments (Cont.) – Scene recognition
High voting for “location”
Low voting for “location”
Conclusion
The authors' experiments show that 32 × 32 is the minimum color image resolution for reliable object and scene recognition.
The dataset of 79 million images can provide a reasonable density over the manifold of natural images.
With the huge dataset and a semantic voting scheme, the approach performs well in person detection, person localization and scene recognition.
References
1. B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Intl. J. Computer Vision, 77(1-3):157-173, 2008.
2. C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.