
Post on 13-Jan-2016


80 million tiny images: a large dataset for non-parametric object and scene recognition

CS 4763 Multimedia Systems

Spring 2008

Motivation

There are billions of images available online, which together form a dense sampling of the visual world. Can we use them effectively?

Existing datasets contain 10^2 to 10^4 images spread over a few different classes.

Questions to address

How large must a dataset be to perform recognition robustly?

What is the smallest image resolution that still gives reliable classification performance?

Low dimensional image representation

32 × 32 color images contain enough information for scene recognition, object detection and segmentation.
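The slides do not say how the 32 × 32 images are produced. A minimal sketch, assuming simple block averaging of an image whose sides are multiples of 32 (real images would first need cropping or resizing); `to_tiny` is a hypothetical name:

```python
import numpy as np

def to_tiny(img: np.ndarray, size: int = 32) -> np.ndarray:
    """Downsample an H x W x 3 uint8 image to size x size x 3 by block averaging.

    Assumption: H and W are multiples of `size` (e.g. 256 -> 32).
    """
    h, w, c = img.shape
    if h % size or w % size:
        raise ValueError("image dimensions must be multiples of size")
    fh, fw = h // size, w // size
    # Split rows and columns into size blocks of fh x fw pixels each,
    # then average every block separately for each color channel.
    blocks = img.reshape(size, fh, size, fw, c).astype(np.float64)
    tiny = blocks.mean(axis=(1, 3))
    return np.round(tiny).astype(np.uint8)

# Usage: a 256 x 256 image becomes a 32 x 32 x 3 array of 3072 bytes.
demo = np.zeros((256, 256, 3), dtype=np.uint8)
print(to_tiny(demo).shape)  # (32, 32, 3)
```

Averaging each block (rather than picking one pixel per block) keeps some information from every original pixel and reduces aliasing.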

Low dimensional image representation (Cont.)

Scene recognition

Low dimensional image representation (Cont.)

Segmentation of 32 × 32 images

Low dimensional image representation (Cont.)

We cannot recognize the objects shown here without knowledge of their context.

Low dimensional image representation (Cont.)

Conclusion for low resolution representation:

32 × 32 color image contains enough information for scene recognition, object detection and segmentation.

Low dimensional image representation (Cont.)

Conclusion for low resolution representation:

Working at low resolution makes it practical to handle millions of images, both in storage capacity and in processing cost during retrieval.

Example: 256 × 256 × 3 bytes = 192 KB per image

It takes about 192 GB for 1 million images.

32 × 32 × 3 bytes = 3 KB per image

It takes about 3 GB for 1 million images.
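The storage arithmetic above can be checked with a few lines, assuming uncompressed 8-bit RGB (1 byte per channel); `storage_bytes` is a hypothetical helper:

```python
def storage_bytes(side: int, n_images: int, channels: int = 3) -> int:
    """Raw storage for n_images square images, assuming 1 byte per channel."""
    return side * side * channels * n_images

KIB = 1024  # the slide's "KB" matches 1024-byte kilobytes

for side in (256, 32):
    per_image = storage_bytes(side, 1)
    total = storage_bytes(side, 1_000_000)
    print(f"{side}x{side}: {per_image / KIB:.0f} KB/image, "
          f"{total / 1e9:.1f} GB per million images")
# 256x256: 192 KB/image, 196.6 GB per million images
# 32x32: 3 KB/image, 3.1 GB per million images
```

The slide's 192 GB and 3 GB multiply the per-image size in 1024-byte KB by one million, so they are rounded figures; in decimal gigabytes the totals are about 196.6 GB and 3.1 GB. Either way, the 64-fold reduction (8 × 8 fewer pixels) is what makes millions of images manageable.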

A large dataset of 32 × 32 images (Cont.)

Collection procedure [Russell et al. 2008]:

Where -- the Internet; images were collected from 7 independent image search engines.

What -- the result images returned by the search engines when queried with non-abstract nouns.

How --
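The Where/What steps above can be sketched as a loop over query nouns and engines. This is an illustration under stated assumptions, not the authors' pipeline: each search engine is modeled as a hypothetical function from a noun to a list of image URLs (no real search API is called), each image is assumed to be labeled with the noun that retrieved it, and a URL returned by several engines for the same noun is kept once. Downloading and resizing are left out.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical model of a search engine: query noun -> ranked image URLs.
SearchEngine = Callable[[str], List[str]]

def collect(nouns: Iterable[str],
            engines: Dict[str, SearchEngine]) -> Dict[str, List[str]]:
    """Query every engine with every noun; keep unique URLs, labeled by noun."""
    dataset: Dict[str, List[str]] = {}
    for noun in nouns:
        seen = set()
        urls: List[str] = []
        for search in engines.values():
            for url in search(noun):
                if url not in seen:  # drop duplicates across engines
                    seen.add(url)
                    urls.append(url)
        dataset[noun] = urls
    return dataset

# Usage with two fake engines standing in for the real ones.
fake = {
    "engine_a": lambda q: [f"a/{q}/1", "shared"],
    "engine_b": lambda q: ["shared", f"b/{q}/1"],
}
print(collect(["dog"], fake))  # {'dog': ['a/dog/1', 'shared', 'b/dog/1']}
```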

A large dataset of 32 × 32 images (Cont.)

Statistics of the tiny images in the database

Statistics of very low resolution images (Cont.)

Statistics of very low resolution images (Cont.)

Impact on performance: grows logarithmically with the size of the dataset

Similarity metric: D_shift
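D_shift is named but not defined on these slides. The sketch below assumes a shift-tolerant form of the sum of squared differences (SSD): the SSD over all pixels and color channels, minimized over small integer translations of the second image with |dx|, |dy| ≤ w. The window w = 1 and the edge replication at the image border are assumptions, any additional alignment step the original metric may include is omitted, and the function names are hypothetical.

```python
import numpy as np

def d_ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences over all pixels and color channels."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def shifted(img: np.ndarray, dx: int, dy: int, w: int) -> np.ndarray:
    """Return img sampled at (y + dy, x + dx), replicating edge pixels (assumption)."""
    padded = np.pad(img, ((w, w), (w, w), (0, 0)), mode="edge")
    h, wd = img.shape[:2]
    return padded[w + dy : w + dy + h, w + dx : w + dx + wd]

def d_shift(a: np.ndarray, b: np.ndarray, w: int = 1) -> float:
    """Minimum SSD between a and b over integer shifts with |dx|, |dy| <= w."""
    return min(
        d_ssd(a, shifted(b, dx, dy, w))
        for dx in range(-w, w + 1)
        for dy in range(-w, w + 1)
    )
```

Allowing small shifts makes the metric tolerant to the slight misalignments that are common between 32 × 32 images of the same kind of scene; plain SSD would penalize a one-pixel offset as heavily as a real difference in content.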

Experiments – person detection

Person detection: deciding whether an image contains a person or not

Existing detection methods: face detection, head-and-shoulders detection, and profile-face detection

Experiments (Cont.) – person detection

Experiments (Cont.) -- Person localization

Similarity measure: D_shift

Number of nearest neighbors: 80
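The localization setup above (D_shift as the similarity measure, 80 nearest neighbors) amounts to a k-nearest-neighbor search. A brute-force sketch, assuming the database fits in memory as a list of 32 × 32 × 3 arrays; the distance is passed in as a function, so a D_shift implementation can be plugged in where this example uses plain SSD. `nearest_neighbors` and `ssd` are hypothetical names.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

import numpy as np

Distance = Callable[[np.ndarray, np.ndarray], float]

def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences; a stand-in for D_shift in this sketch."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def nearest_neighbors(query: np.ndarray, database: Sequence[np.ndarray],
                      distance: Distance, k: int = 80) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs for the k database images closest to query."""
    scored = ((distance(query, img), i) for i, img in enumerate(database))
    # nsmallest keeps only k candidates; ties are broken by the lower index.
    return heapq.nsmallest(k, scored)

# Usage: image i is filled with the constant value i.
db = [np.full((32, 32, 3), i, dtype=np.uint8) for i in range(100)]
query = np.full((32, 32, 3), 10, dtype=np.uint8)
print([i for _, i in nearest_neighbors(query, db, ssd, k=3)])  # [10, 9, 11]
```

A linear scan like this is only practical for small collections; searching tens of millions of images would need a faster index or cheaper precomputed features.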

Experiments – Scene recognition

Scene recognition: retrieving the images whose semantic meaning is a “location”

Experiments (Cont.) – Scene recognition

High votes for “location”

Low votes for “location”
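The slides do not spell out the voting scheme. One reading, assuming each retrieved neighbor carries the noun it was collected under and that nouns sit in a hierarchy like WordNet's hypernym tree, is that every neighbor votes for its own label and for all of that label's ancestors; the vote count at a node such as “location” (or “person”, for person detection) then scores the query. The toy hierarchy and the function names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, Optional

# Hypothetical toy hierarchy (child -> parent) standing in for a real
# lexical hierarchy such as WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "location": "entity",
    "person": "entity",
    "city": "location",
    "beach": "location",
    "politician": "person",
    "skier": "person",
    "cup": "entity",
}

def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Iterator[str]:
    """Yield the label itself and every node above it in the hierarchy."""
    node: Optional[str] = label
    while node is not None:
        yield node
        node = parent[node]

def semantic_votes(neighbor_labels, parent=PARENT) -> Counter:
    """Each neighbor casts one vote for its label and for all of its ancestors."""
    votes: Counter = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label, parent))
    return votes

# Usage: labels of the retrieved neighbors of one query image.
votes = semantic_votes(["city", "beach", "skier", "cup", "city"])
print(votes["location"], votes["person"])  # 3 1
```

Voting up the hierarchy lets neighbors with different but related labels (“city”, “beach”) reinforce the same higher-level concept, which a flat count of exact labels would miss.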

Conclusion

The authors' experiments show that 32 × 32 is the minimum color image resolution for reliable object and scene recognition.

The dataset of 79 million images can provide a reasonably dense sampling of the manifold of natural images.

Combined with the semantic voting scheme, the large dataset performs well in person detection, person localization, and scene recognition.

References

1. B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Intl. J. Computer Vision, 77(1-3):157-173, 2008.

2. C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.