A SYNTAX FOR IMAGE UNDERSTANDING

Narendra Ahuja

University of Illinois at Urbana-Champaign

May 21, 2009

Work Done with

Sinisa Todorovic, Mark Tabb, Himanshu Arora, Varsha Hedau, Bernard Ghanem, Tim Cheng

The Question

What is a good low-level image representation

to enable

Object Recognition,

Reasoning,

Synthesis, ... ?

What is an Object?

Object = Layout of parts with some Intrinsic Properties

e.g., Wall = Layout of Doors, Windows …

Each Part is itself a (simpler) Object, so Object = Hierarchy

e.g., Building → Wall → Windows ...

Object Complexity = Complexity of parts/hierarchy and layout

e.g., Building comprised of bricks

What is Not an Object

A crowded city street

A serene landscape

Allowed but not for today, for focus on more immediate issues

The Scene Object

Scene = Layout of Objects

= Hierarchical Layout

From Scene to Image

Imaging Preserves Localization

Image = Hierarchical Layout of Regions

Image vs. Objects

Subimages of Parts

Smallest Subimages

(= Smallest parts)

Simpler Objects

Primitive Objects

Recognition and Segmentation

Do not have access to windows with only the object of interest

For model acquisition as well as subsequent recognition

Need to consider Simultaneous Segmentation and Modeling/Recognition

Combinatorial Problem

Where is Which Object?

Too many possible subimages

To be matched with object models

Circular problem

Reduce combinatorial complexity by reducing object/image size
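
A rough sense of "too many possible subimages" can be had by counting candidates. This is only an illustrative sketch: the function names are hypothetical, and the talk does not say which family of subimages it has in mind. Two families are counted here, axis-aligned rectangular windows and arbitrary pixel subsets.

```python
def count_rectangular_subimages(width, height):
    """Number of axis-aligned rectangular windows in a width x height image."""
    # A window is fixed by choosing 2 of the (width + 1) vertical grid lines
    # and 2 of the (height + 1) horizontal grid lines.
    return (width * (width + 1) // 2) * (height * (height + 1) // 2)


def count_arbitrary_subimages(width, height):
    """Number of nonempty pixel subsets, if a subimage may have any shape."""
    return 2 ** (width * height) - 1


if __name__ == "__main__":
    for w, h in [(2, 2), (8, 8), (640, 480)]:
        print(w, h, count_rectangular_subimages(w, h))
```

Even the rectangular count grows roughly as the square of the number of pixels, and each candidate would still have to be compared against every object model.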

Parts are Simpler to Represent/Model

Smaller images/objects are likely to be easier to handle

Number of matching Object Models is likely to be Smaller

Primitive Objects have the Smallest Number of Candidate Models

Object Representation is Recursive

Object

Arrangement of Parts

Characterized by three types of Properties

Photometric Geometric Topological

Each Part is sufficiently simple, or is itself an Object

Breaking the Loop

Identify Candidate Subimages

A Hierarchical Partitioning of an Image

A Multiscale, Low-Level Image Segmentation

Segments = Objects of different complexity

Why Segments as Candidate Objects

Photometric Segments are useful estimates of objects

Because

Object Boundary

Almost Always = Photometric Boundary

Although Photometric Boundary

May or May not = Object Boundary

Because

Independent Objects

Independent shape, orientation, reflectance

Segment/Object Contour

The Argument of Dimensionality

Segment dimensionality = 2D

= Our object dimensionality

Segment information capacity matched with object

Lower dimensional representations

Point features

Edge fragments

Although 3D still missing

Extensibility

Due to more complete correspondence to parts

Segments

• Simplify analysis/reduce dependence on tools

• Offer greater promise for moving beyond

the basic tasks of today

e.g., to more complex objects,

more abstract objects,

context sensitivity...

Representation Issues vs. Analysis Details

Will focus more on the representation issues

and skip

Detailed tools to carry out the various tasks

e.g. tools for: Probabilistic analysis

Structural analysis

Image Representation

Image → Multiscale Segmentation → Homogeneous regions at ALL contrasts and sizes

Extract Hierarchical Layout of Regions

Region = Largest Homogeneous Set of Contiguous Pixels

Ahuja PAMI96, Tabb&Ahuja TIP97, Arora&Ahuja ICPR 06
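
The region definition above can be illustrated with a toy sketch. This is not the algorithm of the cited papers. It assumes, for illustration only, that "homogeneous at contrast c" means a 4-connected set in which adjacent pixels differ in intensity by at most c. The name `regions_at_contrast` and the toy image are hypothetical.

```python
from collections import deque


def regions_at_contrast(image, c):
    """Return 4-connected regions whose adjacent pixels differ by at most c.

    Each region is a frozenset of (row, col) coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for q in range(cols):
            if seen[r][q]:
                continue
            # Breadth-first flood fill from an unvisited seed pixel
            seen[r][q] = True
            queue, region = deque([(r, q)]), set()
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and abs(image[ny][nx] - image[y][x]) <= c):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(frozenset(region))
    return regions


# Toy image: dark background (0), a gray block (50) containing a brighter pixel (60)
toy = [
    [0, 0, 0, 0],
    [0, 50, 60, 0],
    [0, 50, 50, 0],
    [0, 0, 0, 0],
]
# One segmentation per contrast level; coarser levels merge finer regions.
multiscale = {c: regions_at_contrast(toy, c) for c in (5, 20, 100)}
```

Under this assumption the segmentations are nested: every region found at a low contrast threshold lies inside one region found at any higher threshold.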

Example Segmentations for Several Contrasts

in Photometric Hierarchy

Image Representation = Segmentation Tree

Multiscale Segmentation → Segmentation Tree (of embedded regions)
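
A minimal sketch of how nested segmentations could be turned into a tree of embedded regions. It assumes the input levels are ordered coarse to fine and are properly nested. The function name and the toy pixel ids are hypothetical, not taken from the talk.

```python
def build_segmentation_tree(levels):
    """Map each region to its parent region (None for a root).

    levels: list of segmentations ordered coarse to fine; each segmentation
    is a list of regions, and each region is a frozenset of pixel ids.
    """
    parent = {}
    nodes = []  # regions already placed in the tree, coarse before fine
    for level in levels:
        for region in level:
            if region in parent:
                continue  # region unchanged across levels: keep a single node
            # Parent = smallest already-placed region that strictly contains it
            containers = [n for n in nodes if region < n]
            parent[region] = min(containers, key=len) if containers else None
            nodes.append(region)
    return parent


# Toy example with six pixels labeled 0..5
whole = frozenset(range(6))
left, right = frozenset({0, 1, 2}), frozenset({3, 4, 5})
levels = [
    [whole],
    [left, right],
    [frozenset({0, 1}), frozenset({2}), right],
]
tree = build_segmentation_tree(levels)
```

Here `tree[frozenset({0, 1})]` is `left`, and `right` appears once even though it survives unchanged at the finest level.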

Image Objects and Image Segmentation Tree

• Images ∋ Multiple Independent Objects

• Image Tree ∋ Multiple Independent Subtrees

• Each Object = One or More Subtrees

• Object Modeling = Capturing Object Subtrees

• Photometric: Intensity contrast and variance

• Geometric:

• Area, variance of children areas

• 1st central moment, eccentricity

• Squared perimeter over area

• Topological:

• Angle between child and parent’s principal axes

• Displacement of child centroids

• Context vector: spatial distribution of sibling regions

• Todorovic&Ahuja PAMI07, IJCV07
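
Some of the geometric properties listed above can be computed directly from a region's pixels. The sketch below is an assumption-laden illustration, not the feature code of the cited papers. Eccentricity is taken from the eigenvalues of the second central moments, and perimeter is counted as the number of exposed pixel sides. The function name and the dictionary keys are hypothetical.

```python
import math


def region_properties(pixels):
    """Simple geometric properties of a region given as (row, col) pixels."""
    pixels = set(pixels)
    area = len(pixels)
    cy = sum(y for y, _ in pixels) / area
    cx = sum(x for _, x in pixels) / area
    # Second central moments of the pixel coordinates
    myy = sum((y - cy) ** 2 for y, _ in pixels) / area
    mxx = sum((x - cx) ** 2 for _, x in pixels) / area
    mxy = sum((y - cy) * (x - cx) for y, x in pixels) / area
    # Eigenvalues of the 2x2 moment matrix give the spread along the principal axes
    half_trace = (mxx + myy) / 2
    spread = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam_max, lam_min = half_trace + spread, half_trace - spread
    eccentricity = math.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0
    # Principal-axis angle, measured from the column (x) axis
    angle = 0.5 * math.atan2(2 * mxy, mxx - myy)
    # Perimeter = number of pixel sides not shared with another region pixel
    perimeter = sum(
        (ny, nx) not in pixels
        for y, x in pixels
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
    )
    return {
        "area": area,
        "centroid": (cy, cx),
        "eccentricity": eccentricity,
        "principal_angle": angle,
        "squared_perimeter_over_area": perimeter ** 2 / area,
    }


square = region_properties({(0, 0), (0, 1), (1, 0), (1, 1)})
bar = region_properties({(0, 0), (0, 1), (0, 2), (0, 3)})
```

The relational properties on the slide (angle between child and parent principal axes, displacement of child centroids) would then follow as differences between the `principal_angle` and `centroid` values of a child and its parent.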

Examples of Properties

Modeling and Recognition = Subtree Matching

Discovery = Matching across image sets (frequency)

Modeling = Finding canonical tree of an object
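
To make "subtree matching" concrete, here is a deliberately simple sketch. It is a stand-in and not the matching algorithm of the cited work. It scores two trees by node similarity plus a greedy one-to-one pairing of their children. The `Node` class, the similarity function, and the property vectors are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    props: tuple                      # hypothetical region property vector
    children: list = field(default_factory=list)


def node_similarity(a, b):
    """Similarity in (0, 1], decreasing with distance between property vectors."""
    return math.exp(-math.dist(a.props, b.props))


def match_score(a, b):
    """Node similarity plus a greedy one-to-one pairing of the children."""
    score = node_similarity(a, b)
    # Score every child pair recursively, then take the best pairs first
    pairs = sorted(
        ((match_score(ca, cb), i, j)
         for i, ca in enumerate(a.children)
         for j, cb in enumerate(b.children)),
        reverse=True,
    )
    used_a, used_b = set(), set()
    for s, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += s
    return score


t1 = Node((1.0,), [Node((0.2,)), Node((0.5,))])
t2 = Node((1.0,), [Node((0.5,)), Node((0.2,))])   # same parts, children reordered
t3 = Node((1.0,), [Node((0.2,))])                 # one part missing
```

Greedy pairing is not guaranteed to be optimal; it is used here only to keep the example short. Unmatched children simply add nothing to the score, which gives a crude form of partial matching.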

Category

Occurrences

Sets of Matching

Object Category Model = Stochastic Tree Structure

region properties

number of children

Object part (hidden)

Exponential Gaussian

Markovian chain structure + parameters

Each Node and Branch Probabilistically Determined

Model = Grammar

Object Subtree Model

Tree of Probability Density Functions

Stochastic Grammar
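
A "tree of probability density functions" can be sketched as follows. The distribution choices are assumptions made for illustration: a Gaussian over one scalar region property per node, and a geometric distribution (a discrete analogue of the exponential) over the number of children. The talk names "Exponential" and "Gaussian" but does not spell out where each applies. All names and the dictionary layout are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def geometric_logpmf(k, p):
    """log P(K = k) for k = 0, 1, 2, ... with P(K = k) = (1 - p)**k * p, 0 < p < 1."""
    return k * math.log(1 - p) + math.log(p)


def tree_loglik(obs, model):
    """Log-likelihood of an observed tree under a tree of densities.

    obs:   {"prop": float, "children": [obs, ...]}
    model: {"mean": float, "std": float, "p_stop": float, "children": [model, ...]}
    Assumption: observed children are already aligned with model children by
    position; observed children without a model counterpart are ignored.
    """
    ll = gaussian_logpdf(obs["prop"], model["mean"], model["std"])
    ll += geometric_logpmf(len(obs["children"]), model["p_stop"])
    for o, m in zip(obs["children"], model["children"]):
        ll += tree_loglik(o, m)
    return ll


leaf_model = {"mean": 0.0, "std": 1.0, "p_stop": 0.5, "children": []}
typical = {"prop": 0.0, "children": []}
atypical = {"prop": 3.0, "children": []}
```

Each node and branch contributes its own term, so a tree that deviates from the model in region properties or in branching is penalized where the deviation occurs.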

From Model to Simultaneous Recognition and Segmentation

Inference = Matching image tree against the learned tree model

Results: Weizmann Horses

training images

category model

Results: Weizmann Horses

• Object segmentation is good on contours that are:

• Jagged

• Blurred

• Form complex patterns

• Low-contrast regions merge with background

Recall and Precision
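
For reference, the standard definitions of the two measures can be sketched as below. The talk does not say whether they were computed per pixel or per object; this sketch assumes per-pixel evaluation of a segmentation mask, and the function name is hypothetical.

```python
def precision_recall(predicted, truth):
    """Pixel-level precision and recall for two sets of (row, col) object pixels."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted object pixels that are correct
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true object pixels that were found
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


predicted = {(0, 0), (0, 1), (1, 1)}
truth = {(0, 1), (1, 1), (1, 0), (2, 2)}
precision, recall = precision_recall(predicted, truth)
```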

Real World

> 30,000 categories

Too Many Categories

• 30000 independent models is not a good idea

Because world is not full of unrelated things

a. Parts are shared among objects

b. In different configurations in different objects

c. So category representations interrelated

d. This is directly reflected in apparent organization of Human

Knowledge/Semantics

Any similar 2D objects?

Arbitrary Images

Category = Set of Similar 2D Objects

Categories Found

Scaling Up Category Representation

• Categories = Configurations of Shared Subcategories

• Subcategories are simpler and smaller

• Robust detection

• Sharing = Sublinear complexity → Minimal computation

unshared object parts

Multi-Category Representation = Taxonomy

• Interleaved Trees of

• Probability Density Functions of

• Tree Structures, and Tree Node Properties

UIUC Hoofed Animals Dataset: Contains Six Animals

Simultaneous Recognition and Segmentation

Results: Animals

Simultaneous Detection, Recognition, Segmentation

Simultaneous Recognition and Segmentation

Taxonomy Structure

Observed Category Statistics

Not All Subcategories are Equally Informative

• So far

• P (Detection) = P (Match Quality)

• = Decision Making Based on Likelihood

• Uniform Priors on

• P (Subcat)

• P (Cat| Subcat)

But Discovered Unshared Subcategories Provide More Evidence

If legs, then many possibilities

If antlers, then very likely deer

If lake, then very unlikely desert

Unshared Categories → Uniqueness

Need Bayesian Detection

• During Training on Representative Datasets

• Estimate P(Cat)

• Estimate P(Cat| Subcat)

• Todorovic&Ahuja CVPR08
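
The training-time estimates named above can be sketched by simple counting. This is an illustration under stated assumptions, not the estimator of the cited paper. The training set is a hypothetical toy list, and P(Cat | Subcat) is taken as the fraction of training images containing the subcategory that belong to the category.

```python
from collections import Counter


def estimate_posteriors(training):
    """training: list of (category, set_of_subcategories) pairs.

    Returns two dicts: P(cat), and P(cat | subcat) keyed by (cat, subcat).
    """
    cat_counts = Counter(cat for cat, _ in training)
    sub_counts = Counter(sub for _, subs in training for sub in subs)
    pair_counts = Counter((cat, sub) for cat, subs in training for sub in subs)
    p_cat = {cat: n / len(training) for cat, n in cat_counts.items()}
    p_cat_given_sub = {
        (cat, sub): n / sub_counts[sub] for (cat, sub), n in pair_counts.items()
    }
    return p_cat, p_cat_given_sub


# Hypothetical toy training set: legs are shared, antlers are not
training = [
    ("deer", {"legs", "antlers"}),
    ("deer", {"legs", "antlers"}),
    ("horse", {"legs"}),
    ("cow", {"legs"}),
]
p_cat, p_cat_given_sub = estimate_posteriors(training)
```

In this toy set, observing legs leaves several categories possible, while observing antlers points to deer alone, which is the contrast the slides draw between shared and unshared subcategories.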

Results: Caltech-101 and Caltech-256

Caltech-101

Caltech-256

Bringing In Layout

So Far Pure Hierarchy

Image = Segmentation TREE of Regions

Object = Subtree (Actually {Subtrees})

= Recursive Embedding of Regions

Taxonomy = Interleaved STs

All Characterized by Probabilities

Problems with Pure Hierarchy

No Explicit Layout Information

Object Model = No Neighbor Relationships Among Parts

Undesirable Consequence:

Recognition Insensitive to Spatial Scrambling of Parts !

Solution = Connected Segmentation Tree (CST)

Add Links between Neighbor Nodes

Implementation = Links between Siblings

Result: Connected Segmentation Tree (CST)

= Hierarchy of Neighbor Graphs

Ahuja&Todorovic, CVPR’08
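
A minimal sketch of sibling links, assuming that two sibling regions count as neighbors when they touch under 4-connectivity and that links are binary. The cited paper may define neighbor relations differently. The part names, pixel layouts, and function names are hypothetical.

```python
from itertools import combinations


def are_adjacent(region_a, region_b):
    """True if some pixel of region_a is a 4-neighbor of a pixel of region_b."""
    return any(
        (y + dy, x + dx) in region_b
        for y, x in region_a
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
    )


def sibling_links(children):
    """children: dict name -> set of (row, col) pixels, all under one parent.

    Returns the neighbor links as a set of frozensets of two names.
    """
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(children), 2)
        if are_adjacent(children[a], children[b])
    }


# Hypothetical parts in one row: head | body | tail
original = {"head": {(0, 0)}, "body": {(0, 1)}, "tail": {(0, 2)}}
# Same parts under the same parent, spatially scrambled: body | head | tail
scrambled = {"body": {(0, 0)}, "head": {(0, 1)}, "tail": {(0, 2)}}

links_original = sibling_links(original)
links_scrambled = sibling_links(scrambled)
```

Both layouts have the same parent and the same set of children, so a pure containment tree cannot tell them apart. The sibling links differ (head-body and body-tail versus body-head and head-tail), which is the information the added links are meant to capture.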

CST Based Taxonomy

Each Category = CST Subtree

(Actually {SubCSTs})

Taxonomy = Interleaved CSTs

= Interleaved Hierarchies of Neighbor Graphs

Training Images → Discovered CST Category Model

Results: Weizmann Horses

ST vs. CST

Degree of occlusion artificially made in the image

Binary strength of neighbor relationships

Real-valued strength of neighbor relationships

ST vs. CST

Input Images | Segmentation Tree | CST

UIUC Hoofed Animals

LabelMe

CSTs outperform STs

Especially for partial occlusion, or

When only region layout is used without containment

Vs. Language

Embedding = Hierarchy (and Lego-like compatibility)

Neighbors = Juxtaposition

Occlusion = Only Subtrees of Object tree visible

Inter-object Interaction/combinatorics friendlier

Ordering/Multiple Counting addressed by structure

Instability of Segmentation Addressable

• Splitting and merging of adjacent regions

• Partial Matching

Hedau&Ahuja CVPR08, Cheng and Ahuja

Syntax should Feed Multiple Semantics

A Representation Should Work

for Multiple Applications

Modeling 2.1D Texture

• Physical texels are characterized by

• Texel thickness << Texel distance

• Inter-texel occlusion

• Only a part of a texel may be visible

• Visible texel parts = Samples of

different, unknown texel parts

Learning Texel Model

2.1D texture → identified subimages → registration → union + PDF

Ahuja and Todorovic, ICCV07

Texel Extraction Results

Another Example – Texture Segmentation

Texture Segmentation

Another Example: Texel Distribution

How are texels distributed across texture?

Ghanem and Ahuja, Submitted

Summary

• Syntax = Connected Segmentation Tree

• Semantics = Recognition, Synthesis, ...

• Model = Stochastic Grammar

• Inference = Grammar Based Parsing/Recognition (Not covered)

• Tools = Structural and Statistical Analysis (Not covered)
