© 2009 Robert Hecht-Nielsen. All rights reserved.

1

Andrew Smith
University of California, San Diego

10.14.09

Building a Visual Hierarchy


2

Outline

Building A Visual Hierarchy

Learning layer-by-layer

Inference – filling in a missing segment of an image

Examples

Applications/Products & Future work


3

Choosing an appropriate problem

We want to:

Model human visual processes.

Understand vision in terms of Confabulation Theory.

Build practical applications.

Lay the groundwork for much deeper research.

Answer:

Build an image modeling system.

Represent images in terms of textural components (low statistical order).

Represent images as symbolic (discrete) tuples.


4

Machine Vision vs. Biological Vision

Machine Vision

Pixels: a local representation.

Orthogonal

Biological Vision

Filter/Feature responses

Massively overcomplete/non-orthogonal


5

Confabulation & vision (Pixels → Modules & Symbols)

Features (symbols) develop in each layer of the hierarchy as commonly seen combinations of their inputs.

Knowledge links are simple conditional probabilities between symbols:

$p(\alpha \mid \beta)$, where $\alpha$ and $\beta$ are symbols in connected modules

All knowledge can therefore be learned by simple co-occurrence counting.

$p(\alpha \mid \beta) = C(\alpha, \beta) / C(\beta)$, where $C$ counts (co-)occurrences in the training data

Confabulation operations:

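Given evidence symbols $\alpha, \beta, \gamma, \delta$, find the answer symbol $\varepsilon$ that maximizes:

$$p(\alpha \mid \varepsilon)\, p(\beta \mid \varepsilon)\, p(\gamma \mid \varepsilon)\, p(\delta \mid \varepsilon)$$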
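Both operations on this slide (learning links by counting, then confabulating) can be sketched in a few lines of Python. This is a minimal illustration, not the lab's implementation: it assumes symbols are hashable labels, that the co-occurrences for one knowledge link arrive as a list of (α, β) pairs, and that pairs never seen in training get a small floor probability instead of zero. `learn_links` and `confabulate` are hypothetical names.

```python
import math
from collections import Counter

def learn_links(pairs):
    """Knowledge link p(a | b) = C(a, b) / C(b), estimated from a list of observed (a, b) pairs."""
    joint = Counter(pairs)                  # C(a, b)
    source = Counter(b for _, b in pairs)   # C(b)
    return {(a, b): n / source[b] for (a, b), n in joint.items()}

def confabulate(links, evidence, candidates, floor=1e-6):
    """Return the candidate eps that maximizes prod_i p(evidence_i | eps).

    Log-probabilities are summed instead of multiplying probabilities; unseen
    (evidence, candidate) pairs get the assumed floor probability.
    """
    def log_score(eps):
        return sum(math.log(links.get((e, eps), floor)) for e in evidence)
    return max(candidates, key=log_score)

pairs = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "y"), ("a", "x"), ("b", "x")]
links = learn_links(pairs)
print(confabulate(links, ["a", "b"], ["x", "y"]))
```

Working in log space turns the product into a sum, which avoids numerical underflow when many evidence terms are combined.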


6

Building a vision hierarchy

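• We can no longer use SSE (sum of squared errors) to evaluate the model

[ SSE-based methods maximize the posterior $p(\varepsilon \mid \alpha, \beta, \gamma, \delta)$ ]

• Instead, make use of a generative model:

– It must always be able to generate a plausible image.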


7

Data set

• 4,300 natural images, 1.5 Mpix each (black and white)


8

Vision Hierarchy – level “0”

We know the first transformation from neuroscience research: simple cells approximate Gabor filters.

5 scales, 16 orientations (odd + even)

Parameters picked to closely resemble feline simple cells.

The same approach is used elsewhere in the lab [Minnett et al.].
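A level-0 filter bank along these lines can be sketched with NumPy. The slide's "16 orientations (odd + even)" is read here as 16 filters per scale, 8 orientations each in an even (cosine) and an odd (sine) phase. That reading matches the 16-dimensional per-scale vectors (80 in total) quoted on a later slide, but it is an interpretation. The octave spacing, envelope width, and base wavelength are also assumptions, not the feline-derived parameters the slide mentions; `gabor_pair` and `gabor_bank` are hypothetical names.

```python
import numpy as np

def gabor_pair(size, wavelength, theta, sigma):
    """Even (cosine) and odd (sine) Gabor kernels at one scale and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # coordinate along the carrier
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    even = envelope * np.cos(2 * np.pi * xr / wavelength)
    odd = envelope * np.sin(2 * np.pi * xr / wavelength)
    even -= even.mean()   # remove DC so flat regions give no even response
    return even, odd

def gabor_bank(n_scales=5, n_orient=8, base_wavelength=4.0):
    """One list per scale, each holding 2 * n_orient kernels (orientations x even/odd)."""
    bank = []
    for s in range(n_scales):
        wavelength = base_wavelength * 2**s   # octave spacing between scales (assumed)
        sigma = 0.5 * wavelength              # envelope width tied to wavelength (assumed)
        size = int(6 * sigma) | 1             # odd kernel size covering about +-3 sigma
        kernels = []
        for k in range(n_orient):
            even, odd = gabor_pair(size, wavelength, np.pi * k / n_orient, sigma)
            kernels.extend([even, odd])
        bank.append(kernels)
    return bank

bank = gabor_bank()
print(len(bank), [len(kernels) for kernels in bank])
```

Keeping the bank grouped by scale mirrors the later choice to quantize each scale's 16 responses separately rather than one 80-dimensional vector.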


9


• Does the full convolution preserve the information in images? (The convolution is inverted by least squares.)

• Yes, very closely.
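Least-squares inversion of a convolutional filter bank has a closed form when the convolutions are circular: at each spatial frequency the estimate is Σ conj(K_i) R_i / Σ |K_i|², where K_i is a filter's spectrum and R_i its response spectrum. The sketch below assumes circular boundaries, a toy 16×16 random image, and a small hypothetical two-scale Gabor bank (16 filters, so 4,096 coefficients for 256 pixels). It shows the principle only and does not attempt to reproduce the RMSE figures on the following slides; all function names are hypothetical.

```python
import numpy as np

def gabor(size, wavelength, theta, phase):
    """Gabor kernel: Gaussian envelope (sigma = wavelength / 2, assumed) times a sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * (wavelength / 2) ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def kernel_spectrum(kernel, shape):
    """DFT of the kernel zero-padded to the image size (corner placement only shifts responses)."""
    padded = np.zeros(shape)
    padded[:kernel.shape[0], :kernel.shape[1]] = kernel
    return np.fft.fft2(padded)

def encode(image, spectra):
    """Full (circular) convolution of the image with every filter."""
    f = np.fft.fft2(image)
    return [np.real(np.fft.ifft2(f * spec)) for spec in spectra]

def decode_ls(responses, spectra):
    """Least-squares inverse, solved independently at each frequency."""
    num = sum(np.conj(spec) * np.fft.fft2(r) for spec, r in zip(spectra, responses))
    den = sum(np.abs(spec) ** 2 for spec in spectra)   # must be > 0 everywhere to invert
    return np.real(np.fft.ifft2(num / den))

rng = np.random.default_rng(0)
image = rng.random((16, 16))
kernels = [gabor(9, wl, np.pi * k / 4, phase)
           for wl in (3.0, 6.0) for k in range(4) for phase in (0.0, np.pi / 2)]
spectra = [kernel_spectrum(kern, image.shape) for kern in kernels]
recon = decode_ls(encode(image, spectra), spectra)
print("RMSE:", np.sqrt(np.mean((recon - image) ** 2)))
```

The denominator makes the invertibility condition explicit: reconstruction fails at any frequency that no filter responds to.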


10


• We can do even better by supersampling an image before encoding:


11


• Reconstruction RMSE by supersampling factor:

– 1x: 0.0202
– 2x: 0.0081
– 3x: 0.0051
– 4x: 0.0044
– 5x: 0.0038


12

Inverting Gabor Representations

Studied by Daugman

Simple cells (discovered in the 1950s) re-represent “pixel” data; they were first characterized by Daugman as Gabor logons in the 1980s.

Daugman attempted to answer: “How much information is lost?”

Answer: “Not much!” Images can be completely reconstructed.

(i.e., what we have seen in the previous few slides)

Frame analysis can:

Prove mathematically when complete inversion is possible.

Give the optimal linear inverse.
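For a finite-dimensional linear encoding, both frame-analysis results reduce to short computations. A minimal sketch, assuming the encoding is written as a matrix Φ whose rows are the analysis vectors (each filter at each position): the frame bounds are the smallest and largest eigenvalues of ΦᵀΦ, complete inversion is possible exactly when the lower bound is positive, and the pseudo-inverse gives the canonical dual frame, i.e. the minimum-norm least-squares linear inverse. `frame_bounds` and `optimal_linear_inverse` are hypothetical names.

```python
import numpy as np

def frame_bounds(analysis):
    """Frame bounds (A, B) of the rows of `analysis`.

    A * ||x||^2 <= ||analysis @ x||^2 <= B * ||x||^2 for every x; A and B are
    the extreme eigenvalues of the frame operator S = analysis.T @ analysis.
    """
    eigenvalues = np.linalg.eigvalsh(analysis.T @ analysis)   # ascending order
    return eigenvalues[0], eigenvalues[-1]

def optimal_linear_inverse(analysis):
    """Canonical dual frame S^-1 analysis.T, computed as the pseudo-inverse."""
    return np.linalg.pinv(analysis)

rng = np.random.default_rng(1)
phi = rng.standard_normal((12, 4))        # toy overcomplete frame: 12 vectors in R^4
lower, upper = frame_bounds(phi)
print("invertible:", lower > 0, "bounds:", lower, upper)

degenerate = np.array([[1.0, 0.0], [2.0, 0.0]])   # never "sees" the second coordinate
print("degenerate lower bound:", frame_bounds(degenerate)[0])
```

The ratio B / A is also a useful diagnostic: the closer it is to 1, the better conditioned (more nearly "tight") the frame, and the less the inverse amplifies noise.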


13

Vision Hierarchy – level 1

• We now have a simple-cell like representation.

• How do we create a symbolic representation (“complex cells”)?

• Apply a principle of Confabulation Theory: collect commonly occurring sets of simple-cell inputs, similar to a vector quantizer.

• Keep the 5 scales separate

– (quantize 16-dimensional vectors, not 80-dimensional ones)


14


• To create actual symbols, we use a vector quantizer

– Trade-offs (controlled by the quantizer threshold):

Number of symbols

Preservation of information

Probability accuracy

• Solution: use an angular distance metric (dot product).

– Keep only symbols that occurred in the training set more than 200 times, to get accurate probability estimates.

– After training, ~95% of samples should be within the threshold of at least one symbol.

– Pick a threshold such that plausible images can still be generated.
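The slides do not specify how the quantizer is trained, so the following is a hedged sketch using one simple choice: an online, leader-style quantizer with an angular (cosine) metric, count-based pruning, and a coverage check corresponding to the ~95% criterion. `train_angular_vq`, `coverage`, `cos_threshold`, and `min_count` are hypothetical names; the slide's pruning rule corresponds to `min_count=200`, and the demo uses toy synthetic data with a smaller value.

```python
import numpy as np

def train_angular_vq(samples, cos_threshold, min_count=200):
    """Leader-style vector quantizer with an angular (dot-product) metric.

    Each unit-normalized sample joins the codeword with the largest dot product
    if that value is >= cos_threshold (the codeword becomes the renormalized sum
    of its members); otherwise it starts a new codeword. Codewords with
    min_count or fewer members are discarded.
    """
    codewords, sums, counts = [], [], []
    for v in samples:
        v = v / np.linalg.norm(v)
        if codewords:
            sims = np.array(codewords) @ v
            best = int(np.argmax(sims))
            if sims[best] >= cos_threshold:
                sums[best] = sums[best] + v
                counts[best] += 1
                codewords[best] = sums[best] / np.linalg.norm(sums[best])
                continue
        codewords.append(v)
        sums.append(v)
        counts.append(1)
    keep = [i for i, c in enumerate(counts) if c > min_count]
    return np.array([codewords[i] for i in keep]).reshape(len(keep), samples.shape[1])

def coverage(samples, codebook, cos_threshold):
    """Fraction of samples within the angular threshold of at least one codeword."""
    if len(codebook) == 0:
        return 0.0
    unit = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    return float(np.mean((unit @ codebook.T).max(axis=1) >= cos_threshold))

rng = np.random.default_rng(0)
centers = rng.standard_normal((3, 16))   # three synthetic 16-D "features" (toy data)
samples = centers[rng.integers(0, 3, 600)] + 0.05 * rng.standard_normal((600, 16))
codebook = train_angular_vq(samples, cos_threshold=0.95, min_count=50)
print(codebook.shape, coverage(samples, codebook, 0.95))
```

Raising `cos_threshold` preserves more information but yields more symbols, each seen fewer times; `min_count` then trims the rarely seen ones, which is the trade-off the slide describes.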


16


• Symbolic representation can generate plausible images:

• A theory of animal vision that actually demonstrates that animals can see!


17


• ~8,000 symbols are learned for each of the 5 scales.

• Complex local features develop (unlike PCA and ICA re-representations).


18


• The image is now re-represented as 5 “planes” of symbols:
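Producing one symbol plane amounts to replacing each location's 16-dimensional response vector (at one scale) with the index of the best-matching symbol. A minimal sketch, assuming unit-norm codewords from an angular quantizer like the one described above. The slides do not say how the ~5% of vectors outside every symbol's threshold are handled, so this version simply takes the nearest symbol; `to_symbol_plane` is a hypothetical name.

```python
import numpy as np

def to_symbol_plane(responses, codebook):
    """Map an (H, W, D) array of one scale's responses to an (H, W) plane of symbol indices."""
    h, w, d = responses.shape
    flat = responses.reshape(-1, d)
    norms = np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12)  # avoid /0
    sims = (flat / norms) @ codebook.T      # cosine similarity to every symbol
    return np.argmax(sims, axis=1).reshape(h, w)

# Toy example: 16 one-hot "symbols" and a 2x3 patch of responses.
codebook = np.eye(16)
expected = np.array([[0, 5, 15], [7, 7, 2]])
responses = np.zeros((2, 3, 16))
for r in range(2):
    for c in range(3):
        responses[r, c, expected[r, c]] = 3.0
plane = to_symbol_plane(responses, codebook)
print(plane)
```

Applying this once per scale, each with its own codebook, yields the five planes.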


19

Knowledge links:

• Learn which symbols may be next to which symbols (conditional probabilities)

• Learn which symbols may be over/under which symbols.

• Links extend out to a ‘radius’ of 7.

Consistent with the cortical representation of knowledge.

A very large (tens of GB) body of knowledge.
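Knowledge links of this kind reduce to co-occurrence counts indexed by relative offset: one conditional probability table per (dr, dc). A minimal sketch under several assumptions: planes are equal-sized 2-D lists (the real scales may differ in resolution), the neighborhood is circular, and counts live in plain Python dicts, whereas a multi-GB knowledge base would need a far more compact store. `learn_spatial_links` is a hypothetical name.

```python
from collections import Counter, defaultdict

def learn_spatial_links(source, target, radius):
    """Return {offset: {(t, s): p(t at offset | s)}} from co-occurrence counts.

    `source` and `target` are equal-sized 2-D lists of symbol indices. Pass the
    same plane twice for lateral links, or two scales' planes for over/under links.
    """
    h, w = len(source), len(source[0])
    joint = defaultdict(Counter)      # offset -> C(t, s)
    marginal = defaultdict(Counter)   # offset -> C(s), counting in-bounds pairs only
    lateral = source is target
    for r in range(h):
        for c in range(w):
            s = source[r][c]
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    if lateral and dr == 0 and dc == 0:
                        continue   # a symbol is not its own lateral neighbor
                    if dr * dr + dc * dc > radius * radius:
                        continue   # circular neighborhood (an assumption)
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        joint[(dr, dc)][(target[rr][cc], s)] += 1
                        marginal[(dr, dc)][s] += 1
    return {off: {(t, s): n / marginal[off][s] for (t, s), n in counts.items()}
            for off, counts in joint.items()}

plane = [[0, 1, 0, 1],
         [0, 1, 0, 1]]
links = learn_spatial_links(plane, plane, radius=1)
print(links[(0, 1)])
```

With radius 7 there are about 150 offsets per plane, and the table for each offset can hold up to one entry per pair of the ~8,000 symbols, which is why the full knowledge base grows so large.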


20

Texture modeling – (inference)

What if a portion of our image symbol representation is damaged?

Blind spot

CCD defect

Brain lesion

We can use confabulation (generation) to infer a plausible replacement.


21

Texture modeling – Inference 1

• Fill in the missing region by confabulating from lateral and different-scale neighbors (radius 5).
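The fill-in step can be sketched as repeated confabulation: each missing cell takes the symbol ε that maximizes the product of p(neighbor | ε) over its known neighbors, and cells with the most known neighbors are filled first so that filled cells can serve as evidence for the rest. This is a simplification: it uses only lateral links within one plane and a square neighborhood; cross-scale links would add evidence terms in the same way. The function names, fill order, and floor probability are hypothetical.

```python
import math
from collections import Counter, defaultdict

def learn_lateral_links(plane, radius):
    """{offset: {(neighbor, center): p(neighbor at offset | center)}} from counts in one plane."""
    h, w = len(plane), len(plane[0])
    joint, marginal = defaultdict(Counter), defaultdict(Counter)
    for r in range(h):
        for c in range(w):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < h and 0 <= cc < w:
                        joint[(dr, dc)][(plane[rr][cc], plane[r][c])] += 1
                        marginal[(dr, dc)][plane[r][c]] += 1
    return {o: {k: n / marginal[o][k[1]] for k, n in cnt.items()} for o, cnt in joint.items()}

def fill_in(plane, missing, links, symbols, radius, floor=1e-6):
    """Confabulate every missing cell from its known lateral neighbors (returns a new plane)."""
    plane = [row[:] for row in plane]
    h, w = len(plane), len(plane[0])
    todo = set(missing)

    def evidence(r, c):
        # (offset, symbol) for every in-bounds neighbor that is not still missing
        return [((dr, dc), plane[r + dr][c + dc])
                for dr in range(-radius, radius + 1) for dc in range(-radius, radius + 1)
                if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w
                and (r + dr, c + dc) not in todo]

    while todo:
        r, c = max(todo, key=lambda rc: len(evidence(*rc)))   # best-supported cell first
        ev = evidence(r, c)
        plane[r][c] = max(symbols, key=lambda s: sum(
            math.log(links.get(o, {}).get((n, s), floor)) for o, n in ev))
        todo.remove((r, c))
    return plane

original = [[(r + c) % 2 for c in range(8)] for r in range(8)]   # toy checkerboard "texture"
links = learn_lateral_links(original, radius=1)
missing = [(3, 3), (3, 4), (4, 3), (4, 4)]
damaged = [row[:] for row in original]
for r, c in missing:
    damaged[r][c] = -1
restored = fill_in(damaged, missing, links, symbols=[0, 1], radius=1)
print(restored == original)
```

On real symbol planes the learned links are far less deterministic than this toy checkerboard, so the result is a plausible texture rather than an exact restoration.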


22

Texture modeling


24

Texture modeling


25

More Examples 1/7 (find the replacements)


26

More Examples 1/7 (replacement locations)


37

Texture modeling – Conclusions

This visual hierarchy does an excellent job of capturing an image up to a certain order of complexity.

Given this visual hierarchy and its learned knowledge links, missing regions could plausibly be filled in. This could be a reasonable explanation for what animals do.

Preparing for publication (IEEE Transactions on Image Processing), with help from Professor Serge Belongie (CSE).

Last hurdle to graduation!


44

The next level…

Level 2 symbol hierarchy

• Collect commonly recurring regions of level 1 symbols.

• Symbols at Level 2 will fit together like puzzle pieces.

Thank you!
