Date post: 24-Jan-2016
Page 1: Visual Attention Jeremy Wyatt. Where to look? Many visual processes are expensive Humans don’t process the whole visual field How do we decide what to.

Visual Attention

Jeremy Wyatt

Where to look?

• Many visual processes are expensive

• Humans don’t process the whole visual field

• How do we decide what to process?

• How can we use insights about this to make machine vision more efficient?

Visual salience

• Salience ~ visual prominence

• Must be cheap to calculate

• Related to features that we collect from very early stages of visual processing

• Colour, orientation, intensity change and motion are all important indicators of salience

On/Off cells

• Recall centre surround cells

[Figure: receptive fields of an ON cell and an OFF cell, each with an ON area and an OFF area, and their responses over time to a light spot]

Colour sensitive On/Off cells

• Recall that some ganglion ON cells are sensitive to the outputs of cones

[Figure: colour sensitive cell with ON and OFF areas]

An intensity change map

• I = (r+g+b)/3 gives I, the intensity map

• The intensity change map is formed from a grid of on/off cells (they overlap)

• There are several maps, each from cells with receptive fields at a different scale

• Each cell fires for its area
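
The intensity formula above can be sketched in a few lines. This is a minimal illustration, assuming the image is a nested list of (r, g, b) tuples; the function name intensity_map and the sample image are made up for the example and do not come from the slides.

```python
def intensity_map(rgb_image):
    """Return I = (r + g + b) / 3 for every pixel of a nested-list RGB image."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb_image]


# Hypothetical 2x2 test image: black, white, a dark blue-grey, and pure red.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(30, 60, 90), (255, 0, 0)],
]
I = intensity_map(image)
```

Every later step in this sketch style works on a single-channel map like I rather than on the colour image.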

How do we calculate the maps?

• We can create each on cell using a pair of Gaussians

[Figure: one Gaussian minus another gives an on cell with a central ON area and a surrounding OFF area, shown with a light spot]
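
A single on cell built from a pair of Gaussians might look like the sketch below. It assumes a thin Gaussian for the ON area and a fat one for the OFF area, both centred on the same point; the function names and the sigma values are illustrative choices, not values given in the slides.

```python
import math


def gaussian(dx, dy, sigma):
    """Value of a normalised 2-D Gaussian at offset (dx, dy) from its centre."""
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def on_cell_response(image, cx, cy, sigma_thin=1.0, sigma_fat=2.0):
    """Thin-Gaussian weighted sum minus fat-Gaussian weighted sum, centred on (cx, cy)."""
    centre = 0.0
    surround = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            centre += gaussian(x - cx, y - cy, sigma_thin) * value
            surround += gaussian(x - cx, y - cy, sigma_fat) * value
    return centre - surround


# Hypothetical 9x9 image containing a single light spot at (x=4, y=4).
spot = [[0.0] * 9 for _ in range(9)]
spot[4][4] = 1.0

in_centre = on_cell_response(spot, 4, 4)    # spot falls on the ON area
in_surround = on_cell_response(spot, 4, 7)  # spot falls on the OFF area
```

Here in_centre is positive and in_surround is negative, which is the on cell behaviour recalled earlier. Swapping the order of the subtraction would give an off cell.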

How do we calculate the maps?

• Imagine grids of fat and thin Gaussians

• We calculate the value of each Gaussian in each grid and then subtract one grid (here with 16 elements) from the other

• This implements our grid of on cells
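
The grid version can be sketched as follows. This is only one way to do it, assuming each Gaussian is normalised by the sum of its own weights (so a uniform image gives zero everywhere) and that the grid centres are evenly spaced; gaussian_grid, on_cell_grid and the sigma values are hypothetical names and settings chosen for the example.

```python
import math


def gaussian_grid(image, centres, sigma):
    """Gaussian-weighted average of the image around each (cx, cy) grid centre."""
    grid = []
    for cy in centres:
        row_out = []
        for cx in centres:
            total = 0.0
            weight_sum = 0.0
            for y, row in enumerate(image):
                for x, value in enumerate(row):
                    w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                    total += w * value
                    weight_sum += w
            row_out.append(total / weight_sum)
        grid.append(row_out)
    return grid


def on_cell_grid(image, centres, sigma_thin=1.0, sigma_fat=3.0):
    """Subtract the grid of fat Gaussians from the grid of thin Gaussians."""
    thin = gaussian_grid(image, centres, sigma_thin)
    fat = gaussian_grid(image, centres, sigma_fat)
    return [[t - f for t, f in zip(t_row, f_row)] for t_row, f_row in zip(thin, fat)]


# Hypothetical 8x8 image with one light spot; 4 centres per axis gives 16 cells.
image = [[0.0] * 8 for _ in range(8)]
image[3][3] = 1.0
centres = [1, 3, 5, 7]
grid = on_cell_grid(image, centres)
```

The cell centred on the light spot, grid[1][1], ends up with the largest value in the grid.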

Calculating the intensity change map

• We do this for a mix of scales

• We have to interpolate the values of some maps to match the outputs of others (this corresponds to cells that have overlapping receptive fields)

• By aligning and then combining the maps at different scales we have implemented a grid of on cells, or a grid of off cells
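
The aligning and combining step could be sketched like this. The slides do not say which interpolation or which combination rule is used, so bilinear interpolation and point-by-point addition are assumptions here, and upsample_bilinear and combine_scales are hypothetical helper names.

```python
def upsample_bilinear(coarse, out_h, out_w):
    """Bilinearly interpolate a nested-list map to out_h x out_w (corners aligned)."""
    in_h, in_w = len(coarse), len(coarse[0])
    result = []
    for i in range(out_h):
        # Position of output row i expressed in the coarse map's coordinates.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = coarse[y0][x0] * (1 - fx) + coarse[y0][x1] * fx
            bottom = coarse[y1][x0] * (1 - fx) + coarse[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


def combine_scales(maps):
    """Resize every map to the size of the first (finest) map and add them up."""
    h, w = len(maps[0]), len(maps[0][0])
    total = [[0.0] * w for _ in range(h)]
    for m in maps:
        resized = upsample_bilinear(m, h, w)
        for i in range(h):
            for j in range(w):
                total[i][j] += resized[i][j]
    return total


# Hypothetical maps: a 3x3 fine-scale map and a 2x2 coarse-scale map.
fine = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
coarse = [[0.0, 2.0], [2.0, 4.0]]
aligned = upsample_bilinear(coarse, 3, 3)
combined = combine_scales([fine, coarse])
```

The interpolated values in aligned stand in for the cells whose receptive fields overlap those of their neighbours at the coarser scale.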

Other maps

• We can now do this for red, green, yellow and blue

• We also do this for intensity changes of a certain orientation

[Figure: a subtraction that gives a map of intensity changes at one orientation]
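
The slides do not give formulas for the red, green, yellow and blue inputs. One commonly used choice, shown here purely as an assumption, is the set of broadly tuned colour channels from the Itti, Koch and Niebur saliency model, with negative values clipped to zero. The function name colour_channels is hypothetical.

```python
def colour_channels(r, g, b):
    """Return (red, green, blue, yellow) for one pixel, negatives clipped to 0.

    Assumed formulas (not stated in the slides):
      red    = r - (g + b) / 2
      green  = g - (r + b) / 2
      blue   = b - (r + g) / 2
      yellow = (r + g) / 2 - |r - g| / 2 - b
    """
    red = r - (g + b) / 2
    green = g - (r + b) / 2
    blue = b - (r + g) / 2
    yellow = (r + g) / 2 - abs(r - g) / 2 - b
    return tuple(max(0.0, c) for c in (red, green, blue, yellow))


pure_red = colour_channels(1, 0, 0)
pure_yellow = colour_channels(1, 1, 0)
grey = colour_channels(1, 1, 1)
```

Under these formulas a grey pixel gives zero in all four channels, so only coloured regions contribute. Each channel image can then be passed through the same grids of on/off cells as the intensity map.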

Combining maps to calculate saliency

• We now add the maps to obtain the saliency of each group of pixels in the scene

[Figure: saliency map]

• We normalise each map to the same range before adding

• We weight each map before combining it

• We attend to the most active point in the saliency map
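
The three steps above (normalise, weight, add, then pick the most active point) can be sketched as follows. The slides do not specify the normalisation, so rescaling each map to the range 0 to 1 is an assumption, as are the function names and the example weights.

```python
def normalise(m):
    """Rescale a map linearly so its values span 0 to 1 (all zeros if it is flat)."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def saliency_map(maps, weights):
    """Weighted sum of the normalised maps; all maps must have the same size."""
    h, w = len(maps[0]), len(maps[0][0])
    saliency = [[0.0] * w for _ in range(h)]
    for m, weight in zip(maps, weights):
        n = normalise(m)
        for i in range(h):
            for j in range(w):
                saliency[i][j] += weight * n[i][j]
    return saliency


def most_active_point(saliency):
    """Return (row, col) of the largest value in the saliency map."""
    best = max((v, i, j) for i, row in enumerate(saliency) for j, v in enumerate(row))
    return best[1], best[2]


# Hypothetical maps with very different ranges (0 to 10 and 0 to 1).
intensity = [[0.0, 10.0], [5.0, 0.0]]
colour = [[0.0, 0.0], [1.0, 0.0]]

equal = most_active_point(saliency_map([intensity, colour], [1.0, 1.0]))
down_weighted = most_active_point(saliency_map([intensity, colour], [1.0, 0.2]))
```

With equal weights the point where both maps respond wins. Reducing the weight of the colour map moves attention to the point with the strongest intensity change instead.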

Attending to areas of the scene

• We use the salience model I have described to attend to certain areas of the scene

• We can now use this salience model to make other visual processes more efficient (e.g. object recognition)
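
One simple way to get the efficiency gain is to run the expensive process only on a window around the attended point. The slides do not describe how the attended area is handed to the recogniser, so the square window, the function name attended_window and the half_size parameter are all assumptions made for this sketch.

```python
def attended_window(image, point, half_size):
    """Crop a square window around point = (row, col), clipped at the image border."""
    row, col = point
    top = max(0, row - half_size)
    bottom = min(len(image), row + half_size + 1)
    left = max(0, col - half_size)
    right = min(len(image[0]), col + half_size + 1)
    return [r[left:right] for r in image[top:bottom]]


# Hypothetical 6x6 image whose pixel values are just their index, for easy checking.
image = [[i * 6 + j for j in range(6)] for i in range(6)]

window = attended_window(image, (2, 3), 1)   # 3x3 window in the interior
corner = attended_window(image, (0, 0), 1)   # clipped to 2x2 at the border
```

In this toy case the recogniser would see 9 of the 36 pixels instead of the full scene.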

Learning names and appearances of objects

Salience can be modulated by language

Modulating visual salience by language: results

[Figure: bar chart "SIFT based recognition". Vertical axis: Time (secs), 0 to 2.5. Horizontal axis: Object (Sprite Can, Diet Coke Can, Coke Can, Magic, Lucozade Bottle). Series: Full Scene, Bottom up salience, Modulated by context]

Number of Fixations

Package            Full Scene    Bottom up salience    Modulated by Context
Sprite Can         1             4.5                   1
Diet Coke Can      1             7                     3.1
Coke can           1             3.5                   1
Magic              1             2                     2
Lucozade Bottle    1             1                     2
Fanta bottle       1             11.9                  11.7

Summary

• Visual attention is guided by many features

• A good model of attention involves parts of early visual processing we have already seen

• We can use this to make object learning in robots more efficient

