  • Visual Attention

    Derek Hoiem
    March 14, 2007
    Misc Reading Group

  • The Eye

    120 million rods (intensity)
    7 million cones (color)
    Fovea: 2 degrees of cones

  • Saccades and Fixations

    Scope: 2 deg (poor spatial res beyond this)

    Duration: 50-500 ms (mean 250 ms)

    Length: 0.5 to 50 degrees (mean 4 to 12)

    Various types (e.g., regular, tracking, micro)

  • Saccades and Fixations: Free Examination [Yarbus 1967]

    Task instructions given to viewers:
    Free examination
    What are the material circumstances of the family?
    What are their ages?
    What were they doing before arrival?
    Remember the clothes
    Remember object and person positions
    How long has the unexpected visitor been away?

  • Visual Phenomena

    Fast scene recognition (100-150 ms)
    Fast "contains animal" detection (100-150 ms)
    Pop-out
    Inattentional blindness
    Change blindness

  • Pop-out (texture)

  • Pop-out (more texture)

  • Pop-out (harder texture)

  • Pop-out (color)

  • Pop-out (color + texture)

  • Pop-out (layout)

    [demo figures: each slide shows a "+" fixation mark and an array of items (e.g., a grid of Ts) containing one odd element]

  • Pop-out Performance vs. Distractors

  • Demos

    http://viscog.beckman.uiuc.edu/djs_lab/demos.html

  • Model of Vision: Pre-Attentive Stage

    [Rensink 2000] (figure from Itti 2002)

  • Purpose of Attention

    Warning (animals, flashes, sudden motion)

    Exploration (find objects, verification)

    Inspection

  • Bottom-up Attention Models [Itti Koch Niebur 1998]

    Gabor pyramid + orientation filters
    Subtract low-res (3-4 octaves) from higher res
    Average maps
    Inhibition + excitation
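    The center-surround step above can be sketched in numpy (my reconstruction, not the authors' code; 2x2 block averaging stands in for the Gaussian/Gabor pyramid filtering):

```python
# Sketch of the Itti-Koch center-surround step: subtract a coarse
# (surround) pyramid level from a finer (center) level and rectify.
# Block averaging is a crude stand-in for proper pyramid filtering.
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def upsample(img, shape):
    """Nearest-neighbor upsample back to a reference shape."""
    ry = int(np.ceil(shape[0] / img.shape[0]))
    rx = int(np.ceil(shape[1] / img.shape[1]))
    big = np.repeat(np.repeat(img, ry, axis=0), rx, axis=1)
    return big[:shape[0], :shape[1]]

def center_surround_maps(intensity, center_levels=(2, 3), delta=(3, 4)):
    """|center - surround| maps; surround is `delta` octaves coarser."""
    pyr = [intensity]
    for _ in range(max(center_levels) + max(delta)):
        pyr.append(downsample(pyr[-1]))
    maps = []
    for c in center_levels:
        for d in delta:
            diff = np.abs(pyr[c] - upsample(pyr[c + d], pyr[c].shape))
            maps.append(diff)
    return maps
```

    The resulting feature maps would then be normalized and averaged into a single saliency map, as on the following slides.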

  • Bottom-Up: Normalization

    Normalize map values to fixed range [0, 1]
    Compute average local maximum m
    Multiply map by (1 - m)^2

    [Itti Koch Niebur 1998]
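    A minimal sketch of this normalization operator, assuming patch-wise local maxima (the paper's exact local-maximum definition differs slightly): maps with one strong peak are promoted, maps with many similar peaks are suppressed.

```python
# Map-normalization operator N(.) from the slide: rescale to [0, 1],
# average the local maxima, then weight the map by (1 - m)**2, so a
# map with many equal peaks (m near 1) is suppressed toward zero.
import numpy as np

def normalize_map(m, patch=16):
    m = m - m.min()
    if m.max() > 0:
        m = m / m.max()
    local_maxima = []
    for i in range(0, m.shape[0], patch):
        for j in range(0, m.shape[1], patch):
            local_maxima.append(m[i:i + patch, j:j + patch].max())
    mbar = float(np.mean(local_maxima))
    return m * (1.0 - mbar) ** 2
```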

  • Bottom-Up: Predicted Fixations [Itti Koch Niebur 1998]

  • Updates to Bottom-Up Model

    Cross-orientation suppression

    Long-range contour interactions

    Eccentricity-dependent processing (e^-x)

    Goal: better prediction of subsequent fixations

    [Peters Iyer Itti Koch 2005]
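    One way to read the e^-x idea above, as an illustrative sketch (my assumption, not the authors' exact formulation): down-weight saliency by distance from the current fixation before choosing the next one.

```python
# Eccentricity-dependent weighting sketch: multiply a saliency map by
# exp(-decay * distance-from-fixation), then take the argmax as the
# next fixation, so nearby peaks beat equally salient distant ones.
import numpy as np

def eccentricity_weighted(saliency, fixation, decay=0.05):
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - fixation[0], xs - fixation[1])
    return saliency * np.exp(-decay * dist)

def next_fixation(saliency, fixation, decay=0.05):
    weighted = eccentricity_weighted(saliency, fixation, decay)
    return np.unravel_index(np.argmax(weighted), weighted.shape)
```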

  • Experiments with Newer Model [Peters Iyer Itti Koch 2005]

  • Experiments with Newer Model [Peters Iyer Itti Koch 2005]

    Normalized Scanpath Salience
    Inter-observer Salience

  • Almost No Benefit to More Complicated Models [Peters Iyer Itti Koch 2005]

  • Eccentricity-Dependent Filtering Helps [Peters Iyer Itti Koch 2005]

    (comparison figures: no EDF vs. EDF)

  • Other Bottom-Up Issues

    Real viewing vs. images (Gajewski et al. 2005)
    Longer saccades in real viewing (12 deg vs. 4 deg)
    Short saccades may be due to density of images, rather than movement cost

    Saliency map?
    Evidence for multi-saliency representations (not clear there is a single map)
    Capability to ignore predictable motions is difficult in map formulation

    [figure: an image and its saliency map]

  • Alternative Bottom-up Models

    Itti-Koch accounts for some pop-out effects
    Is it biologically plausible? (peak normalization)
    Is it biologically reasonable? No plausible mechanism for selecting the next fixation

    Top-down bias only (Wolfe's guided search)
    Does not account for free-viewing behavior

    Surprise or explanation seeking (expectation-based saliency)
    No saliency map required
    May provide better prediction of next fixation, account for motion prediction

  • Top-Down Attention Models

    Feature weighting (verbal or visual cue)

    Location prior
    From memory of scene (direct or indirect)
    From scene information and semantics

  • Verbal Cueing Feature Weighting

    Faster search if cued as to color or texture

    Faster yet if exemplar is shown

    Searching for mid-level cues (e.g., intensity, size, saturation) is harder, but may still be cued
    [Navalpakkam Itti 2006]

  • Verbal Cueing Feature Weighting [Navalpakkam Itti 2006] (results figures)

  • Role of Memory

    People can remember hundreds or thousands of scenes from a single exposure (Shepard 1967)

    After seeing repeated scenes (in random order):
    Faster finding of target (Brockmole and Henderson 2006)

    When mirrored after learning:
    First look at original location, then quickly go to new location (still faster)
    Learning of upside-down scenes takes twice as long

  • Role of Memory [Brockmole and Henderson 2006]

  • Scene Context

    Scene-constrained targets detected faster, with fewer eye movements

    Strategy:
    1st: check target-consistent regions
    2nd: check target-inconsistent regions
    [Neider Zelinsky 2005]

  • Scene Context [Neider Zelinsky 2005]

    (figures: target present vs. target absent)

  • Scene Context [Torralba et al. 2006]

    Gist can provide an image-height prior
    Saliency = (inverse feature probability)^0.05 * Gaussian height prior
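    The slide's formula can be sketched as follows (function and parameter names are mine; the 0.05 exponent and the Gaussian prior over the target's expected image height come from the slide):

```python
# Contextually modulated saliency sketch: rare features (low p) are
# salient; the 0.05 exponent flattens the map; a Gaussian prior over
# image rows encodes where the gist says the target should appear.
import numpy as np

def contextual_saliency(feature_prob, expected_row, sigma_rows):
    """(1 / p(features))**0.05 * Gaussian prior over image rows."""
    h, w = feature_prob.shape
    bottom_up = (1.0 / feature_prob) ** 0.05
    rows = np.arange(h)[:, None]
    height_prior = np.exp(-0.5 * ((rows - expected_row) / sigma_rows) ** 2)
    return bottom_up * height_prior  # prior broadcasts across columns
```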

  • Scene Context [Torralba et al. 2006] (gist and example figures)

  • Scene Context: People Search [Torralba et al. 2006]

  • Scene Context: Object Search [Torralba et al. 2006]

  • Bottom-up + Top-down Attention

    Method 1: weight individual features (top-down gains on each feature map)

    Method 2: Saliency .* Bias (elementwise product of saliency map and top-down bias map)
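    Both combination schemes can be sketched with toy maps (function names and gain values are mine, for illustration only):

```python
# Method 1: top-down gains scale each feature map before combination.
# Method 2: a finished saliency map is multiplied elementwise by a
# top-down bias map (the ".*" on the slide).
import numpy as np

def method1_weighted_features(feature_maps, gains):
    return sum(g * m for g, m in zip(gains, feature_maps)) / len(feature_maps)

def method2_biased_saliency(saliency, bias):
    return saliency * bias
```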

  • Conclusions

    Artificial static scenes and pop-out well-explained by existing models

    Little recent progress in bottom-up models (stuck with Itti-Koch model)

    Only simplistic scene information modeled

  • Sources

    Saliency
    Itti, Koch, Niebur (1998). A model of saliency-based visual attention for rapid scene analysis.
    Itti, Koch (2001). Computational modelling of visual attention.
    Itti (2002). Modeling primate visual attention.
    Itti (2002). Visual attention.
    Navalpakkam, Arbib, Itti (2004). Attention and scene understanding.
    Peters, Iyer, Itti, Koch (2005). Components of bottom-up gaze allocation in natural images.

    Role of memory
    Chun, Jiang (1998). Contextual cueing: implicit learning and memory of visual context guides spatial attention.
    Chun, Jiang (2003). Implicit, long-term spatial contextual memory.
    Brockmole, Henderson (2006). Recognition and attention guidance during contextual cueing in real-world scenes: evidence from eye movements.

    Top-down attention
    Neider, Zelinsky (2005). Scene context guides eye movements during visual search.
    Navalpakkam, Itti (2006). Top-down attention selection is fine grained.
    Torralba, Oliva, Castelhano, Henderson (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search.

  • Sources (continued)

    Others (used)
    Rensink, O'Regan, Clark (1997). To see or not to see: the need for attention to perceive changes in scenes.
    Liversedge, Findlay (2000). Saccadic eye movements and cognition.
    Rensink (2000). The dynamic representation of scenes.
    Delorme, Rousselet, Mace, Fabre-Thorpe (2004). Interaction of top-down and bottom-up processing in the fast visual analysis of natural scenes.
    Gajewski, Pearson, Mack, Bartlett, Henderson (2005). Human gaze control in real world search.
    http://www.diku.dk/~panic/eyegaze/node13.html

    Others (not used but potentially interesting)
    Itti, Koch, Braun (2000). Revisiting spatial vision: toward a unifying model.
    Epstein (2005). The cortical basis of visual scene processing.
    Baldi, Itti (2005). Attention: Bits versus Wows.

