6 Object Recognition - FAUbressler/EDU/CogNeuro/topic6.pdf6. Pattern V is fed back from F2 to F1...

Cognitive Neuroscience Object Recognition

Connectionist models of cognition

Understanding how the brain achieves object recognition is a difficult problem that is currently under investigation by researchers in cognitive neuroscience. The basic problem of object recognition is to determine the nature of the isomorphic relation between the connectional hierarchy in the sensory brain and the hierarchical representation of perceptual processing that leads to object recognition. Because the experimental evidence is inconclusive, competing theories about how object recognition occurs in the brain each have support. To better understand the problem, computational neural models of cognition have been proposed. Artificial neural networks (ANNs) have been trained to perform object recognition. In psychology, ANNs are called connectionist models.

Connectionist models of cognition are ANNs that are useful for studying object recognition. Common features of connectionist models:

1) they assume the distribution of knowledge in assemblies of units, neurons, or nodes that represent the component elements of knowledge.

2) the nodes are interconnected in networks by synapses.

3) the networks are layered.

4) some connections between layers are reciprocal, supporting reentrant processing.

5) layers are connected by parallel, convergent, and divergent connections.

6) networks learn by modification of synaptic weights.

7) they learn by unsupervised learning: synapses are strengthened (weights increase) by temporal coincidence of pre- and post-synaptic activity.
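The coincidence-based learning in item 7 can be sketched as a simple Hebbian weight update. This is only an illustration under stated assumptions, not a model taken from these notes: the function name `hebbian_update`, the learning rate `eta`, and the binary activity values are all hypothetical.

```python
def hebbian_update(weights, pre, post, eta=0.1):
    """Return new weights after one Hebbian step.

    weights[i][j] connects presynaptic unit j to postsynaptic unit i.
    A weight grows only when pre[j] and post[i] are active together.
    eta is an assumed learning rate.
    """
    return [
        [w + eta * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]

# Two presynaptic and two postsynaptic units (hypothetical activity values).
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1, 0]    # only presynaptic unit 0 is active
post = [1, 1]   # both postsynaptic units are active
weights = hebbian_update(weights, pre, post)
```

Only the synapses from the active presynaptic unit are strengthened. The synapses from the silent unit keep their old weights, which is the sense in which learning depends on temporal coincidence.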

Object recognition as interaction of sensation and memory in cortex

Object recognition implies that learning of an object has previously occurred. Therefore, the sensory analysis used in perception is thought to be guided by perceptual knowledge stored as perceptual network memories.

1. Memories are represented as patterns of synaptic modification in perceptual cortical networks that span levels within sensory hierarchies and across different hierarchies (i.e. are heterarchical).

2. Sensory events are integrated into the structure of pre-existing perceptual memory.

3. Integration of sensory events with pre-existing perceptual knowledge occurs at higher levels than primary sensory cortex.

4. Integration of sensation in more than one modality involves transmodal association cortex.

5. Integration of sensory events with action requires prefrontal cortex.

6. High-level unimodal and transmodal networks in association cortex represent abstract unimodal and transmodal concepts; they are connected to lower-level networks with which they are associated.

The difference between passive and active perception

For object recognition to be guided implies that it is an active, not a passive, process. But this conclusion is not universally accepted. For the case of vision, we next consider the competing views of perception as being either passive or active. The difference between passive and active perception may be expressed in computational terms: passive perception is strictly feed-forward (top figure) whereas active perception involves feed-back (bottom figure) from higher “down-stream” stages to lower “up-stream” stages.

The argument for passive visual perception

Hypothesis: The visual stimulus operates like a stamp or imprint. Reflected light patterns impinging on the retina determine retinal activity; these light patterns are faithfully reproduced in retina, LGN, & V1.

Implication for perception: All the information needed for visual perception is presented to the retina from the external world. Visual perception only requires progressive processing of visual information, and so depends only on feedforward processes.

Proposed neural mechanism: Low-level visual features are detected in V1, and progressively more elaborate features are detected in higher visual areas. Visual recognition occurs when high-level symbolic features are compared with features stored in memory.

Supporting evidence: Anatomical: the bottom-up projection from retina to V1 and higher visual areas is retinotopic, suggesting faithful reproduction of light patterns. Electrophysiological: single-neuron recording in visual cortex shows that V1 neurons are sensitive to low-level visual features & higher-level visual neurons are sensitive to more complex visual features. Lesion analysis: lesions along the path from retina to V1 produce blindness for the part of the visual field corresponding to the lesioned cells; lesions in visual areas outside V1 produce "mind-blindness" -- lack of visual comprehension.

The argument for active visual perception

Hypothesis: The visual sensorium is constantly changing, and real-world visual stimuli are ambiguous and indeterminate. Not all the information needed for visual perception is presented to the retina from the external world, and retinal activity does not specify the light pattern to be perceived.

Implication for perception: Visual perception requires information supplied by the brain, and so depends on feed-back (top-down) as well as feed-forward (bottom-up) processes.

Proposed neural mechanism: Low-level activity patterns in V1 undergo progressive elaboration through an iterative feedforward-feedback cycle involving higher visual areas. Visual recognition occurs when high-level visual areas produce patterns from memory representing hypotheses that are consistent with low-level activity patterns.

Supporting evidence: Physical: the flux of photons at the retina is highly variable in time. Anatomical: top-down inputs to V1 are more prevalent than bottom-up inputs. Perceptual: visual perception can occur without visual stimulation, as in imagery.

ART

The Adaptive Resonance Theory (ART) is a view of how active perception may be accomplished. Grossberg & Carpenter developed ART beginning in the 1980s. ART has spawned a class of computer models that capture some of the cardinal features of active perception. ART models are trained to perform object recognition, i.e. after learning, they recognize a category of sensory patterns as coming from a common object.

The ART Architecture

In the ART model, sensory input is delivered to a “feature representation field” (F1), which interacts with a “category representation field” (F2).

The ART search cycle:

1. An input pattern I registers itself as a short-term memory activity pattern x in field F1 (frame a)

2. Pattern x is transformed into a compressed pattern T in field F2 (frame a)

3. Competition occurs among nodes in F2 for the strongest match with T (frame a)

4. Activation of memory trace y occurs at the node in F2 having the strongest match (frame a)

5. Memory trace y is treated as a hypothesis to be tested by matching its top-down pattern V against pattern x that selected it (frame b)

6. Pattern V is fed back from F2 to F1 where it is matched against pattern x (frame b)

7. Those portions of pattern x that match top-down pattern V are suppressed. The portion of x that is not suppressed is the residual activity pattern x* (frame b)

8. Residual activity pattern x* represents a pattern of critical features in the current input x that are different from what was hypothesized as y (frame b)

9. Steps 2-8 are repeated iteratively: residual pattern x* is fed forward from F1 to F2, where another memory trace emerges through competition and feeds back a new pattern V* to F1 (frame b)

10. Scenario 1 (an adequate match is found): after a number of iterations, V converges to x and x* gets smaller and smaller; if x* becomes sufficiently small, the final y represents an adequately matched memory pattern which is the percept of I (frame b)

11. Scenario 2 (no adequate match is found): if the input is too novel to satisfy the matching criterion, V does not converge to x and x* remains large. F2 is reset (frame c), and a new memory trace y* is created corresponding to x* (frame d)
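The hypothesis-test-reset logic of the search cycle can be sketched in a few lines. This is a deliberately simplified illustration, not the ART model itself: patterns are plain feature sets, each candidate is tested against the full pattern x rather than by feeding the residual forward, and a novel input is stored whole as the new trace. The function name `art_search`, the `max_residual` criterion, and the feature names are assumptions introduced here.

```python
def art_search(x, memory, max_residual=0.25):
    """Match a non-empty input feature set x against stored traces.

    x            -- set of active features in F1 (pattern x)
    memory       -- list of feature sets, one per F2 node (memory traces)
    max_residual -- assumed matching criterion: largest tolerated fraction
                    of x left unexplained by the top-down pattern
    Returns (index of the chosen F2 node, True if a new trace was created).
    """
    # Steps 2-4: rank F2 nodes by bottom-up match (overlap with x).
    candidates = sorted(range(len(memory)),
                        key=lambda j: len(x & memory[j]),
                        reverse=True)
    for j in candidates:
        v = memory[j]          # steps 5-6: top-down pattern V
        residual = x - v       # steps 7-8: unexplained features x*
        if len(residual) / len(x) <= max_residual:
            return j, False    # scenario 1: adequate match found
        # Otherwise this hypothesis is rejected and the next node is tried.
    memory.append(set(x))      # scenario 2: create a new memory trace
    return len(memory) - 1, True

# Hypothetical memory traces and inputs.
memory = [{"wings", "beak", "feathers"}, {"wheels", "engine", "doors"}]
familiar = art_search({"wings", "beak", "feathers", "tail"}, memory)
novel = art_search({"fins", "scales", "gills"}, memory)
```

In this sketch the first input leaves only one of its four features unexplained, so it is assigned to the existing first trace. The second input shares no features with either trace, so a third trace is created for it.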

Application of the concept of active perception to the cortex

1. We infer that perception by the brain involves a similar iterative matching between a set of sensory impressions and pre-established memory networks:

a) If an adequate match occurs, the matching network becomes the percept

b) If no adequate match occurs, a new network is created and becomes the percept

2. Perceptual categorization of sensory information does not require consciousness, and we are not normally aware of the different processes, executed in parallel, that underlie perception.

3. Perceptual processing is usually guided by selective attention. Attentional perception is conscious and is executed sequentially. However, this does not mean that we are conscious of all the steps involved in perceptual processing. Perhaps we are only aware of the results. Attention may be viewed as an aid to categorical perception → it often leads to formation of a new category or to re-categorization. Without selective attention, sensory systems would be either overwhelmed or oblivious to important sensory details. In other words, the capacity of these systems for processing sensory information is limited.

4. Mood can affect perceptual categorization by influencing attention. E.g. depression often dulls perception, & produces anhedonia (the absence of pleasure from experience that would normally be pleasurable) and negative mis-categorizations (e.g. hypochondria). Emotional connotations can affect perceptual categorization regardless of whether we are conscious of them or not.

The Gestalt

Perception is seen to consist in the classification of sensory items by the binding of features according to Gestalt grouping principles. Examples of Gestalt grouping principles:

a) common motion. Motion example: http://www.biomotionlab.ca/Demos/BMLwalker.html

b) spatial contiguity (proximity)

c) similarity & continuation

d) temporal contiguity (proximity)

Temporal contiguity occurs when two or more stimuli are experienced close together in time and, as a result, an association is formed.

Example: tone sequences, or “streams”

If the audio and video get even slightly out of sync, the viewer may hear an actor’s words before or after his lips move, which is easily detected and fairly annoying.

The strength of binding is affected by factors such as:

a) repetition

b) emotional or motivational connotation

c) motor contingency

d) sensory-sensory association

Sensory information arriving at the cortex is organized according to sets of spatial and temporal relations between elementary sensory features. These relations define the informational structure that becomes stored in memory through perception.

The Gestalt school of psychology studied the structured patterns of visual images. They sought to explain how we are able to identify regularities in the sensory world, e.g. visual objects. The term “gestalt” has come to mean a pattern of elements unified as a whole with properties that cannot simply be derived from the parts. Basic questions addressed by Gestalt psychology:

a) how do we perceive objects as individual entities?

b) how do we segregate objects from others around them?

c) how do we segregate objects from the background?

d) how is the identity of objects preserved despite discontinuities, distortions, or occlusions?

Figure-Ground Perception

Visual perception involves the recognition of objects (figure) as distinct from their backgrounds (ground). Objects appear to “stand out” from the background. Figure-ground perception in vision usually depends on edge assignment and how that affects shape perception. It may be bistable, meaning that either of two (stable) figures may be perceived. This may occur when a visual pattern is too ambiguous for the visual system to recognize it with a single unique interpretation. The differentiation between foreground and background of a sensory scene also exists for perception in the other sensory modalities, such as hearing and touch.

Gestalt principles of organization (Fuster, Figure 4.2): a) proximity b) similarity c) continuation d) closure

Gestalt principles apply at the psychophysical level, i.e. they are valid for describing perceptual phenomena, but not necessarily the underlying neural phenomena. Past attempts by Gestalt psychologists to theorize about cortical phenomena were largely unsuccessful. The generalization of Gestalt principles to cognitive function will be useful to define cognitive structure in terms of relationships among components. Such a generalization could be applied to:

a) sensory modalities other than vision

b) higher levels of abstraction

c) temporal as well as spatial organization

Showing that Gestalt principles apply not only to perceptual phenomena, but also to perceptual networks, will help to establish an isomorphism between the structure of cognition and the structure of cortical networks.

Cortical dynamics of perception

The network perspective views the perceptual network as central to perception. The perceptual network represents in its spatial structure specific relations in the sensory environment. This view postulates that perception is the sensory activation of a perceptual network in the posterior cortex. Object recognition is seen as the categorization of sensory information according to the memory structure that has been built up by prior experience. Memory structure is viewed as a spatial pattern of linked perceptual memory networks.

A. Principles of categorical perception by network activation

1) The network perspective views perception as consisting of the categorization of sensory input according to pre-existing perceptual memory.

2) Perception takes place within a hierarchically organized system of cortical perceptual memory networks (called perceptual networks for short).

3) Established perceptual networks both guide perception and are modified by it.

4) Sensory activity has spatial structure that is similar to the structure of existing perceptual networks.

5) Perceptual networks are activated by matching the spatial structure of sensory activity to that of perceptual networks; activation of the perceptual network depends on the similarity of those structures.

6) Perceptual network activation is degenerate – a given network may be activated by a variety of sensory activity patterns that are similar to its spatial structure.

B. Putative steps in categorical perception by perceptual network activation

1) A sensory activity pattern may activate multiple perceptual networks that share common features.

2) The perceptual network that is activated the most represents the categorical perception (recognition) of the input pattern.

3) Sensory patterns that represent familiar objects activate perceptual networks at higher levels of abstraction, up to the semantic or symbolic level, eventually allowing identification.

4) Sensory activity patterns that represent unfamiliar objects undergo more elaborate analysis, involving iterative matching between higher & lower level perceptual networks, eventually leading to creation of a new perceptual network at a higher level.

5) Multiple sensory activity patterns may be analyzed in this way in parallel.
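Activation by structural similarity, and selection of the most activated network, can be sketched as follows. The notes do not specify a similarity measure, so cosine similarity is an assumption here, as are the function names `cosine` and `categorize` and the toy networks `cup` and `ball`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length activity vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def categorize(pattern, networks):
    """Activate every stored network in proportion to its similarity to the
    sensory pattern, and return the most activated one with all activations."""
    activations = {name: cosine(pattern, net) for name, net in networks.items()}
    winner = max(activations, key=activations.get)
    return winner, activations

# Hypothetical perceptual networks, each a stored feature pattern.
networks = {"cup": [1, 1, 0, 0], "ball": [0, 0, 1, 1]}

# Two different sensory patterns that both resemble the "cup" network.
winner_a, _ = categorize([1, 0.8, 0.1, 0], networks)
winner_b, _ = categorize([0.9, 1, 0, 0.2], networks)
```

Both patterns activate the same network most strongly even though they differ from each other, which illustrates the degeneracy described in A.6.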

C. Convergence and divergence

1) Upward convergence in perceptual hierarchies promotes the association of higher-level perceptual networks with multiple lower-level perceptual networks.

2) Upward divergence promotes the association of a lower-level perceptual network with multiple higher-level perceptual networks.

D. Cortical areas contributing to perception

1) Primary sensory cortex: involved in constructing sensory activity patterns based on low-level feature detection

2) Unimodal sensory cortex: involved in categorization of sensory activity patterns

3) Transmodal association cortex

a. limbic & paralimbic cortex: evaluation of emotional associations of percepts (in cooperation with amygdala)

b. lateral frontal cortex: evaluation of motor associations of percepts, e.g. affordance (in Gibson’s theory of ecological perception, an affordance is a perceived possibility for action)

c. multimodal convergence cortex: evaluation of poly-sensory associations of percepts

E. Symbols

1) A symbol may be viewed as an abstract, perceptual category that is represented by a high-level perceptual network having profuse connectivity with lower-level perceptual networks.

2) Symbols, including words and other linguistic structures, are encoded in networks of the posterior (unimodal & transmodal) association cortex.

3) Symbolic networks may consist of nodes of convergence. Damasio proposed that convergence zones (nodes) are high-level networks in association cortex that represent association patterns, i.e. they code for specific patterns of sensory activities. A symbol is a pattern of association of sensory activities. When the symbolic network is activated, it may re-activate the lower-level sensory networks from which it was formed.

4) A symbolic perceptual network is highly degenerate in that it may be activated by many different lower-level perceptual networks. It may be amodal, or may be associated with only one or more modalities. The symbol “airplane” may be considered bimodal – it is defined by visual and/or auditory features, but not by taste.

5) Symbolic networks are distributed across regions of higher association cortex, including inferior, medial, and superior temporal cortex, Wernicke’s area, and posterior parietal cortex.

F. Perceptual constancy

1) Perceptual constancy is the ability to recognize the same perceptual entity when the sensory components vary, e.g. to recognize a visual object at different distances and viewing angles.

2) It is observed in visual neurophysiological recording as the similarity of single-unit firing to the same object under different viewing conditions.

3) The degree of perceptual constancy evidenced by neurophysiological recording increases with ascending levels of the perceptual hierarchy.

4) The degree of perceptual constancy in an area may be related to the degree of abstraction of categorical representation. E.g., neurons showing constancy for visual objects are found in inferior temporal cortex. Neuronal populations showing constancy for graphic symbols such as letters & words are found in Wernicke’s area.

Perceptual binding

Perceptual binding refers to the joining together of the associated sensory features of a perceptual object into a gestalt. A perceptual network for the Gestalt represents the perceptual object. The object is categorized, and perception of the object occurs, when a perceptual network is activated. A neural binding mechanism is needed to join together the neural activity in different parts of the perceptual network.

What is the neural mechanism of neural binding?

A. Evidence from neuroelectric activity

Binding has been proposed to operate by the phase synchronization of oscillatory activity from columnar modular assemblies. Assembly oscillations are best observed in the LFP, EEG, and MEG. Oscillations are classified by frequency:

delta: 0-3 Hz
theta: 4-7 Hz
alpha: 8-12 Hz
beta: 13-30 Hz
gamma: 31-100 Hz

Beta & gamma activity together are also called High-Frequency (HF) activity. When the waves of oscillation in different assemblies are aligned in time, they are called phase-synchronized. Assemblies do not communicate by LFP waves. They send pulse activity back and forth along axonal pathways. However, LFP phase-synchronization may be a sign of functional binding of assemblies. In perception, the binding together of distributed assemblies in a perceptual network may occur by phase synchronization of HF oscillatory activity.
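The band classification and the idea of phase synchronization can be made concrete with a short sketch. The band limits are those listed above. The phase-locking value used to quantify synchronization is a commonly used measure but is not named in these notes, so treat it as an assumption, along with the function names `band_of` and `phase_locking_value` and the simulated phase series.

```python
import cmath
import math

# Frequency bands as listed in the notes (Hz, inclusive bounds).
BANDS = [("delta", 0, 3), ("theta", 4, 7), ("alpha", 8, 12),
         ("beta", 13, 30), ("gamma", 31, 100)]

def band_of(freq_hz):
    """Return the band name for a frequency, or None if it falls outside
    the listed ranges (for example between 3 and 4 Hz)."""
    for name, low, high in BANDS:
        if low <= freq_hz <= high:
            return name
    return None

def phase_locking_value(phases_a, phases_b):
    """Magnitude of the mean unit vector of the phase differences:
    1.0 for a constant phase lag, near 0 when the lag drifts freely."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))

# One second of phases sampled at 1000 Hz (hypothetical signals).
times = [n / 1000 for n in range(1000)]
gamma_a = [2 * math.pi * 40 * t for t in times]        # 40 Hz
gamma_b = [2 * math.pi * 40 * t + 0.5 for t in times]  # 40 Hz, fixed lag
gamma_c = [2 * math.pi * 47 * t for t in times]        # 47 Hz, drifting lag

locked = phase_locking_value(gamma_a, gamma_b)    # close to 1
unlocked = phase_locking_value(gamma_a, gamma_c)  # close to 0
```

Two assemblies oscillating at the same gamma frequency with a fixed lag count as phase-synchronized in this sketch, whereas two assemblies at different frequencies do not, even though both pairs lie within the HF range.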

The study by Tallon-Baudry et al (1995) supports this idea: when subjects are presented with Kanizsa triangle images, the level of EEG HF activity at 300 ms poststimulus over the occipital lobes appears to correspond to the degree of perceived spatial coherence of the triangle (Fig. 4.4).

Although human EEG power (amplitude squared) does not directly reveal neural binding, it does track the (fast) dynamics of visual perception in the visual cortex. It indirectly reflects neural binding because it changes in proportion to HF phase synchronization in the cortex underlying the EEG electrode. Thus, the Tallon-Baudry et al (1995) results support the idea that perception of the triangle’s spatial coherence depends on HF neural binding in visual cortex.

B. Evidence from functional neuroimaging

PET & fMRI (neuroimaging) studies are useful for detecting the activation of perceptual networks. Because of their low temporal resolution, they do not directly image active networks. Rather, they show the “ghosts” of heavily activated nodes (epicenters) of excitatory neuronal activity in the networks. These are the most heavily activated nodes of distributed memory networks. In neuroimaging studies of visual perception, for example, the epicenters of perceptual networks for object categories appear in unimodal visual and multimodal association areas. The evidence from meta-analysis of neuroimaging studies is compatible with the idea that objects are represented at several hierarchical levels, from sensory to symbolic. For example, the meta-analysis of neuroimaging studies of color perception shows activation maxima in inferior occipital cortex. Meta-analysis of neuroimaging studies of color word presentation and color imagery shows maxima further anterior, in inferior temporal cortex.

Date post:	09-Jul-2020