Cognitive Vision
Markus Vincze
Automation Control Institute
Vienna University of Technology
[email protected]
www.acin.tuwien.ac.at
PSFMR – Fermo, 11.-16.9.2006
Idea of Today
• Overview of Cognitive Vision Methodology
• Scratch at cognitive science and cognitive systems
• Open your view to other disciplines
• Point out many open problems that are simply awaiting a good student for resolution
Content
• Overview
• Tracking
• Detection
• Cognitive Vision
– Vision systems
– Integration
– Computer Vision Cognitive Vision ...
– Cognitive Systems
• Ambient Intelligence
• “Natural” computer interfaces (e.g., MIT, MS)
• Japan: developmental (humanoid) robotics
Ideas, Drives, Future(s)
STAR TREK, Odyssey 2001
MIT icom MS EasyLiving
IPA
Personal Assistance
• Support user by being aware of situation
• Distributed mobile and ambient devices
Example situations:
• Information assistance, guidance to location, assembly help
• Alerting of dangerous situations
Personal Assistant
• User guidance to
– operate a machine (e.g., copy machine, video/CD-player)
– assemble objects (e.g., furniture, machine maintenance)
• Exploit Augmented Reality to display information
• On-line interpretation to aid user
Personal Assistance – Ingredients
Capabilities
• Detection, tracking, recognition, spatio-temporal reasoning, ...
• Interpret human intentions before acting
• Personalised behaviour
Austrian CV Project: understand and react to situations
Robot Helper
„James, please bring me my cup“
Capabilities
• Navigate, avoid obstacles
• Detect & recognise objects
• Grasp objects
• Interact with user
• Cope with new situations
• Dependable and safe behaviour
Vision for Interaction
Capabilities
• Robust detection, tracking
• Object and gesture recognition
• Spatio-temporal object relationships (3D)
• Interpretation, understanding
ActIPret: interpretation of humans who handle objects
Cognitive Vision Components
EU project ActIPret
Object recognition (CMP)
Robust object detection
Stereo hand tracking (FORTH)
Object tracking
Cognitive Vision Components
EU project ActIPret
Hand gesture recognition (COGS)
Spatio-temporal object reasoning (in 3D)
Semantic interpretation
'Hand 0 picked up object cd-linux-0'
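As a rough illustration of how 3D hand and object trajectories could lead to a semantic statement like the one above, the following minimal sketch applies a simple rule: the object counts as "picked up" once the hand is close to it and the object has been lifted. The function name `interpret_pickup` and the thresholds `grasp_dist` and `lift` are hypothetical and not taken from ActIPret.

```python
import math
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def interpret_pickup(
    hand_track: List[Point3D],
    object_track: List[Point3D],
    object_name: str,
    grasp_dist: float = 0.05,  # assumed: max hand-object distance in metres
    lift: float = 0.03,        # assumed: min rise in z to count as lifted
) -> Optional[str]:
    """Return a semantic statement if the hand lifts the object, else None."""
    start_z = object_track[0][2]
    for hand, obj in zip(hand_track, object_track):
        near = math.dist(hand, obj) <= grasp_dist
        lifted = obj[2] - start_z >= lift
        if near and lifted:
            return f"Hand 0 picked up object {object_name}"
    return None


# Three frames: hand approaches, touches, then lifts the object by 10 cm.
hand = [(0.30, 0.0, 0.20), (0.00, 0.0, 0.02), (0.00, 0.0, 0.12)]
cd = [(0.00, 0.0, 0.00), (0.00, 0.0, 0.00), (0.00, 0.0, 0.10)]
statement = interpret_pickup(hand, cd, "cd-linux-0")
```

A real system would reason over noisy tracks and several competing hypotheses; the fixed thresholds here are exactly the kind of brittleness the lecture warns about.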
GUI – Graphical User Interface
On-line display of 3D results (trajectories, recognition and interpretation results)
'Hand 0 pressed button ejectButton-2'
'Hand 0 picked up object cd-linux-0'
Stereo observation
Off-line VR replay of activity
EU project ActIPret
MOVEMENT
• Movement of
– Persons, objects, information
• Stereo vision for navigation
– Segment floor
– Obstacle detection
– Detection of tables and chairs
person, information, object
MOVEMENT – EU IST Project 2004-2007
Vision Capabilities
• Vision can provide many capabilities
• Vision itself has many capabilities (redundancies)
– Temporal redundancy
– Stereo, many views
– Many cues per image
– Vast number of features
– Multiple representations
• Integrate with other system functions
Summary
• Many, many vision (perception) capabilities
• Capabilities operate in context
A consequence:
• To solve even a simple task ⇒ system
Another consequence:
• More than vision – e.g., cognitive vision
• Integration – architecture and tool
Content
• Overview
• Tracking
• Detection
• Cognitive Vision
– Vision systems
– Integration
– Computer Vision Cognitive Vision ...
– Cognitive Systems
Integration: Vision is not Alone
• Other sensors
– odometry, distance, touch
– Time-of-flight, ultrasound, infrared, ...
– acoustic, olfactory, ...
• „Envisioned“ embodiment
• Task, situation
• Knowledge representations, common sense
• Semantics, language
System Requirements
• Task and data-oriented
• Context-based
• Reactive control
• Enable distribution
• Separated development
• Modular + scalable
• Reusability (!)
Some Options
• Dataflow: pipes & filters [Unix shell]
• Layer architectures [OS, ISO-OSI]
• Object-oriented [Corba, ...]
• Event-driven [HMI]
• Shared data: blackboard [DBs]
• Agent-based [AI]
• Component based [Software engineering]
Integration based on Components
• Components encapsulate functions
– Service principle - „Yellow pages“
– Dynamic linking
• Reusable, distributed, scalable
• Simple (installation, programming)
• „Fast“
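The "Yellow pages" service principle can be sketched as a directory in which components register the services they provide and requesters look providers up by service name at run time. This is a minimal illustration only; the class `YellowPages` and the service name `object_detection` are hypothetical and do not describe any particular framework.

```python
from typing import Callable, Dict, List


class YellowPages:
    """Hypothetical service directory: providers register, requesters look up."""

    def __init__(self) -> None:
        self._services: Dict[str, List[Callable[..., object]]] = {}

    def register(self, service: str, provider: Callable[..., object]) -> None:
        # Several providers may offer the same service.
        self._services.setdefault(service, []).append(provider)

    def lookup(self, service: str) -> List[Callable[..., object]]:
        # Return a copy so callers cannot modify the directory.
        return list(self._services.get(service, []))


def detect_objects(image_id: str) -> List[str]:
    """Stand-in for a real detection component."""
    return [f"object-in-{image_id}"]


directory = YellowPages()
directory.register("object_detection", detect_objects)

# The requester binds to a provider only at run time (dynamic linking).
providers = directory.lookup("object_detection")
result = providers[0]("frame-0")
```

Because requesters know only the service name, a provider can be replaced or moved to another machine without changing the requester.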
Capabilities of Each Component
• Function
• Communication
• Memory
• Self-evaluation (reports confidence, accuracy, resource demands)
• Control (processing, view)
• Context (exploit it, report it)
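One way to picture self-evaluation is a component that returns, along with its result, a report of its own confidence and resource demand, so that a controller can choose among components. The sketch below assumes a hypothetical `Report` record and a `select_best` policy (highest confidence within a time budget); neither is taken from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Report:
    source: str        # name of the reporting component (hypothetical)
    confidence: float  # self-assessed confidence in [0, 1]
    cost_ms: float     # self-reported processing time per frame


def select_best(reports: List[Report], budget_ms: float) -> Optional[Report]:
    """Pick the most confident report whose cost fits the time budget."""
    affordable = [r for r in reports if r.cost_ms <= budget_ms]
    return max(affordable, key=lambda r: r.confidence, default=None)


reports = [
    Report("edge_tracker", confidence=0.9, cost_ms=40.0),
    Report("colour_tracker", confidence=0.6, cost_ms=5.0),
]
best = select_best(reports, budget_ms=10.0)
```

With a 10 ms budget the cheaper colour tracker is chosen; with a larger budget the more confident edge tracker would win.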
Figure: Components linked by control & data and data connections.
Re-active Dynamic Integration
Figure: Network of components.
• Avoiding negotiation
Example Architecture for ActIPret
Task-related Space of Interest
Zwork (Zillich‘s network)
• RPC (Remote Procedure Calls)
• Asynchronous
• Automatic marshalling of messages
• Simple debugging (gdb/ddd)
• Logging
• GUI Component
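To make the combination of asynchronous remote procedure calls and automatic marshalling more concrete, here is a minimal sketch using Python's standard `asyncio` and `json` modules. It does not reproduce Zwork's actual API: the message layout, the `add` method and the in-process `handle_request` (standing in for a network round trip) are all assumptions for illustration.

```python
import asyncio
import json
from typing import List


async def handle_request(raw: str) -> str:
    """Provider side: unmarshal the request, run the method, marshal the reply."""
    msg = json.loads(raw)
    handlers = {"add": lambda a, b: a + b}  # hypothetical service table
    result = handlers[msg["method"]](*msg["params"])
    return json.dumps({"id": msg["id"], "result": result})


async def call(method: str, *params: int, request_id: int) -> int:
    """Requester side: marshal the call, await the reply without blocking."""
    raw = json.dumps({"id": request_id, "method": method, "params": list(params)})
    reply = await handle_request(raw)  # stands in for sending over a network
    return json.loads(reply)["result"]


async def main() -> List[int]:
    # Both calls are in flight concurrently; neither blocks the other.
    return await asyncio.gather(
        call("add", 1, 2, request_id=1),
        call("add", 3, 4, request_id=2),
    )


results = asyncio.run(main())
```

The requester never builds the wire format by hand, which is the point of automatic marshalling: component code only sees ordinary function calls and return values.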
Figure: Component structure (managing frame, implementation, provider interface, requester interface; service providers and service requester).
Conclusion Integration
• Every lab has another approach:
– Definition and notation of functions, skills (and context)
• Key: practicability + ease-of-use
• „Standard“ interface definitions
– Get more specific along project
• Integration: learn to build systems
– How do parts work together?
– Learn which parts work together.
Content
• Overview
• Tracking
• Detection
• Cognitive Vision
– Vision systems
– Integration
– Computer Vision Cognitive Vision ...
– Cognitive Systems
Computer Vision
• Computer Vision is a subfield of AI concerned with processing of images from the real world.
• Purpose: program a computer to "understand" a scene or features in an image.
• Methods: detection, segmentation, tracking, pose estimation, mapping to 3D model, recognition of objects in images (e.g., human faces, robot navigation)
• Achieved by means of pattern recognition, statistical learning, projective geometry, image processing, graph theory and other fields.
Pattern Recognition
• "the act of taking in raw data and taking an action based on the category of the data" [1]
• Goal: detect and learn known patterns
• Methods: statistics, machine learning, ...
[1] Richard O. Duda, Peter E. Hart, David G. Stork (2001) Pattern classification (2nd edition), Wiley, New York.
Computer Vision – A Summary
Many solutions and many more problems, e.g.:
1. Real world?
2. Brittle, thresholds.
3. „Understand“ scene?
4. „Understand“ features?
5. Segment ⇔ recognise?
6. Replication of experiments?
7. Formal description of capabilities?
Computer Vision – Lessons Learned
Problems: start to work on the core problems, e.g.:
1. Real world? ⇒ Serious work with real world images.
2. Brittle, thresholds. ⇒ Robust, threshold-free methods.
3. „Understand“ scene? ⇒ To „understand“, act in scene.
4. „Understand“ features? ⇒ Huge set of features.
5. Segment ⇔ recognise? ⇒ Segment AND recognise.
6. Replication of experiments? ⇒ PETS, DBs.
7. Formal description of capabilities? ⇒ ...
Machine Vision
• = application of computer vision to factory automation.
• A MV system is a computer that makes decisions based on the analysis of digital images.
Figure: Components of a machine vision system (lighting, light, media, object, reflection, sensor, image, processing, data, results, control).
Machine Vision
• Problem 1: narrow applications
• Problem 2: understanding?
• Lesson learned: more options to control
• Lesson learned: consider complete system
Cognitive Vision
• embodiment (AmI, PDA, AR, VR, robotics)
• representations
• computer vision
• system architecture
• machine learning
• user experiments, usability
• neuroscience
• information theory
• cognitive science
• artificial intelligence
• systems engineering
Cognitive science Computer Vision
Cognitive Vision
Cognitive Science
• = scientific study of either mind or intelligence; it is inherently interdisciplinary
– E.g., psychology, neuroscience, linguistics, philosophy, computer science, and biology
• Cognition = „coming to know“ = act of acquiring knowledge
• be aware of and judge the result of this act
• "cognitive" - any kind of mental operation or structure that can be studied in precise terms [Lakoff, Johnson, 1999]
Cognitive (Computer) Vision
One definition:
CV is the act of seeing to obtain empirical factual knowledge
– Act: some form of body, self-awareness, communication, evolving
– Seeing: all computer and biological vision has to offer
– Empirical: based on observation and experiment
– Factual: objective reality, repeatable
– Knowledge: facts acquired, models, procedures
Cognitive (Computer) Vision
Another definition of Cognitive Vision
• 4 levels of generic computer vision functionality
– Detection, localization, recognition, and understanding
– Purposive goal-directed behaviour
– Adapting to unforeseen changes of environment
– Anticipate the occurrence of objects or events.
• Achieves capabilities through
– Learning semantic knowledge (i.e. contextualized understanding of form, function, and behaviour)
– Knowledge about environment, itself, and relationships
ECVision (2002 - 2005), www.ecvision.org
Why Cognitive Vision?
• Old terms did not succeed, new term might
• Interdisciplinarity is something good
• „New“ understanding:
– Active (since 1987)
– Learning, evolving
– Embodied (envisioned)
• 1000x more computing power in last 15 years
• Understanding cognitive science
Cognitive Vision - Essentials
• Seeing: eyes, head, vision data processing
• See what? Objects, humans, environment
– „Come to know“ about them
– Only knowledge relevant to seeing
• Act of seeing upon
– what is visible or
– becomes visible before it becomes invisible again
For Example: Hide & Seek
Kind.mov
• Key: combining top-down (cognitive) with bottom-up (vision) processes
Cognitive Vision - Challenges
• Object permanence (hiding but existing)
• Spatial and dimensional awareness (close or far range, spatial relationships, stacking objects)
• Temporal awareness (synchronous events, e.g., pointing)
• Hierarchical object concepts
• Detect something new
• Awareness of camera/body (view point reasoning, self-localisation)
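The first challenge, object permanence, can be illustrated with a small memory that keeps objects alive while they are hidden instead of dropping them as soon as detection fails. This is a sketch under simplifying assumptions: detections arrive as a name-to-position dictionary per frame, and the class `PermanenceMemory` and its `max_hidden_frames` limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Position = Tuple[float, float]


@dataclass
class TrackedObject:
    position: Position      # last position at which the object was seen
    hidden_frames: int = 0  # 0 means the object is currently visible


class PermanenceMemory:
    """Remember objects that are occluded: hidden, but still existing."""

    def __init__(self, max_hidden_frames: int = 100) -> None:
        self.max_hidden_frames = max_hidden_frames
        self.objects: Dict[str, TrackedObject] = {}

    def update(self, detections: Dict[str, Position]) -> None:
        # Visible objects: refresh position and reset the hidden counter.
        for name, position in detections.items():
            self.objects[name] = TrackedObject(position, 0)
        # Undetected objects: keep the last position, forget only after a while.
        for name in list(self.objects):
            if name not in detections:
                obj = self.objects[name]
                obj.hidden_frames += 1
                if obj.hidden_frames > self.max_hidden_frames:
                    del self.objects[name]


memory = PermanenceMemory(max_hidden_frames=2)
memory.update({"cup": (1.0, 2.0)})  # cup is seen
memory.update({})                   # cup is occluded but still remembered
```

A fixed frame limit is itself a threshold; a more cognitive approach would reason about where the occluder is and when the object should reappear.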
Cognitive Vision System (CVS)
• Instantiation of the bits and pieces necessary for cognitive vision
– System = interacting group of items forming a unified whole [Merriam Webster]
• Is it only the „seeing“ part of a system?
• Action? How much action?
• Body? How much body?
CVS as part of a Cognitive System
Cognitive Vision System
• Vision under egomotion (e.g., + inertial sensors)
• Interaction with other (vision) systems
• Hand-eye coordination (throw objects, eye-body coordination)
• Interpreting gestures
• Search for sounds with eyes (+ auditory cues)
• Viewing the world as seen from a third person’s perspective (= “perspective taking”)
More Challenges
• Representations: objects, relationships, situations, context, „visual“ semantics
• Understanding: function and use of objects
• System: support multiple tasks & autonomy
• Real world: work with it and use it (context)
Possible Methodological Approach
• Cognitive reference: people
• Learn from system evaluation by people
• Build system close to people, i.e., in their environment
• Learn from, not copy, biological vision systems
Content
• Overview
• Tracking
• Detection
• Cognitive Vision
– Vision systems
– Integration
– Computer Vision Cognitive Vision ...
– Cognitive Systems and examples
Cognitive Systems
• Foresight Cognitive System Project, UK
– natural or artificial information processing systems
– perception, learning, reasoning and decision making
– communication and action
• FP6, EU
– physically instantiated (embodied) systems
– perceive, understand (semantics) and interact
– evolve in order to achieve human-like performance in activities requiring context specific knowledge.
Interdisciplinarity of Cognitive Systems and Cognitive Science
• E.g., psychology, neuroscience, linguistics, philosophy, computer science, biology, …
• Human as reference for cognitive capabilities
• Several examples
– Navigation
– Vision
• Source of inspiration
Path Integration in Insects
[Mallot]
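Path integration means keeping a running estimate of one's position by accumulating self-motion, so that a direct home vector is available at any time without landmarks. The sketch below shows the arithmetic only and makes no claim about how insects implement it; the step format (turn in radians, then distance travelled) and the function names are assumptions.

```python
import math
from typing import Iterable, Tuple


def integrate_path(steps: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Accumulate (turn, distance) steps into a position relative to the start.

    Each step first changes the heading by `turn` radians (counter-clockwise)
    and then moves `distance` along the new heading.
    """
    x = y = heading = 0.0
    for turn, distance in steps:
        heading += turn
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y


def home_vector(x: float, y: float) -> Tuple[float, float]:
    """Distance and world-frame bearing of the straight path back to the start."""
    return math.hypot(x, y), math.atan2(-y, -x)


# Outbound path: 3 units straight ahead, then turn left 90 degrees and go 4 units.
position = integrate_path([(0.0, 3.0), (math.pi / 2, 4.0)])
distance_home, bearing_home = home_vector(*position)
```

For this outbound path the position is approximately (3, 4), so the home vector has length 5. Errors in the turn and distance estimates accumulate with every step, which is why pure path integration drifts over long paths.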
Path Complexity does not Impair Visual Path Integration
• Path segments in VR
• No effect on number of segments
• Direction and distance encoding?
[Wiener]
Geographical Slant as Compass
• Ground plane slanted 4 degrees
– Perceived visually and via force feedback
• Pointing the right way becomes easier
[Restat, Mallot]
Orientation in Children
• Children re-orient by the shape of the room
• Sensitive to surface layout: distance, angle, sense
• Do not use landmarks for orientation
• Landmarks are detected and remembered
[Gouteux, Spelke, 2001]
Orientation in Adults
• Landmarks are used and described verbally
• Not used with verbal interference
• With interference, adults become like rats.
• Orientation on surfaces: children, rats, fish
[Ratliff, Newcombe, 2005]
SLAM?
• Do humans SLAM?
• Orientation based on main structure
• Icon-based navigation
– Plus obstacle avoidance
• Knowledge about what to expect
– On airport, train station, ...
Cognition as Control
• Hierarchy to cope with complexity
[Hollnagel]
Neuroscience
• Study of the human nervous system, brain, and biological basis of consciousness, perception, memory, and learning
• Brain has a triad structure
– reptilian brain controls basic sensory motor functions
– mammalian brain: emotions, memory, biorhythms
– neocortex or thinking brain that controls cognition, reasoning, language, and higher intelligence
• Continued reconnecting and learning
– Learn from real experiences, integrated "whole" ideas
Cortex – Examples
• About 30 regions involved in vision, half of cortex
• MT: detecting areas of motion in images (0.1 s after motion is in image)
• V1: cells respond to oriented edges
• V1: BUT 85% of axons do not come from the retina
• Hippocampus – place cells, direction cells
• Cognitive map [Tolman 1948]
Illusions
• Study human vision system
• Experience of eye: world is benign
– Counter example: Gorilla
• Computer Vision suffers from serial processing
– Human: all cues in parallel
– Subsequent fight for what is most plausible
Object Perception
• Spatio-temporal constraints to form objects (4-month olds)
[Spelke: Principles of Object Perception, 1990]
Human Vision Learns
• Child: perceived as one object
• First: motion, surfaces (see before)
• Later: shape, appearance
[Spelke]
Gestalt Laws
• Develop in humans
• Occlusion: completion depends on experience
• Criteria
– Good continuation
– Similarity
• Animals?
[Spelke]
Chicks
• Perception of occluded objects without experience of occlusion
• Inborn object completion
[Regolin]
Number
• Multiple Object Tracking
– A) Following several moving targets
– B) Connections disrupt expectations
– C) Too many to follow
• Set size limit: 3-4
– Children, adults, animals
• Perception
– Cohesion, contact, continuity
– Auditory set size limit: 3-4
[Scholl, Wynn, Mitroff, van Marle]
Sensitivity to Geometry
• Response to geometrical relationships
[Dehaene, Izard, Pica, Spelke, 2006]
Sensitivity to Geometry – Results
• Strikingly similar patterns
– Munduruku live in Amazonas region
– Adults in Boston improve over children
• Relationships represent Euclidean geometry
Final Conclusion
• Human is the only working vision and cognitive system
• Cognitive Science and related fields throw some light on how it may work: inspiration
• Cognitive science tries to put it all together
Experience, recommendation:
• Design methods without parameters
• Work with system, not individual components