Computational Video Group From recognition in brain to recognition in perceptual vision systems....


Computational Video Group

From recognition in brain to recognition in perceptual vision systems.

Case study: face in video.
Example: identifying computer users with low-resolution webcams.

Dmitry Gorodnichy and Gilles Bessens
http://iit-iti.nrc-cnrc.gc.ca
http://synapse.vit.iit.nrc.ca (www.perceptual-vision.com)

CVR Conference on Computational Vision in Neural and Machine Systems, York University, Toronto, Ontario, Canada. June 15 - 18, 2005

What do we want?

[Figure: photos, as recognized by humans and by computers]

Why bother?

Face recognition systems performance (from NATO Biometrics workshop, Ottawa, Oct. 2004)

– Lots of $$$ already spent on applying face recognition to video data…
– Still, computers fail…
– And still the Face Recognition Grand Challenge (www.frvt.org) sees the solution “in making the video data of better quality”… instead of developing approaches that can deal with low-quality data

Wrong approach - wrong results

Image-based biometrics modalities

Photographic facial data and video-based facial data are two different modalities:

• different nature of data
• different biometrics
• different approaches
• different testing benchmarks

In video, faces are expected to be of low quality and resolution. Humans recognize faces on TV with 12 pixels between the eyes all the time.

ICAO-conformant passport photograph (presently used for forensic identification)

Images from surveillance cameras (of 11/9 hijackers) and TV. NB: VCD is 320x240 pixels

Another application: Seeing computers

[Figure: a perceptual vision system (PVS) watching the user at the monitor: face position x, y, z; binary event ON; recognition/memorization; “Unknown User!”]

Rating by Planeta Digital (Aug. 2003)

The precision and convenience of tracking the convex-shape nose feature allow one to use the nose as a mouse (or joystick handle).

• Motion, colour, edges, Haar wavelets: nose search box (x, y, width, height)

• Convex-shape template matching: nose tip detection (I, J), pixel precision

• Integration over continuous intensity: (X, Y), sub-pixel precision
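The last stage, refining the pixel-precision nose tip (I, J) to sub-pixel coordinates (X, Y) by integrating over intensity, can be sketched as an intensity-weighted centroid over a small window around the peak. This is a minimal sketch under assumptions: the helper name `subpixel_peak`, the window `radius` and the use of a plain centroid are illustrative choices, not necessarily what Nouse does; the input is a list of rows of non-negative values (e.g. a template-match response) and coordinates are returned as (row, column).

```python
def subpixel_peak(image, i, j, radius=1):
    """Refine an integer peak (i, j) to sub-pixel precision (hypothetical helper).

    Computes the intensity-weighted centroid of the (2*radius+1)^2 window
    around (i, j), clipped at the image borders. Returns (row, col) floats.
    """
    rows, cols = len(image), len(image[0])
    total = sum_r = sum_c = 0.0
    for r in range(max(0, i - radius), min(rows, i + radius + 1)):
        for c in range(max(0, j - radius), min(cols, j + radius + 1)):
            w = float(image[r][c])  # intensity acts as the weight
            total += w
            sum_r += w * r
            sum_c += w * c
    if total == 0.0:
        # Flat (all-zero) window: fall back to the pixel-precision estimate.
        return float(i), float(j)
    return sum_r / total, sum_c / total
```

For a peak whose right neighbour is half as bright, the estimate moves a third of a pixel to the right: `subpixel_peak([[0, 0, 0], [0, 4, 2], [0, 0, 0]], 1, 1)` gives `(1.0, 1.333...)`.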

Where are we now?

To understand how the human brain does it:

• 12 pixels between the eyes should be sufficient!

• Three main features of the human visual recognition system:
1) Efficient visual attention mechanisms

2) Accumulation of data in time

3) Efficient neuro-associative mechanisms

• Three main neuro-associative principles:

1. Non-linear processing

2. Massively distributed collective decision making

3. Synaptic plasticity

a) to accumulate learning data in time by adjusting synapses,

b) to associate a visual stimulus with a semantic meaning based on the computed synaptic values

Keys to resolving the recognition problem

Lessons from biological vision

• Saliency-based localization and rectification - implemented

• Recognition decision at time t depends on our recognition decision at time t-1 - implemented

• Local brightness adjustment - implemented

• Accumulation over time and space - implemented
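Local brightness adjustment can be illustrated by subtracting from each pixel the mean intensity of its neighbourhood, which makes the result insensitive to a uniform change in brightness. This is a hypothetical sketch: the helper `local_brightness_adjust` and the square window of side `2*radius+1` are assumptions, not the adjustment actually implemented in the system.

```python
def local_brightness_adjust(image, radius=1):
    """Subtract the local mean intensity around each pixel (hypothetical sketch).

    `image` is a list of rows of numbers; windows are clipped at the borders.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            # Pixel value relative to its neighbourhood: a global offset cancels out.
            out[r][c] = image[r][c] - sum(window) / len(window)
    return out
```

Adding the same constant to every pixel, e.g. turning `[[1, 2], [3, 4]]` into `[[51, 52], [53, 54]]`, leaves the output unchanged.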

Lessons from biological memory

• The brain stores information using synapses connecting the neurons.

• In the brain: 10^10 to 10^13 interconnected neurons

• Neurons are either at rest or activated, depending on the values of other neurons Yj and the strength of synaptic connections: Yi ∈ {+1, -1}

• The brain is a network of “binary” neurons evolving in time from an initial state (e.g. a stimulus coming from the retina) until it reaches a stable state: an attractor.

• Attractors are our memories!

Refs: Hebb’49, Little’74,’78, Willshaw’71
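The network dynamics described above, binary neurons evolving from an initial stimulus until they reach a stable state, can be sketched with the synchronous update Y_i(t+1) = sign(Σ_j C_ij Y_j(t)). This is a minimal sketch assuming a ready-made weight matrix `C` and a hypothetical helper `recall`; a neuron with zero input keeps its previous state, and `max_steps` guards against the two-cycles that synchronous updates can fall into.

```python
def recall(C, y0, max_steps=100):
    """Iterate binary (+1/-1) neurons until a stable state (attractor) is reached."""
    y = list(y0)
    n = len(y)
    for _ in range(max_steps):
        # Postsynaptic input of each neuron: weighted sum of the other states.
        h = [sum(C[i][j] * y[j] for j in range(n)) for i in range(n)]
        # Activate (+1), rest (-1), or keep the old state on a zero input.
        y_next = [1 if h_i > 0 else -1 if h_i < 0 else y_i for h_i, y_i in zip(h, y)]
        if y_next == y:  # Y(t*+1) == Y(t*): attractor reached
            return y
        y = y_next
    return y  # no fixed point within max_steps
```

With one pattern p = (+1, -1, +1, -1) stored as C_ij = p_i p_j / 4, the stimulus (+1, +1, +1, -1), which has one neuron flipped, falls into the attractor p after a single update.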

From visual image to saying a name

From a neuro-biological perspective, memorization and recognition are two stages of the associative process:

From receptor stimulus R to effector stimulus E

[Figure: a face image evoking the response “Dmitry”, in the brain and in a computer]

Main associative principle

Stimulus neuron Xi ∈ {+1, -1}, response neuron Yj ∈ {+1, -1}

Synaptic strength: -1 < Cij < +1

Main question of learning: how to update the synaptic weights Cij as f(X, Y)?
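One answer to this question, the correlation (Hebbian) rule, can be sketched by averaging the products X_i Y_j over the training pairs; averaging ±1 products keeps every weight within [-1, +1]. This is a minimal sketch: the helper name `hebb_weights` and the 1/M normalization over M pairs are assumptions chosen for illustration.

```python
def hebb_weights(pairs):
    """Correlation learning: C[i][j] = mean over pairs m of X_i^m * Y_j^m.

    `pairs` is a list of (X, Y) tuples of +1/-1 lists (stimulus, response).
    """
    n_x, n_y = len(pairs[0][0]), len(pairs[0][1])
    C = [[0.0] * n_y for _ in range(n_x)]
    for x, y in pairs:
        for i in range(n_x):
            for j in range(n_y):
                C[i][j] += x[i] * y[j]  # strengthen when X_i and Y_j agree
    m = len(pairs)
    return [[c / m for c in row] for row in C]
```

For the pairs ((+1, -1) → +1) and ((+1, +1) → -1), the weight from X_1 is 0 (the two pairs disagree) and the weight from X_2 is -1.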

Learning process

Learning rules: from biologically plausible to mathematically justifiable

(V^m is the m-th training stimulus; C^{m-1} is the synaptic matrix after m-1 stimuli.)

• Hebb (correlation learning) is of the form: $\Delta C_{ij}^m = V_i^m V_j^m$

• Better, however, is the form: $C_{ij}^m = C_{ij}^{m-1} + \Delta C_{ij}^m$

• It should be of the form: $\Delta C_{ij}^m = F(V_i^m, V_j^m)$

• Widrow-Hoff's (delta) rule: $\Delta C_{ij}^m = F(C^{m-1}, V_i^m, V_j^m)$

• We use the Projection Learning rule: $\Delta C_{ij}^m = F(C^{m-1}, V^m)$

It is the most preferable, as it is:

- both incremental and takes into account the relevance of training stimuli and attributes;
- guaranteed to converge (obtained from the stability condition $V^m = CV^m$, i.e. $V = CV$);
- fast in both memorization and recognition. It is also called the pseudo-inverse rule: $C = VV^{+}$

Refs: Amari’71,’77, Kohonen’72, Personnaz’85, Kanter-Sompolinsky’86, Gorodnichy’95-’99
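The Projection Learning rule can be sketched in an incremental form: for each new stimulus V^m, the residual e = V^m - C^{m-1}V^m (the part the network does not yet reproduce) is added as ΔC^m = e e^T / (e^T e). This is a standard way to build C = VV^+ one stimulus at a time; it is shown here as an assumption, not as the authors' code, and the helper name `projection_learn` and the tolerance `tol` are hypothetical. After training, every stored vector satisfies the stability condition CV^m = V^m.

```python
def projection_learn(vectors, tol=1e-12):
    """Incremental projection (pseudo-inverse) learning, autoassociative case.

    `vectors` is a list of equal-length +1/-1 lists. Returns the n x n matrix C,
    the orthogonal projector onto the span of the stored vectors.
    """
    n = len(vectors[0])
    C = [[0.0] * n for _ in range(n)]
    for v in vectors:
        # Residual e = V^m - C^{m-1} V^m, orthogonal to everything stored so far.
        cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        e = [v[i] - cv[i] for i in range(n)]
        norm2 = sum(x * x for x in e)
        if norm2 < tol:
            continue  # v is already a combination of stored vectors
        # Delta C^m = e e^T / (e^T e) depends only on C^{m-1} and V^m.
        for i in range(n):
            for j in range(n):
                C[i][j] += e[i] * e[j] / norm2
    return C
```

Unlike the correlation rule, overlapping stimuli do not interfere: (+1, -1, +1, -1, +1, +1) and (+1, +1, +1, -1, +1, +1) have dot product 4, yet both are reproduced exactly by the resulting C.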

Steps of video-based recognition

1. Face-looking regions are detected using rapid classifiers.
2. They are verified to have skin colour and not to be static.
3. Face rotation is detected and corrected; the face is eye-aligned, resampled to a resolution of 12 pixels between the eyes, and extracted.
4. The extracted face is converted to a binary feature vector (Receptor): Yr
5. This vector is then appended with the nametag vector (Effector): V = Y(0) = (Yr, Ye)
6. In memorization, the synapses of the network are updated: Cij ← Cij + ΔCij(V). In recognition, memory recall is achieved as convergence from Y(0) to an attractor Y(t*).
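Steps 4 and 5 can be sketched as follows. The slides do not say how the face is binarized, so thresholding each pixel at the mean face intensity is an assumption, as are the helper name `make_stimulus` and the use of bipolar values (+1 = fires, -1 = rest) matching the neuron states Yi ∈ {+1, -1}. The nametag Ye is a one-hot code; in recognition mode no nametag neuron fires initially.

```python
def make_stimulus(face, person_id, n_names):
    """Build V = Y(0) = (Yr, Ye) from a face image and an optional nametag.

    `face` is a list of rows of intensities (e.g. the resampled face crop);
    `person_id` is a zero-based nametag index, or None in recognition mode.
    """
    pixels = [p for row in face for p in row]
    mean = sum(pixels) / len(pixels)
    # Receptor part Yr: assumed binarization, +1 above the mean intensity.
    receptor = [1 if p > mean else -1 for p in pixels]
    if person_id is None:
        effector = [-1] * n_names  # recognition: all nametag neurons at rest
    else:
        # Effector part Ye: exactly one nametag neuron fires.
        effector = [1 if k == person_id else -1 for k in range(n_names)]
    return receptor + effector
```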


Recognition process

• Each frame initializes the system to state Y(0) = (01000011…, 0000)

from which associative recall is achieved as a result of convergence to an attractor Y(t*) = Y(t*+1) = (01000001…, 0010), as in the brain…

• The effector component of the attractor (0010) is analyzed. Possible outcomes: S00 (none of the nametag neurons fire), S10 (one fires) and S11 (several fire)
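The three outcomes can be read off the effector part of the attractor by counting the firing nametag neurons. This is a minimal sketch with a hypothetical helper `classify_effector`; a neuron is taken to fire when its state is positive, so it accepts both the 0/1 notation used on this slide and ±1 states.

```python
def classify_effector(effector):
    """Return (outcome, id) for the nametag part of an attractor."""
    firing = [k for k, s in enumerate(effector) if s > 0]
    if not firing:
        return "S00", None        # no nametag neuron fires
    if len(firing) == 1:
        return "S10", firing[0]   # exactly one fires: a single candidate ID
    return "S11", firing          # several fire: ambiguous
```

For the attractor above, `classify_effector([0, 0, 1, 0])` gives `("S10", 2)` with zero-based indexing.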

• The final decision is made over several frames:

(e.g. this is ID=5 in all these cases)

00001000000000010000000010000000001000000000100000

0000100000000000000000001000000000100000

0000100000001010000000001000000000100000
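One plausible way to make the decision over several frames is a vote among frames with outcome S10, treating S00 and S11 frames as abstentions. The voting rule, the `min_votes` threshold and the helper name `decide_over_frames` are assumptions for illustration; the slides do not specify the exact rule. Nametag indices are zero-based, and on a tie the ID that received a vote first wins.

```python
from collections import Counter


def decide_over_frames(frame_states, min_votes=2):
    """Vote over per-frame nametag states such as "0000100000"."""
    votes = Counter()
    for state in frame_states:
        firing = [k for k, bit in enumerate(state) if bit == "1"]
        if len(firing) == 1:  # S10 frame: one vote for that ID
            votes[firing[0]] += 1
        # S00 (none fire) and S11 (several fire) frames abstain
    if not votes:
        return None
    person, count = votes.most_common(1)[0]
    return person if count >= min_votes else None
```

For example, `decide_over_frames(["0000100000", "0000000000", "0000100000", "0010100000"])` returns 4: two S10 frames agree, one S00 frame and one S11 frame abstain.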

Tested!
- Using TV program annotation
- Using the IIT-NRC 160x120 facial video database (one video to memorize, another to recognize)

Perceptual Vision Interface Nouse™

• Evolved from a single demo program to a hands-free perceptual vision system which can recognize users.

• Uses a 160x120 low-fi webcam to constantly monitor the user’s identity

• Runs in the background (a user may not even know he is being watched)

• Integrated with facial tracking

• Provides means for complete hands-free interaction

From our website: Try friv.exe yourself

- Works with your web-cam or .avi file

- Shows the “brain model” synapses as you watch (in memorization mode)

- Shows the nametag neuron states as you watch a facial video (in recognition mode)

References

• Gorodnichy, D. Video-based framework for face recognition in video. Second Workshop on Face Processing in Video (FPiV'05), in Proceedings of the Second Canadian Conference on Computer and Robot Vision (CRV'05), pp. 330-338. Victoria, BC, Canada, 9-11 May 2005. NRC 48216.

• Gorodnichy, D. Associative neural networks as means for low-resolution video-based recognition. International Joint Conference on Neural Networks (IJCNN'05). Montreal, Quebec, Canada, July 31-August 4, 2005. NRC 48217.

• Gorodnichy, D. Projection Learning vs. Correlation Learning: From Pavlov Dogs to Face Recognition. AI'05 Workshop on Correlation Learning. Victoria, BC, Canada, May 8, 2005. NRC 48209.

• Bessens, G., Gorodnichy, D. Towards Building User Seeing Computers. Second Canadian Conference on Computer and Robot Vision (CRV'05), Workshop on Face Processing in Video (FPiV'05). Victoria, BC, Canada, May 9-11, 2005. NRC 48210.

• Gorodnichy, D. Recognizing Faces in Video Requires Approaches Different from Those Developed for Face Recognition in Photographs. NATO IST-044 Workshop on "Enhancing Information Systems Security through Biometrics". Ottawa, Ontario, Canada, October 18-20, 2004. NRC 47149.

• Gorodnichy, D.O., Roth, G. Nouse 'Use your nose as a mouse' perceptual vision technology for hands-free games and interfaces. Image and Vision Computing, 22(12), pp. 931-942, 1 October 2004. NRC 47140.

• Gorodnichy, D.O., Reznik, A.M. Increasing Attraction of Pseudo-Inverse Autoassociative Networks. Neural Processing Letters, 5(2), pp. 123-127, 1997.
