Computational Video Group
From recognition in brain to recognition in perceptual vision systems.
Case study: face in video.
Example: identifying computer users with low-resolution webcams.
Dmitry Gorodnichy and Gilles Bessens
http://iit-iti.nrc-cnrc.gc.ca
http://synapse.vit.iit.nrc.ca (www.perceptual-vision.com)
CVR Conference on Computational Vision in Neural and Machine Systems, York University, Toronto, Ontario, Canada. June 15 - 18, 2005
What do we want?
[Chart: recognition performance on a 0 to 100 scale, in photos and in video, by humans and by computers]
Why bother?
Face recognition systems performance (from NATO Biometrics workshop, Ottawa, Oct. 2004)
- Lots of $$$ already spent on applying face recognition to video data…
- Still computers fail…
- And still the goal of the Face Recognition Grand Challenge (www.frvt.org) is seen as “making the video data of better quality”, instead of developing approaches which can deal with low-quality data
Wrong approach - wrong results
Image-based biometrics modalities
Photographic facial data and video-based facial data are two different modalities:
- different nature of data
- different biometrics
- different approaches
- different testing benchmarks
In video, faces are bound to be of low quality and resolution.
Humans recognize faces on TV with 12 pixels between the eyes all the time.
ICAO-conformant passport photograph (presently used for forensic identification)
Images from surveillance cameras (of 11/9 hijackers) and TV. NB: VCD is 320x240 pixels
Another application: Seeing computers
[Diagram: a perceptual vision system (PVS) watching the user at the monitor; outputs: x, y, z; binary event ON/OFF; recognition / memorization ("Unknown User!")]
Copyright S. A. LA NACION 2003. All rights reserved.
Rating by Planeta Digital (Aug. 2003)
The precision and convenience of tracking the convex-shape nose feature allow one to use the nose as a mouse (or joystick handle)
Motion, colour, edges, Haar wavelets -> nose search box: x, y, width, height
Convex-shape template matching -> nose tip detection: I, J (pixel precision)
Integration over continuous intensity -> image position X, Y (sub-pixel precision)
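One common way to go from a pixel-precision peak (I, J) to a sub-pixel position (X, Y) is an intensity-weighted centroid over a small window around the peak. The sketch below assumes that reading of "integration over continuous intensity"; the slides do not spell out the exact computation, and the function name, window radius and sample data are hypothetical.

```python
def subpixel_peak(response, i, j, radius=1):
    """Refine an integer peak (row i, column j) to sub-pixel precision.

    Returns (x, y): the intensity-weighted centroid of the window of the
    given radius around (i, j). `response` is a list of rows of
    non-negative numbers (e.g. a template-matching score map).
    This weighting scheme is an assumption, not the authors' stated method.
    """
    total = sum_x = sum_y = 0.0
    for r in range(max(0, i - radius), min(len(response), i + radius + 1)):
        for c in range(max(0, j - radius), min(len(response[r]), j + radius + 1)):
            w = response[r][c]
            total += w
            sum_x += w * c
            sum_y += w * r
    if total == 0:
        return float(j), float(i)  # flat window: keep the pixel estimate
    return sum_x / total, sum_y / total


# Score map with its peak at row 1, column 1, skewed towards the right.
response = [
    [0, 1, 0],
    [1, 4, 3],
    [0, 1, 0],
]
x, y = subpixel_peak(response, 1, 1)
```

Here the stronger response to the right of the peak pulls X above 1 while Y stays at 1, which is the kind of fractional displacement a nose-as-mouse cursor can use.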
Where are we now?
To understand how the human brain does it
• 12 pixels between the eyes is sufficient!
• Main three features of the human visual recognition system:
1) Efficient visual attention mechanisms
2) Accumulation of data in time
3) Efficient neuro-associative mechanisms
• Main three neuro-associative principles:
1. Non-linear processing
2. Massively distributed collective decision making
3. Synaptic plasticity
a) to accumulate learning data in time by adjusting synapses,
b) to associate a visual stimulus to a semantic meaning based on the computed synaptic values
Keys to resolving recognition problem
Lessons from biological vision
Saliency based localization and rectification - implemented
Recognition decision at time t depends on our recognition decision at time t-1 - implemented
Local brightness adjustment - implemented
Accumulation over time and space - implemented
Lessons from biological memory
• Brain stores information using synapses connecting
the neurons.
• In brain: 10^10 to 10^13 interconnected neurons
• Neurons are either at rest or activated, depending on
values of other neurons Yj and the strength of
synaptic connections: Yi = {+1, -1}
• Brain is a network of “binary” neurons evolving in time
from initial state (e.g. stimulus coming from retina)
until it reaches a stable state – attractor.
• Attractors are our memories!
Refs: Hebb’49, Little’74,’78, Willshaw’71
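The evolution of a network of binary neurons towards an attractor can be sketched in a few lines. This is a minimal illustration, assuming synchronous updates Y(t+1) = sign(C Y(t)) and, purely for simplicity, Hebbian weights for a single stored pattern; the function names and the test pattern are hypothetical.

```python
def sign(x):
    """Binary neuron output: +1 (activated) or -1 (at rest)."""
    return 1 if x >= 0 else -1


def recall(C, y0, max_steps=100):
    """Update Y(t+1) = sign(C Y(t)) until the state stops changing."""
    y = list(y0)
    for _ in range(max_steps):
        y_next = [sign(sum(c * v for c, v in zip(row, y))) for row in C]
        if y_next == y:
            return y  # stable state: an attractor has been reached
        y = y_next
    return y  # no fixed point within max_steps (e.g. a cycle)


# Store one pattern with the simplest weights, C_ij = V_i * V_j / N.
pattern = [1, -1, 1, -1, 1, 1]
n = len(pattern)
C = [[pattern[i] * pattern[j] / n for j in range(n)] for i in range(n)]

noisy = [1, -1, -1, -1, 1, 1]  # the stored pattern with one neuron flipped
restored = recall(C, noisy)
```

Starting from the corrupted stimulus, the network settles on the stored pattern, which is what "attractors are our memories" means in practice.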
From visual image to saying name
From a neuro-biological perspective, memorization and recognition are two stages of the associative process:
From receptor stimulus R to effector stimulus E
[Figure: association of a face image with the name “Dmitry”, in brain and in computer]
Main associative principle
Stimulus neuron Xi: {+1 or -1}
Response neuron Yj: {+1 or -1}
Synaptic strength: -1 < Cij < +1
Main question of learning: How to update synaptic weights Cij as f(X,Y) ?
Learning process
Learning rules: from biologically plausible to mathematically justifiable
Update: $C_{ij}^{m} = C_{ij}^{m-1} + \Delta C_{ij}^{m}$
• Hebb (correlation learning): $\Delta C_{ij}^{m} = \frac{1}{N} V_i^m V_j^m$ is of form $\Delta C_{ij}^{m} = F(V_i^m, V_j^m)$
• Better however is of form: $\Delta C_{ij}^{m} = F(C_{ij}^{m-1}, V_i^m, V_j^m)$
• Should be of form: $\Delta C_{ij}^{m} = F(C^{m-1}, V^m)$
• Widrow-Hoff’s (delta) rule
• We use Projection Learning rule. It is most preferable, as it is:
- both incremental and takes into account relevance of training stimuli and attributes;
- guaranteed to converge (obtained from stability condition $V^m = C V^m$);
- fast in both memorization and recognition; also called pseudo-inverse rule: $C = V V^{+}$
Refs: Amari’71,’77, Kohonen’72, Personnaz’85, Kanter-Sompolinsky’86, Gorodnichy’95-’99
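A projection (pseudo-inverse) rule can be applied one pattern at a time. The sketch below assumes the standard incremental update from the pseudo-inverse literature, C <- C + (V - CV)(V - CV)^T / ||V - CV||^2, which matches the form F(C^{m-1}, V^m); the slides do not give the explicit formula, so the code and its names are illustrative only.

```python
def projection_update(C, v):
    """One incremental projection-learning step for a +1/-1 pattern v.

    Assumed update: C <- C + d d^T / (d . d), where d = v - C v.
    """
    n = len(v)
    cv = [sum(C[i][k] * v[k] for k in range(n)) for i in range(n)]
    d = [v[i] - cv[i] for i in range(n)]
    e2 = sum(x * x for x in d)
    if e2 < 1e-12:
        return C  # v is already reproduced by C: nothing new to learn
    return [[C[i][j] + d[i] * d[j] / e2 for j in range(n)] for i in range(n)]


def learn(patterns):
    """Start from zero weights and memorize the patterns one by one."""
    n = len(patterns[0])
    C = [[0.0] * n for _ in range(n)]
    for v in patterns:
        C = projection_update(C, v)
    return C


def apply(C, v):
    """Matrix-vector product C v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


# Two correlated (non-orthogonal) patterns.
patterns = [
    [1, -1, 1, -1, 1, -1],
    [1, 1, 1, -1, 1, 1],
]
C = learn(patterns)
```

After learning, every stored pattern satisfies the stability condition V = CV even though the patterns are correlated, which is where this rule differs from plain Hebbian learning.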
Steps of video-based recognition
1. Face-looking regions are detected using rapid classifiers.
2. They are verified to have skin colour and not to be static.
3. Face rotation is detected and corrected; the face is eye-aligned, resampled to a resolution of 12 pixels between the eyes, and extracted.
4. The extracted face is converted to a binary feature vector (Receptor): Yr
5. This vector is then extended with a nametag vector (Effector): V = Y(0) = (Yr, Ye)
6. In memorization: synapses of the network are updated from V: dCij(V)
   In recognition: memory recall as attractor is achieved: Y(0) -> Y(t*)
[Figure: extracted face; dimension labels: 12, 24, 2·IOD]
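Steps 4 and 5 above can be illustrated as follows. This is a sketch under stated assumptions: the face is binarized by comparing each pixel with the mean brightness, and the nametag is a one-hot vector. The slides do not specify the feature coding, so `binarize`, `nametag` and `initial_state` are hypothetical helpers.

```python
def binarize(face):
    """Receptor part Yr: flatten a grayscale face into 0/1 bits.

    A pixel maps to 1 if it is brighter than the mean of the whole face
    (an assumed coding, chosen only for illustration).
    """
    pixels = [p for row in face for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def nametag(person_id, num_ids):
    """Effector part Ye: one neuron per known ID (1-based).

    person_id=None leaves all nametag neurons off, as in recognition.
    """
    return [1 if k == person_id else 0 for k in range(1, num_ids + 1)]


def initial_state(face, person_id, num_ids):
    """V = Y(0) = (Yr, Ye)."""
    return binarize(face) + nametag(person_id, num_ids)


face = [[10, 200], [220, 30]]                  # toy 2x2 "face"
v_mem = initial_state(face, person_id=3, num_ids=4)  # memorization
y0 = initial_state(face, None, 4)                    # recognition
```

In memorization the nametag of the known person is set; in recognition it starts empty and is filled in by the network as it converges.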
Recognition process
• Each frame initializes the system to state Y(0) = (01000011…, 0000)
from which associative recall is achieved as a result of convergence to an attractor Y(t*)= Y(t*+1) = (01000001…, 0010) – as in brain…
• Effector component of attractor (0010) is analyzed. Possible outcomes: S00 (none of nametag neurons fire), S10 (one fires) and S11 (several fire)
• Final decision is made over several frames:
(e.g. this is ID=5 in all these cases)
0000100000 0000010000 0000100000 0000100000 0000100000
0000100000 0000000000 0000100000 0000100000
0000100000 0010100000 0000100000 0000100000
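The decision over several frames can be sketched as a vote among the nametag neurons. The voting scheme below is an assumption (the slides only state that the decision is accumulated over frames), and the sample sequences assume ten nametag neurons per frame.

```python
from collections import Counter


def frame_outcome(effector):
    """Classify one frame's nametag bits as S00, S10 or S11."""
    fired = effector.count("1")
    if fired == 0:
        return "S00"  # none of the nametag neurons fire
    return "S10" if fired == 1 else "S11"  # one fires / several fire


def decide(frames):
    """Return the ID (1-based neuron index) that fires in most frames."""
    votes = Counter()
    for bits in frames:
        for k, b in enumerate(bits, start=1):
            if b == "1":
                votes[k] += 1
    if not votes:
        return None  # no nametag neuron fired in any frame
    return votes.most_common(1)[0][0]


# Three sequences of per-frame nametag states.
sequences = [
    ["0000100000", "0000010000", "0000100000", "0000100000", "0000100000"],
    ["0000100000", "0000000000", "0000100000", "0000100000"],
    ["0000100000", "0010100000", "0000100000", "0000100000"],
]
ids = [decide(s) for s in sequences]
```

A single wrong frame, an empty frame (S00) or an ambiguous frame (S11) does not change the outcome, because neuron 5 still collects the most votes in each sequence.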
Tested!
- Using TV programs annotation
- Using IIT-NRC 160x120 facial video database (one video to memorize, another to recognize)
Perceptual Vision Interface Nouse™
• Evolved from a single demo program to a hands-free perceptual vision system which can recognize users.
• Uses a 160x120 low-fi webcam to constantly monitor the user’s identity
• Runs in background (a user may not even know he is being watched)
• Integrated with facial tracking
• Provides means for complete hands-free interaction
From our website: Try friv.exe yourself
- Works with your web-cam or .avi file
- Shows “brain model” synapses as you watch (in memorization mode)
- Shows nametag neuron states as you watch a facial video (in recognition mode)
References
• Gorodnichy, D. Video-based framework for face recognition in video. Second Workshop on Face Processing in Video (FPiV'05), in Proceedings of Second Canadian Conference on Computer and Robot Vision (CRV'05), pp. 330-338. Victoria, BC, Canada. 9-11 May, 2005. NRC 48216.
• Gorodnichy, D. Associative neural networks as means for low-resolution video-based recognition. International Joint Conference on Neural Networks (IJCNN'05). Montreal, Quebec, Canada. July 31-August 4, 2005. NRC 48217.
• Gorodnichy, D. Projection Learning vs. Correlation Learning: From Pavlov Dogs to Face Recognition. In Correlation Learnings AI'05 Workshop. May 8, 2005. Victoria, B.C.. NRC 48209.
• Bessens, G., Gorodnichy, D. Towards Building User Seeing Computers. Second Canadian Conference on Computer and Robot Vision Workshop on Face Processing in Video (FPiV'05). May 9-11, 2005. Victoria, B.C. NRC 48210.
• Gorodnichy, D. Recognizing Faces in Video Requires Approaches Different from Those Developed for Face Recognition in Photographs, NATO IST - 044 Workshop on "Enhancing Information Systems Security through Biometrics". Ottawa, Ontario, Canada. October 18-20, 2004. NRC 47149.
• Dmitry O. Gorodnichy and Gerhard Roth. Nouse 'Use your nose as a mouse' perceptual vision technology for hands-free games and interfaces. Image and Vision Computing, Volume 22, Issue 12 , 1 October 2004, Pages 931-942, 2004. NRC 47140.
• D.O. Gorodnichy, A.M. Reznik. Increasing Attraction of Pseudo-Inverse Autoassociative Networks, Neural Processing Letters, volume 5, issue 2, pp. 123-127, 1997.