Face and Speech Identification System (FASIS)
George Liao, Andrew Au, Ching-Hsin Chen
Overview
  - Project Overview
  - Ztitch Solutions Team
  - Motivation
  - Design Solution
  - Design Alternatives
  - Software Design
  - Hardware Design
  - Finance
  - Schedule
  - Future Work
  - What We Learned
  - Conclusion
  - Acknowledgements / Questions
  - Demo Overview
2
Ztitch Solutions Team
3
Andrew Au (Team Leader):
  - 5th-year computer engineering student
  - 16 months of development experience at Nokia & Sierra Wireless
  - 4 months as an NSERC research assistant for Dr. Jie Liang
  - Freelance mobile developer; published the “Ztitch” app for Windows Phone 7
George Liao:
  - 5th-year electronics engineering student
  - Experience in MATLAB image processing
  - Software debugging and testing
  - Audio processing
Ching-Hsin (Danny) Chen:
  - 5th-year electronics engineering student
  - 12 months of research experience at Broadcom
  - Hardware designer
  - QA and debugging
Motivation
4
Number of smart phones worldwide: ~200M [April 2010, Parks Associates]
Mobile internet usage will exceed fixed-line internet usage by 2014 [Morgan Stanley]
Steady growth in demand for mobile applications. Market value estimated at ~$14.5B USD by 2012. [CNET]
Motivation
5
Despite high smart phone demand, there has been little innovation in mobile log-in and security
  - The username/password scheme is difficult to use on a phone
  - Example: [email protected] / enter123
Any process or method that lets the user execute a task faster is highly desirable. Examples:
  - PayPal: fast payment system
  - Google: efficient search engine
  - SMS: fast messaging protocol
It’s all about speed and efficiency
Motivation (cont’d)
6
Our goal:
Implement a new method of secure mobile log-in
Eliminate the need for tedious typing on tiny touch screens or keypads
Secure, fast, and efficient
Design Solution
7
Face recognition
  - Ease of access
  - It’s quick to snap a photo
  - But we need a secondary solution to make it more secure...
Voice recognition
  - Providing a spoken phrase is also quick
Design Solution (cont’d)
8
We combine face and voice recognition as follows.
Note: our original goal was to grant access to the server using a mug shot alone, but we still had concerns about the security of that approach. Instead, we use the following steps:
1) User snaps picture of face using phone, which relays image to server via cellular internet connection
2) Server recognizes the face, and requests voice password
3) User speaks a specific keyword as a password into the phone, which relays the speech data to the server via VoIP
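The three steps above can be sketched as a small server-side routine. This is only an illustration: the class name `LoginServer`, the two recognizer callbacks, and the toy lookup tables are assumptions made for the example, standing in for the real face and speech engines.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoginServer:
    # Hypothetical stand-ins for the real recognition engines.
    recognize_face: Callable[[bytes], Optional[str]]  # image -> user id, or None
    verify_phrase: Callable[[str, bytes], bool]       # (user id, audio) -> match?

    def login(self, image: bytes, get_audio: Callable[[], bytes]) -> bool:
        # Step 2: the face acts as the identifier.
        user = self.recognize_face(image)
        if user is None:
            return False  # unknown face: the voice password is never requested
        # Step 3: only now ask the phone for the spoken password.
        audio = get_audio()
        return self.verify_phrase(user, audio)


# Toy data so the sketch runs; real inputs would be image and speech data.
FACES = {b"andrew-face": "andrew"}
PHRASES = {"andrew": b"enter123"}

server = LoginServer(
    recognize_face=lambda image: FACES.get(image),
    verify_phrase=lambda user, audio: PHRASES.get(user) == audio,
)
```

Requesting the audio lazily (through `get_audio`) mirrors the order in the slides: the server asks for the voice password only after the face has been recognized.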
Design Solution (cont’d)
9
Processing will be done remotely on the server as an online service
Reasons:
  - More secure than client processing
  - Independent of the phone’s processing power
  - Easier to apply software upgrades (updating a server vs. updating thousands of users’ phones)
[Figure: phone-server exchange, labelled 1, 2 (send voice and face image), 3 (grant access)]
Design solution (cont’d)
10
That was a simplified model
There are many other details to consider, e.g.:
  - Image compression
  - Key encryption / decryption
  - Reducing ambient noise during voice recognition
  - Face localization
  - Handling multiple failed attempts
  - Image and voice data
For the proof of concept, we don’t have time to do all of these; we implement only some of them plus the basic model
Design Solution (cont’d)
11
In our model, the face is the identifier, replacing username
The spoken phrase replaces the password
“enter123”
Design Alternatives
12
Besides face and voice recognition, the other alternatives are:
1) Conventional typed username/password
  - Slow and tedious, as mentioned before
2) Fingerprint
  - Requires hardware modification to existing phones
  - Our system requires only software, but the demo prototype has a hardware modification to allow 3rd-party control of the phone during the demo
3) Eye iris
  - Complex and requires a special camera
Design Alternatives (cont’d)
13
Besides server-side processing, the alternative is client-side processing
Client-side processing is executing the face and voice recognition on the phone, rather than the server
Main disadvantage: the identifier and password are stored on the phone and are therefore vulnerable to phone theft
Software Design
14
The software is divided into three parts:
1) Face Localization
2) Face Recognition
3) Voice Recognition
Software Design #1 Face Localization
15
The first step of the software is face localization: finding where the face is in the image
Where is my face in this image? The computer does not know!
Software Design #1 Face Localization
16
There are a few different methods of face localization, but some of them require additional equipment such as two cameras (stereo). There are many research papers in this area.
Our method is simple and fast, and can be done in real time.
First, we define the range of colors that counts as human skin color
Software Design #1 Face Localization
17
Second, we filter the image, keeping only the pixels whose color matches our definition of skin color
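A minimal sketch of the skin-color filter. The slides do not give the actual color range, so the thresholds below are an assumption: they follow one commonly cited RGB rule of thumb for skin under daylight, and the function names are made up for the example.

```python
def is_skin(r, g, b):
    """Assumed RGB rule of thumb for skin; not the range used in FASIS."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(image):
    """image: rows of (r, g, b) tuples. Returns rows of 1 (skin) / 0 (other)."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]


# Tiny 2x2 test image: two skin-like pixels, one dark grey, one blue.
image = [
    [(30, 30, 30), (200, 140, 120)],
    [(210, 150, 130), (20, 80, 200)],
]
mask = skin_mask(image)
```

The output is a binary mask, which is the input the later noise-removal and dilation steps operate on.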
Filter
Software Design #1 Face Localization
18
Third, we remove the noise from the filtered image.
Noise removal
Software Design #1 Face Localization
19
Unfortunately, removing the noise also removes some of the face data
So fourth, we expand the remaining regions with dilation
Expand
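The third and fourth steps can be sketched on a binary mask as follows. The slides name dilation but not the noise-removal operation; this sketch assumes it is a 3x3 morphological erosion, so the pair together forms what is usually called an "opening". Function names are illustrative.

```python
def _window(mask, y, x):
    """Values in the 3x3 window around (y, x); outside the image counts as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i] if 0 <= j < h and 0 <= i < w else 0
            for j in (y - 1, y, y + 1)
            for i in (x - 1, x, x + 1)]


def erode(mask):
    """Keep a pixel only if its whole 3x3 window is set (removes small specks)."""
    return [[int(all(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if anything in its 3x3 window is set (grows regions back)."""
    return [[int(any(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def clean(mask):
    return dilate(erode(mask))


# A 3x3 face "blob" plus one isolated noise pixel at row 2, column 5.
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = clean(noisy)
```

In this example the erosion shrinks the blob to its centre pixel and deletes the speck; the dilation then restores the blob to its original size.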
Software Design #1 Face Localization
20
Finally, we have a face “blob”, and we can determine the center of this blob in x-y coordinates by stacking
Stack up the pixels along each of the two axes
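One way to read the "stacking" step: sum the mask along each axis to get a column profile and a row profile, then take the weighted mean of each profile as the blob center. Using the weighted mean (rather than, say, the peak of each profile) is an assumption of this sketch, as is the function name.

```python
def blob_center(mask):
    """Return (x, y) of the blob center, or None if the mask is empty."""
    col_profile = [sum(col) for col in zip(*mask)]  # pixels stacked onto the x axis
    row_profile = [sum(row) for row in mask]        # pixels stacked onto the y axis
    total = sum(row_profile)
    if total == 0:
        return None
    cx = sum(x * count for x, count in enumerate(col_profile)) / total
    cy = sum(y * count for y, count in enumerate(row_profile)) / total
    return cx, cy


blob = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
center = blob_center(blob)
```

The resulting coordinates can then be used to crop a window around the face.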
Software Design #1 Face Localization
21
Now the problem of face localization is solved, and the computer knows where my face is
However, this is a simple case only...
Crop
Software Design #1 Face Localization
22
What if there are multiple faces in the image?
We can use the same steps as before, but replace pixel stacking with Hough circle detection
Software Design #1 Face Localization
23
Same as before:
Software Design #1 Face Localization
24
An algorithm called the Circular Hough Transform is used
  - Detects the edge points that lie along the outline of a circle
  - We can generalize this method to detect arbitrary shapes
  - Slower than the previous method, but covers more scenarios
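A minimal sketch of the voting idea behind the Circular Hough Transform, assuming the radius is known in advance and the edge points have already been extracted. Each edge point votes for every center that could have produced it; real circle centers collect the most votes. The function names, the fixed radius, and the synthetic test circles are assumptions for illustration.

```python
import math
from collections import Counter


def hough_circle_votes(edge_points, radius, angle_steps=72):
    """Accumulate votes for candidate circle centers at a fixed radius."""
    votes = Counter()
    for x, y in edge_points:
        candidates = set()
        for k in range(angle_steps):
            theta = 2 * math.pi * k / angle_steps
            candidates.add((round(x - radius * math.cos(theta)),
                            round(y - radius * math.sin(theta))))
        for centre in candidates:  # one vote per edge point per cell
            votes[centre] += 1
    return votes


def circle_outline(cx, cy, radius, n=36):
    """Synthetic edge points on a circle, used only to exercise the sketch."""
    return [(round(cx + radius * math.cos(2 * math.pi * k / n)),
             round(cy + radius * math.sin(2 * math.pi * k / n)))
            for k in range(n)]


# Two "faces" in one image: two circles of the same radius.
edges = circle_outline(20, 15, 8) + circle_outline(50, 40, 8)
votes = hough_circle_votes(edges, radius=8)
top_two = {centre for centre, _ in votes.most_common(2)}
```

Because every circle produces its own peak in the accumulator, several faces can be located in one pass, which pixel stacking cannot do. A full implementation would also search over a range of radii, which is part of why this method is slower.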
Software Design #2 Face Recognition
25
We choose to use a method of face recognition called Eigenface
Easy to implement and fits our tight development schedule
Can be upgraded to Eigenfeatures for higher accuracy (as part of our future work)
Software Design #2 Face Recognition
26
First, add a set of images of the user’s face to the database
Usually 5 or more images with slight variations in angle and lighting conditions
We add our first image:
Compute mean face
Image #1
Mean face
Software Design #2 Face Recognition
27
Now, we add a second image to the database
Compute mean face
[Figure: Image #1, Image #2, Mean face]
Software Design #2 Face Recognition
28
Now, we add a third image
Compute mean face
[Figure: Image #1, Image #2, Image #3, Mean face]
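The mean face is the pixel-wise average of the training images. A minimal sketch, assuming each image is a flattened list of grayscale intensities of equal length; `update_mean` shows how the mean can be refreshed as each new image is added, as in the slides above. Both function names are illustrative.

```python
def mean_face(images):
    """Pixel-wise average of equal-length, flattened grayscale images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def update_mean(mean, count, new_image):
    """Running update of the mean when image number count + 1 is added."""
    return [m + (p - m) / (count + 1) for m, p in zip(mean, new_image)]


# Three tiny 4-pixel "images" standing in for face photos.
image1 = [10, 20, 30, 40]
image2 = [30, 40, 50, 60]
image3 = [50, 60, 70, 80]

batch = mean_face([image1, image2, image3])

running = list(image1)                   # mean of one image is the image itself
running = update_mean(running, 1, image2)
running = update_mean(running, 2, image3)
```

Both routes give the same mean face; the running form avoids re-reading the whole training set each time an image is added.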
Software Design #2 Face Recognition
29
We can add a few more images until we finally have our database, a.k.a. training set
Now, we execute face recognition as follows:
  - Compare the input image with the mean face to find the difference, and with its projection onto the face space to find the difference from face space
  - If the error is above a certain threshold: recognition fails
  - If the error is below a certain threshold: recognition succeeds
Software Design #2 Face Recognition
30
Calculate two values: difference, and difference from face space
In this example:
  - Difference = 4418.3
  - Difference from face space = 316.4081
normalize((input-mean)-projection)
Mean face (from database)
Input image
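A sketch of how the two values could be computed. It makes two assumptions that the slides do not state explicitly: "difference" is taken to be the Euclidean distance between the input image and the mean face, and `normalize(...)` in the formula above is read as the Euclidean norm of the residual left after projecting onto the eigenfaces. The eigenfaces are assumed to be orthonormal vectors; all names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def face_distances(image, mean, eigenfaces):
    """Return (difference, difference_from_face_space).

    image, mean: flattened grayscale images of equal length.
    eigenfaces:  orthonormal basis vectors of the face space (assumed given).
    """
    centered = [p - m for p, m in zip(image, mean)]           # input - mean
    weights = [dot(centered, u) for u in eigenfaces]          # face-space coordinates
    projection = [sum(w * u[i] for w, u in zip(weights, eigenfaces))
                  for i in range(len(centered))]
    residual = [c - p for c, p in zip(centered, projection)]  # (input - mean) - projection
    difference = math.sqrt(dot(centered, centered))
    dffs = math.sqrt(dot(residual, residual))
    return difference, dffs


# Toy 3-pixel example with a 2-dimensional face space.
difference, dffs = face_distances(
    image=[3, 4, 12],
    mean=[0, 0, 0],
    eigenfaces=[[1, 0, 0], [0, 1, 0]],
)
```

Under these definitions the difference from face space can never exceed the difference, because the residual is one component of the centered image.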
Software Design #2 Face Recognition
31
We set our threshold values via trial and error. From our test:
When the input face image is the real owner:
  - Difference < 500
  - Difference from face space < 5000
When the input face image is NOT the real owner:
  - 500 < Difference < 2000
  - 5000 < Difference from face space < 10000
When the input image is not a face:
  - 2000 < Difference
  - 10000 < Difference from face space
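The thresholds above can be turned into a small decision function. The threshold values come from the slide; how to treat mixed cases (one value in a lower band, the other in a higher one) and values exactly on a boundary is not stated, so this sketch assumes the more cautious band wins. The names are illustrative.

```python
OWNER, OTHER_FACE, NOT_A_FACE = "owner", "other face", "not a face"


def band(value, low, high):
    """0 below low, 1 between low and high, 2 at or above high."""
    if value < low:
        return 0
    if value < high:
        return 1
    return 2


def classify(difference, dffs):
    """Map the two distances to a label using the thresholds from our tests."""
    # Assumption: when the two values disagree, take the more cautious band.
    worst = max(band(difference, 500, 2000), band(dffs, 5000, 10000))
    return (OWNER, OTHER_FACE, NOT_A_FACE)[worst]
```

For example, `classify(300, 4000)` falls in the owner band for both values, while `classify(300, 7000)` is treated as another person's face because one of the two values is outside the owner band.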
Software Design #2 Face Recognition
32
Our results parallel the results from other Eigenface recognition researchers
The following is from cnx.org [Rice University]
Software Design #3 Voice Recognition
33
The brain of our voice recognition is the Microsoft Speech SDK, which is free and comprehensive
  - Does not require the developer to have extensive knowledge of voice pattern science
  - Provides a high-level application programming interface (API) for third-party developers to use speech recognition in their applications
FASIS
Speech SDK 5.1
Hardware Design
34
We stress that the final commercialized product requires no hardware modification
We need to modify the hardware in the prototype to control the phone OS
  - We do not have the underlying permission to control the phone’s functionality, such as sending an image automatically or signalling it to lock/unlock
Hardware Design
35
In this prototype, the hardware consists of:
  - The phone itself
  - Hardware board to relay the image to the server
  - The PC acting as the server
  - Microcontroller to control the phone
Summary:
Transceiver
Microcontroller
Hardware Design
36
The phone: Nokia N96
  - Non-touch screen
  - 320x240 resolution front-facing camera
The transceiver: RS232 serial connection
  - The board uses the MAX3222 IC
  - Low power consumption and high data rate
  - Requires four 0.1μF external charge pump capacitors
  - Guaranteed 120kbps while maintaining standard RS232 levels
  - 2 receivers and 2 drivers
Hardware Design
37
The server:
  - Executes the software design
  - Runs mainly on MATLAB
  - Also needs drivers to talk to the transceiver and MCU
  - Alerts the owner of intruders via email (sends picture attachment)
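The intruder alert can be illustrated with a short sketch that builds such an email with the picture attached. The prototype server runs mainly on MATLAB, so this Python version is only an illustration of the idea; the sender address, subject line, and file name are made-up placeholders, and actually sending the message (for example over SMTP) is left out.

```python
from email.message import EmailMessage


def build_intruder_alert(owner_address, jpeg_bytes, sender="fasis@example.com"):
    """Build (but do not send) an alert email with the intruder's picture."""
    msg = EmailMessage()
    msg["Subject"] = "FASIS: failed log-in attempt"
    msg["From"] = sender            # placeholder address
    msg["To"] = owner_address
    msg.set_content(
        "An unrecognized person tried to log in. Their picture is attached."
    )
    # Attach the captured image as image/jpeg.
    msg.add_attachment(jpeg_bytes, maintype="image", subtype="jpeg",
                       filename="intruder.jpg")
    return msg


alert = build_intruder_alert("owner@example.com", b"\xff\xd8fake-jpeg-data")
attachments = list(alert.iter_attachments())
```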
Hardware Design
38
The BOE kit
  - Parallax Board of Education (BOE) kit
  - Comes with an MCU + breadboard for our custom circuit to wire the phone to the MCU
  - USB connection for programming and communication during run-time
  - We wired the phone’s buttons to the MCU so that we can control those buttons from the PC
Hardware Design
39
BASIC Stamp 2 module:
  - Processor speed: 20 MHz
  - RAM size: 32 bytes
  - Number of I/O pins: 16 + 2 dedicated serial
  - PBASIC commands: 42
  - Package: 24-pin DIP
Hardware Design
40
When a switch is on, a metal spring makes contact with two wires, allowing current to flow.
The phone’s internal MCU cycles through the input lines B0-B3:
  - It pulls one line high (for example B1) and the others low. If a key on that line is pressed, the voltage is transferred to the corresponding row wire, so if A1 reads high while B1 is driven, the MCU knows that button SW1 is pressed.
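The scanning scheme can be simulated in a few lines. This is a software model only: the slides name drive lines B0-B3 and a row wire A1, so the number of row (sense) lines and the numbering of switches as (drive line, sense line) pairs are assumptions of the sketch.

```python
def read_sense_lines(drive_levels, closed_switches, n_sense):
    """A sense line reads high if a closed switch ties it to a high drive line."""
    return [any(drive_levels[b] for (b, a) in closed_switches if a == s)
            for s in range(n_sense)]


def scan_keypad(closed_switches, n_drive=4, n_sense=4):
    """Cycle the drive lines one at a time and report the pressed keys.

    closed_switches: set of (drive_line, sense_line) pairs that are pressed.
    """
    detected = []
    for active in range(n_drive):
        drive = [i == active for i in range(n_drive)]  # one line high, rest low
        sense = read_sense_lines(drive, closed_switches, n_sense)
        detected += [(active, s) for s, level in enumerate(sense) if level]
    return detected


# With B1 driven, a high level on A1 identifies the switch at (1, 1).
pressed = scan_keypad({(1, 1)})
```

This is also why an external microcontroller can fake a key press: making the right row wire go high at the right moment looks, to the phone, exactly like a closed switch.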
Hardware Design
41
We can create a current to simulate the key press as follows:
  - General-purpose I/O pins P0-P15: each can sink 25 mA and source 20 mA
  - The HIGH command sets the specified pin to 1 (a +5 volt level) and then sets its mode to output, e.g. HIGH 14
Hardware Design
42
Main:
  HIGH 0       ' drive pin P0 high
  PAUSE 500    ' wait 500 ms
  LOW 0        ' drive pin P0 low
  PAUSE 500    ' wait 500 ms
  END
Hardware Design
43
The integrated hardware:
Finance
44
The cost of this project was substantially reduced because Nokia provided us with the N96
Much of the software was also free for students via DreamSpark (e.g. Visual Studio 2010)
Total cost came to about $250 CAD
Finance
45
The overhead cost for commercializing this product is low because it is entirely software based
We can either accept a one-time fee, or an annual subscription fee from users
Most of the expense comes from hosting dedicated servers to execute the software algorithms and store users’ training sets (face images)
Many dedicated hosting services are available for ~$100 / month, allowing us to rent this expensive equipment, which is hosted elsewhere
Schedule
46
Key notes:
  - Project began last semester
  - Research took longest, then development
  - Documentation cost a lot of time, but was well worth it
Future Work
47
Improve the algorithms
  - Eigenfeatures: combines facial metrics (measuring the distance between facial features) with the Eigenface approach
  - Further enhance localization methods
Collaborate with Symbian to get low-level OS access
  - Symbian is the Nokia phone’s operating system, and FASIS needs permission from the company in order to become a reality
Set up our dedicated servers
  - This demo uses a laptop, but the final product requires commercial-grade servers to handle thousands of users
Future Work
48
Generalize the system for other brands, not just Nokia
What We Learned
49
Professional documentation (ENSC305)
Group dynamics and team management
How to create a product from scratch
  - From research to commercialization
Programming
  - Low-level (microcontroller, C)
  - High-level (SAPI, C#, .NET)
  - Scripting (batch files, MATLAB)
Conclusion
50
The Face and Speech Identification System (FASIS) fills the need for a rapid, secure mobile log-in solution that eliminates tedious typing on small touchscreens/keypads
Efficient while maintaining a level of security
With further improvements, we firmly believe that FASIS could become a marketable product considering the current trend in the mobile industry...
Conclusion
51
There are 200 million smart phones in the world, and this number is rising rapidly...
...even if we capture only 1% of the market, our business can become huge
Acknowledgements
52
Ali & Carlyn
  - Excellent feedback and comments in our marked documents
Dr. Rawicz & Mike
  - Excellent feedback during oral progress reports
  - The idea for voice recognition
Nokia Vancouver
  - Simon Wong, who provided us with the phone
Microsoft
  - Free software tools for students via the DreamSpark program
Questions
53
?
54
Thank you
-Ztitch Solutions
Live Demonstration
55
Overview:
1. Face localization
2. FASIS: Try to authenticate real owner (Andrew Au)
3. FASIS: Try to authenticate non-face object (hand)
4. FASIS: Try to authenticate an audience member