Date post: 21-Dec-2015

Voice Recognition

Lawrence Pan

Syen Hassan

Jamme Tan

Overview

History of voice recognition
Why voice recognition?
Technology behind voice recognition
  Five major steps
Common applications
Current leaders
Demonstrations
Product evaluation
Implementation of our own voice recognition system
  Grade retrieval system for EE3414
Future challenges

History of Voice Recognition

Radio Rex (house-trained dog), 1922
U.S. Department of Defense, 1940s
  Speech Understanding Research (SUR) program
  Carnegie Mellon University & MIT
  Automatic interception & translation of Russian radio transmissions (FAILURE)
    Original message: “the spirit is willing but the flesh is weak”
    Translated message: “the vodka is strong but the meat is disgusting.”

History Cont’d

First major achievements
  Bell Laboratories, 1952
    Successful recognition of numbers 0 to 9, spoken over telephone
  MIT, 1959
    Successful recognition of vowels with 93% accuracy
  Carnegie Mellon University, 1970s
    HARPY system: capable of recognizing complete sentences

History Cont’d

Obstacles
  Computing power: over 50 computers needed for the HARPY system to perform
  Ability to recognize speech from any person
    Taking into account different accents, speech tones, etc.
  Ability to recognize continuous speech
    so…we…do…not…have…to…speak…like…this!
Commercialization of voice recognition systems

History Cont’d

[Figure: computation required versus computation available in processors over time]
[Figure: accuracy and task complexity progress over time]

Why Voice Recognition?

Convenience
  Natural user interface: human speech
  Improved services for the disabled
  Wider range of users
Future possibilities and improvements
  Internet use over phones through voice portals
  Advanced applications implementing voice control in all areas

Technology behind Voice Recognition

Five major steps used by speech recognizer

Five major steps in voice recognition

Capture and Digitization
  System interacts with the telephony device to capture voice input at 8000 samples/sec
Spectral Representation
  Voice samples converted to graphical representation
Segmentation
  Speech signals are broken down into segmented parts
    Improves accuracy
    Reduces computation: impossible to process entire signal in real time
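
The first three steps can be illustrated with a minimal sketch. It assumes a synthesized 1000 Hz tone in place of a real telephone capture, and it cuts the signal into frames before computing a spectrum for each frame with a naive DFT (the slide lists the spectral step first; the order here is an assumption). The function names, the 80-sample frame length, and the 40-sample hop are all hypothetical choices, not taken from the presentation.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples/sec, the telephony rate quoted on the slide


def capture_tone(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Stand-in for capture: synthesize a sine tone instead of reading a phone line."""
    n = round(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def segment(samples, frame_len=80, hop=40):
    """Split the signal into overlapping frames (80 samples = 10 ms at 8 kHz)."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2; bin k sits at k * rate / N Hz."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


signal = capture_tone(freq_hz=1000, duration_s=0.05)
frames = segment(signal)
spectra = [magnitude_spectrum(f) for f in frames]

# Strongest frequency bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / len(frames[0])
print(len(signal), len(frames), peak_hz)
```

Working frame by frame is what keeps the computation bounded: each spectrum depends on only 80 samples, so it can be produced while the caller is still speaking.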

Graphical Representations

Acoustic Model

Phoneme – smallest phonetic unit in a language
  Distinguishes one word from another
    e.g. b in boy and t in toy
Allophone – different pronunciations of a phoneme/letter
  e.g. t in tab, t in stab, tt in stutter
Database (Lexicon) of all words known to the system for a language
  Should contain several recordings for certain words
    e.g. “the” can be pronounced “duh” or “dee”
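
A lexicon with several pronunciations per word might be sketched as below. The phoneme symbols are made up for illustration and are not a real phone set, the entries store symbol sequences rather than the recordings the slide mentions, and the helper names are hypothetical.

```python
# Hypothetical lexicon: each word maps to one or more pronunciations,
# written as tuples of made-up phoneme symbols.
LEXICON = {
    "the": [("dh", "ah"), ("dh", "iy")],  # "duh" and "dee"
    "boy": [("b", "oy")],
    "toy": [("t", "oy")],
}


def build_reverse_index(lexicon):
    """Map each pronunciation back to the words that can produce it."""
    index = {}
    for word, pronunciations in lexicon.items():
        for phones in pronunciations:
            index.setdefault(phones, []).append(word)
    return index


def differing_phonemes(a, b):
    """Pairs of phonemes at positions where two pronunciations differ."""
    return [(x, y) for x, y in zip(a, b) if x != y]


REVERSE = build_reverse_index(LEXICON)
print(REVERSE[("dh", "iy")])
print(differing_phonemes(LEXICON["boy"][0], LEXICON["toy"][0]))
```

Both pronunciations of “the” resolve to the same word, while “boy” and “toy” differ in exactly one phoneme.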

Acoustic Model Cont’d

Trellis
  Data structure made up of all possible combinations of allophones
Training of acoustic models
  For single-user systems
    Text is read by user and recognized by system
  For multi-user systems
    Utterances spoken by many users are compiled into a database, then fed into a recognizer
    Weights are put on certain allophones
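
The slides do not say how the trellis is searched. One common approach is a Viterbi search, sketched here under that assumption. Each column holds candidate allophones for one time segment with a log score, and the transition weights stand in for the trained weights mentioned above. All names and numbers are hypothetical.

```python
import math


def best_path(columns, transition):
    """Viterbi search over a trellis.

    columns: list of dicts {allophone: log_score}, one dict per time segment.
    transition: dict {(prev, cur): log_weight}; missing pairs are disallowed.
    Returns (best_total_log_score, best_sequence).
    """
    # best[a] = (score of the best path ending in allophone a, that path)
    best = {a: (s, [a]) for a, s in columns[0].items()}
    for column in columns[1:]:
        nxt = {}
        for cur, score in column.items():
            candidates = [
                (prev_score + transition[(prev, cur)] + score, path + [cur])
                for prev, (prev_score, path) in best.items()
                if (prev, cur) in transition
            ]
            if candidates:
                nxt[cur] = max(candidates, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])


# Toy trellis for the start of "stab": the acoustics slightly prefer an
# aspirated t, but the transition weights say t is plain after s.
columns = [
    {"s": math.log(0.9)},
    {"t_plain": math.log(0.4), "t_aspirated": math.log(0.6)},
    {"ae": math.log(0.8)},
]
transition = {
    ("s", "t_plain"): math.log(0.9),
    ("s", "t_aspirated"): math.log(0.1),
    ("t_plain", "ae"): math.log(0.5),
    ("t_aspirated", "ae"): math.log(0.5),
}
score, path = best_path(columns, transition)
print(path, round(math.exp(score), 4))
```

Keeping only the best path into each allophone per column avoids enumerating every combination, which is the point of storing the candidates in a trellis.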

Language Model

Languages have structures (i.e. grammar)
Two words that sound alike can be difficult to tell apart
  Can be distinguished using context
    e.g. “ours” and “hours” can be told apart if the previous word is “two”
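
The “ours”/“hours” example can be sketched with a bigram table, one simple form of language model. The counts below are invented for illustration; a real system would estimate them from a large text corpus, and the function name is hypothetical.

```python
# Hypothetical bigram counts: how often the second word follows the first.
BIGRAM_COUNTS = {
    ("two", "hours"): 120,
    ("two", "ours"): 1,
    ("is", "ours"): 40,
    ("is", "hours"): 2,
}


def pick_word(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Choose the candidate seen most often after previous_word (0 if unseen)."""
    return max(candidates, key=lambda w: counts.get((previous_word, w), 0))


print(pick_word("two", ["ours", "hours"]))
print(pick_word("is", ["ours", "hours"]))
```

The acoustic model alone cannot separate the two candidates because they sound the same; the preceding word tips the choice.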

Common Applications

Call Center Automation
  Widely used in all industries (consumer interface)
    Airline companies: booking flights, general info, etc.
    Banking companies: “pay by phone”, account balances, etc.
    Delivery services (FedEx): tracking orders, etc.
    All general customer service systems
Computer integration of voice recognition
  Personal computers
    Speech-to-text dictation
    Accessibility purposes: voice control of computers

Common Applications cont’d

Integrated into automobiles:
  Visteon Voice Technology™ used in Infiniti Q45
  Controls:
    Climate
    CD player
    Navigation system

Competing Standards

VoiceXML (Voice Extensible Markup Language)
  Partners: AT&T, IBM, Motorola, Lucent Tech.
  Used in implementation of most voice portals
  Shifting target toward web developers
SALT (Speech Application Language Tags)
  Partners: Microsoft, Intel, Cisco, SpeechWorks
  Targeted toward web developers

Current Leaders

Dragon Systems: NaturallySpeaking
  PC-based user-side programs for automatic speech recognition (ASR)
  Automotive, telephony, mobile, games, embedded chips
SpeechWorks: connects users to industry voice portals
  AOLbyPhone, FedEx, E*Trade, etc.
BeVocal: provides voice portals for BellSouth, etc.
TellMe: provides voice portals for AT&T, Merrill Lynch, etc.
Philips Speech Recognition
  Serves the automotive, mobile device, and consumer electronics industries
IBM ViaVoice, MS Agent

Demonstrations

SpeechWorks™ product line
  United Airlines' toll-free flight information line (demo)
  BankWorks Automated Bill Payment (demo)
  FedEx Rate Finder (demo)
  E*Trade Stock (demo)
  AOLbyPhone service (demo)
BeVocal solutions

Magical Merlin’s Grade Retrieval System

Designed in Visual Basic using Microsoft’s MSAgent

Menu              Recognized voice commands
First Exam        First Exam, First Test, First Midterm
Second Exam       Second Exam, Second Test, Second Midterm
Quiz Grades       Quiz Grades, Grade on Quizzes
Homework Grades   Homework Grades, Grade on Homework
Project Grade     Project Grade, Grade on Project
Final Grade       Final Grade, Grade for course
Main Menu         Main menu, Main, Class
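
The command table above amounts to a many-to-one mapping from spoken phrases to menu items. The original system was written in Visual Basic with MSAgent; the Python sketch below only illustrates the lookup, assumes the recognizer has already turned speech into text, and uses hypothetical names throughout.

```python
# Menu items and the spoken phrases listed for each in the table.
COMMANDS = {
    "First Exam": ["First Exam", "First Test", "First Midterm"],
    "Second Exam": ["Second Exam", "Second Test", "Second Midterm"],
    "Quiz Grades": ["Quiz Grades", "Grade on Quizzes"],
    "Homework Grades": ["Homework Grades", "Grade on Homework"],
    "Project Grade": ["Project Grade", "Grade on Project"],
    "Final Grade": ["Final Grade", "Grade for course"],
    "Main Menu": ["Main menu", "Main", "Class"],
}


def normalize(phrase):
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


# Flatten the table into a phrase -> menu lookup.
PHRASE_TO_MENU = {
    normalize(phrase): menu
    for menu, phrases in COMMANDS.items()
    for phrase in phrases
}


def resolve(utterance):
    """Return the menu item for a recognized utterance, or None if unknown."""
    return PHRASE_TO_MENU.get(normalize(utterance))


print(resolve("first  midterm"))
print(resolve("Grade for course"))
print(resolve("third exam"))
```

Accepting several phrasings per menu item lets users speak naturally while the set of actions stays small.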

Click on my belly for a short demonstration

Future Challenges

Speech technology
  VoiceXML vs. SALT
  Voice-enabling web content
  Real-time access to source data
    Stock market, traffic, sports, etc.
Clear connection needed for effective use of voice portals
Security issues involved
Advertising-based revenue

References

http://www.stanford.edu/~jmaurer/homepage.htm
http://www.bevocal.com/corporateweb/technology/index.html
http://www.speechworks.com/demos/index.cfm
http://www.speechworks.com/learn/index.cfm
http://www.scansoft.com/realspeak/tts2500/
http://www.out-loud.com/speechacts.html
http://www.gignews.com/fdlspeech1.htm
http://www.gignews.com/fdlspeech2.htm
http://www.gignews.com/fdlspeech3.htm
http://www.microsoft.com/msagent/default.asp

