Date posted: 21-Dec-2015
Overview
History of voice recognition
Why voice recognition?
Technology behind voice recognition
  Five major steps
Common applications
Current leaders
Demonstrations
  Product evaluation
Implementation of our own voice recognition system
  Grade retrieval system for EE3414
Future challenges
History of Voice Recognition
Radio Rex (house-trained dog), 1922
U.S. Department of Defense, 1940’s
  Speech Understanding Research (SUR) program
    Carnegie Mellon University & MIT
  Automatic interception & translation of Russian radio transmissions (FAILURE)
    Original message: “the spirit is willing but the flesh is weak”
    Translated message: “the vodka is strong but the meat is disgusting.”
History Cont’d
First major achievements
  Bell Laboratories, 1952
    Successful recognition of numbers 0 to 9, spoken over telephone
  MIT, 1959
    Successful recognition of vowels with 93% accuracy
  Carnegie Mellon University, 1970’s
    HARPY system: capable of recognizing complete sentences
History Cont’d
Obstacles
  Computing power: over 50 computers needed for the HARPY system to perform
  Ability to recognize speech from any person
    Taking into account different accents, speech tones, etc.
  Ability to recognize continuous speech
    so…we…do…not…have…to…speak…like…this!
Commercialization of voice recognition systems
History Cont’d
[Figure: computation required vs. computation available in processors over time]
[Figure: accuracy and task complexity progress over time]
Why Voice Recognition?
Convenience
  Natural user interface: human speech
  Improved services for the disabled
  Wider range of users
Future possibilities and improvements
  Internet use over phones through voice portals
  Advanced applications implementing voice control in all areas
Five major steps in voice recognition
Capture and Digitization
  System interacts with the telephony device to capture voice input at 8000 samples/sec
Spectral Representation
  Voice samples converted to a graphical (frequency-domain) representation
Segmentation
  Speech signals are broken down into segmented parts
    Improves accuracy
    Reduces computation: impossible to process entire signal in real time
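The first three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8000 samples/sec rate comes from the slide, but the 20 ms frame length, the synthetic tone standing in for telephony capture, the naive DFT, and every function name (`capture_tone`, `segment`, `magnitude_spectrum`, `dominant_frequency`) are hypothetical choices, not details of any system described here.

```python
import cmath
import math

SAMPLE_RATE = 8000                            # samples per second, as stated on the slide
FRAME_MS = 20                                 # assumed frame length; the slides do not give one
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 160 samples per frame


def capture_tone(freq_hz, duration_ms):
    """Stand-in for telephony capture: synthesize a pure sine tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def segment(samples, frame_len=FRAME_LEN):
    """Split the signal into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. N/2 (slow, but fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(frame):
    """Frequency in Hz of the strongest bin in the frame's spectrum."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * SAMPLE_RATE / len(frame)


# 100 ms of a 500 Hz tone gives five 20 ms frames, each processed on its own.
frames = segment(capture_tone(500, 100))
peaks = [dominant_frequency(f) for f in frames]
```

Working frame by frame is what keeps the computation bounded: each spectrum is computed over 160 samples rather than over the whole signal. A production system would more likely use an FFT and overlapping, windowed frames.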
Acoustic Model
Phonemes – smallest phonetic unit in a language
  Distinguish one word from another
    E.g. b in boy and t in toy
Allophone – different pronunciations of a phoneme/letter
  E.g. t in tab, t in stab, tt in stutter
Database (Lexicon) of all words known to the system for a language
  Should contain several recordings for certain words
    E.g. “the” can be pronounced “duh” or “dee”
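One way to picture the lexicon is as a mapping from each word to one or more pronunciations. The sketch below is a guess at such a structure, not the format of any real recognizer: the phoneme labels are informal placeholders rather than a standard phone set, and `LEXICON`, `words_for`, and `differing_positions` are hypothetical names.

```python
# Hypothetical lexicon: each word maps to a list of pronunciations, and each
# pronunciation is a tuple of informal phoneme labels (not a real phone set).
LEXICON = {
    "the": [("dh", "ah"), ("dh", "iy")],   # two accepted pronunciations
    "boy": [("b", "oy")],
    "toy": [("t", "oy")],
}


def words_for(phonemes):
    """Return every word that lists this phoneme sequence as a pronunciation."""
    target = tuple(phonemes)
    return sorted(word for word, prons in LEXICON.items() if target in prons)


def differing_positions(word_a, word_b):
    """Positions at which the first listed pronunciations of two words differ."""
    a, b = LEXICON[word_a][0], LEXICON[word_b][0]
    if len(a) != len(b):
        raise ValueError("pronunciations differ in length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Here `words_for(["dh", "iy"])` and `words_for(["dh", "ah"])` both resolve to "the", while `differing_positions("boy", "toy")` shows that a single phoneme at position 0 separates the two words.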
Acoustic Model Cont’d
Trellis
  Data structure made up of all possible combinations of allophones
Training of Acoustic models
  For single-user systems
    Text is read by user and recognized by system
  For multi-user systems
    Utterances spoken by many users compiled into a database, then fed into a recognizer
    Weights are put on certain allophones
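The slides do not say how the trellis is searched. A common choice is the Viterbi algorithm, so the sketch below uses it as an assumption. The allophone labels, the weights, and the names `best_path`, `OBSERVATIONS`, and `TRANSITIONS` are all made up for illustration; in a trained system the weights would come from the training data.

```python
import math


def best_path(observation_scores, transition_scores):
    """Viterbi search through a trellis of candidate units.

    observation_scores: one dict per time step, mapping unit -> probability
        (strictly positive) that the audio at that step matches the unit.
    transition_scores: dict mapping (previous_unit, unit) -> probability;
        pairs that are missing are treated as impossible.
    Returns the most probable sequence of units, one per time step.
    """
    # Each trellis column maps unit -> (best log score ending here, previous unit).
    columns = [{u: (math.log(p), None) for u, p in observation_scores[0].items()}]
    for scores in observation_scores[1:]:
        prev_col = columns[-1]
        col = {}
        for unit, p in scores.items():
            candidates = [
                (prev_score + math.log(transition_scores[(prev, unit)]) + math.log(p), prev)
                for prev, (prev_score, _) in prev_col.items()
                if transition_scores.get((prev, unit), 0) > 0
            ]
            if candidates:
                col[unit] = max(candidates)
        columns.append(col)

    # Start from the best final unit and follow the back-pointers.
    unit = max(columns[-1], key=lambda u: columns[-1][u][0])
    path = [unit]
    for col in reversed(columns[1:]):
        unit = col[unit][1]
        path.append(unit)
    return path[::-1]


# Made-up weights for the start of "stab": the audio slightly favors an
# aspirated t, but the transition weights say a plain t is far more likely after s.
OBSERVATIONS = [
    {"s": 1.0},
    {"t_aspirated": 0.6, "t_plain": 0.4},
    {"ae": 1.0},
]
TRANSITIONS = {
    ("s", "t_aspirated"): 0.1,
    ("s", "t_plain"): 0.9,
    ("t_aspirated", "ae"): 1.0,
    ("t_plain", "ae"): 1.0,
}
decoded = best_path(OBSERVATIONS, TRANSITIONS)
```

With these numbers the plain t wins (0.9 × 0.4 against 0.1 × 0.6), which is the sense in which weights on allophones steer the search.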
Language Model
Languages have structures (i.e. grammar)
Two words that sound alike can be difficult to tell apart
  Can be distinguished using context
    E.g. “ours” and “hours” can be told apart if the previous word is “two”
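The “ours”/“hours” example can be sketched with a bigram count model, one simple form of language model. The slides do not say which kind of model is meant, so treat this as an assumption. The four-sentence corpus is invented, and `train_bigrams` and `pick` are hypothetical names.

```python
from collections import Counter

# Tiny invented corpus, only large enough to make the example work.
CORPUS = [
    "we waited two hours",
    "the meeting ran two hours",
    "the seats are ours",
    "that table is ours",
]


def train_bigrams(sentences):
    """Count how often each (previous word, word) pair occurs."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def pick(previous_word, candidates, bigrams):
    """Choose the candidate seen most often after previous_word.

    On a tie (including no evidence at all) the first candidate is returned.
    """
    return max(candidates, key=lambda w: bigrams[(previous_word, w)])


BIGRAMS = train_bigrams(CORPUS)
after_two = pick("two", ["ours", "hours"], BIGRAMS)
after_is = pick("is", ["ours", "hours"], BIGRAMS)
```

After “two” the model picks “hours”, and after “is” it picks “ours”, even though the two candidates sound the same.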
Common Applications
Call Center Automation
  Widely used in all industries (consumer interface)
    Airline companies: booking flights, general info, etc.
    Banking companies: “pay by phone”, account balances, etc.
    Delivery Services (FedEx): tracking orders, etc.
    All general customer service systems
Computer Integration of voice recognition
  Personal Computers
    Speech to Text Dictation
    Accessibility purposes: voice control of computers
Common Applications cont’d
Integrated into automobiles: Visteon Voice Technology™ used in Infiniti Q45
  Controls:
    Climate
    CD player
    Navigation system
Competing Standards
VoiceXML (Voice Extensible Markup Language)
  Partners: AT&T, IBM, Motorola, Lucent Tech.
  Used in implementation of most voice portals
  Shifting target toward web developers
SALT (Speech Application Language Tags)
  Partners: Microsoft, Intel, Cisco, SpeechWorks
  Targeted toward web developers
Current Leaders
Dragon Systems: NaturallySpeaking
  PC-based user-side programs for automatic speech recognition (ASR)
  Automotive, Telephony, Mobile, Games, Embedded Chips
SpeechWorks: connects users to industry voice portals
  AOLbyPhone, FedEx, E*Trade, etc.
BeVocal: provides voice portals for BellSouth, etc.
TellMe: provides voice portals for AT&T, Merrill Lynch, etc.
Philips Speech Recognition
  Services automotive, mobile device, and consumer electronic industries
IBM ViaVoice, MS Agent
Demonstrations
SpeechWorks™ product line
  United Airlines' toll free flight information line (demo)
  BankWorks Automated Bill Payment (demo)
  FedEx Rate Finder (demo)
  E*Trade Stock (demo)
  AOLbyPhone service (demo)
BeVocal solutions
Magical Merlin’s Grade Retrieval System
Designed in Visual Basic using Microsoft’s MSAgent
Menu               Recognized voice commands
First Exam         First Exam, First Test, First Midterm
Second Exam        Second Exam, Second Test, Second Midterm
Quiz Grades        Quiz Grades, Grade on Quizzes
Homework Grades    Homework Grades, Grade on Homework
Project Grade      Project Grade, Grade on Project
Final Grade        Final Grade, Grade for course
Main Menu          Main menu, Main, Class
Click on my belly for a short demonstration
Future Challenges
Speech Technology
  VoiceXML vs. SALT
  Voice enabling web content
  Real time access to source data
    Stock market, traffic, sports, etc.
Clear connection needed for effective use of voice portals
Security Issues involved
Advertising based revenue