Speech Recognition Technology

1. SPEECH RECOGNITION TECHNOLOGY Presented by: Nicole Bralic | Sergio Rumantir | Louis Fong | Niharika Kohli | Aamir Sheriff

2. Agenda
   1. Origins / history of speech recognition
   2. How it works: the technical aspects
   3. Issues and concerns
   4. Latest trends and future opportunities
   5. Activity

3. ORIGINS / HISTORY OF SPEECH RECOGNITION

4. Introduction
   Speech Recognition: What is the first thought that comes to mind?

5. Origins
   When was the first Speech Recognition Software developed?
   a) 1950
   b) 1960
   c) 1970
   d) 1980

6. Origins
   Answer: 1950s
   First appearance - could only understand digits

7. Origins
   1960s: Understood 16 words spoken in English

8. Origins
   1970s: Understood 1,011 words

9. Origins
   1980s: Understood thousands of words, but still slow

10. Origins
    1990s: First comprehensive software
    Cost = $9,000

11. Origins
    2000s: Built into Mac OS X and Windows Vista

12. Origins
    2010s: Apple introduces Siri

13. HOW IT WORKS
    The technical aspects

14. Types of Users
    Small Vocabulary / Many Users
    Large Vocabulary / Few Users

15. How it works

16. Speech Recognition Models
    Today's speech recognition systems use powerful and complicated statistical modeling systems. These systems use probability and mathematical functions to determine the most likely outcome. The two models that dominate the field today are the Hidden Markov Model and neural networks.

17. The Hidden Markov Model
    1. Each phoneme is like a link in a chain, and the completed chain is a word.
    2. The chain branches off in different directions as the program attempts to match the digital sound with the phoneme that's most likely to come next.
    3. The program assigns a probability score to each phoneme, based on its built-in dictionary and user training.

18. Neural Network
    There are four basic steps to performing recognition:
    1. Digitize the speech that we want to recognize.
    2. Compute features that represent the spectral-domain content of the speech.
    3. Use a neural network (also called an ANN, multi-layer perceptron, or MLP) to classify a set of these features into phonetic-based categories at each frame.
    4. Use a Viterbi search to match the neural-network output scores to the target words, in order to determine the word that was most likely uttered.

19. Overall Process

20. ISSUES

21. Poll Time!
    Why / Why not?

22. Issue: Accuracy & Performance
    How accurate was the performance of Siri?
    What caused this lack of accuracy?

23. Issue: Accuracy & Performance
    Why was the accuracy and performance of Siri low in the previous video?
    Background noise
    Overlapped speech
    Speaker's accent
    Syntactic error
    iPhone 4S and Siri weren't advanced enough

24. Issue: Accuracy & Performance
    Technological improvement
    More vocabulary, lower accuracy
    Perfectly recognizes one to nine, but as the library grows, some words become confusing
    Speaker-dependent vs. speaker-independent
    Isolated, discontinuous, continuous (natural) speech
    Read vs. spontaneous speech
    Cannot understand a sentence that is very off syntactically
    Adverse conditions: noise, distortion

25. Issue: Accuracy & Performance
    Accuracy enhanced!

26. Issue: Privacy
    Google Chrome: Your PC becomes an open microphone
    Wearable technology in the workplace: Should not be used to monitor employees
    Facebook Music and TV Recognition: Is it really turned off?

27. Issue: Control
    As technology advances quickly, is government legislation good enough to control the proper usage of speech recognition software?
    Is it even possible to control?

28. LATEST TRENDS AND FUTURE OPPORTUNITIES

29. Nuance Communications
    Computer software technology corporation
    Market leader in speech and imaging applications
    o server & embedded speech recognition
    o telephone call steering systems
    o automated telephone directory services
    o medical transcription software & systems
    o optical character recognition
    o desktop imaging

30. Dragon NaturallySpeaking
    World's best-selling speech recognition software
    For home, student, power and professional users
    Essential for people with visual impairments

31. Using Python to Code by Voice
    The beginning: Tavis Rudd developed "Emacs pinkie" (RSI)
    Months of coding in Python and Emacs
    Dragon NaturallySpeaking voice recognition software on Microsoft Windows
    Over 2,000 personal commands of his own
    The code is released for download

32. Dragon Drive
    Nuance integrates its technology with cloud and vehicle on-board capabilities to create distraction-free driving with Dragon Drive voice commands. Over 90 million cars are currently equipped with Nuance Dragon Drive.

33. Dragon Medical
    Dragon Medical provides clinical documentation solutions for over 300,000 physicians. This portfolio captures the physician narrative to document care in the EHR anywhere, any time and on any device.

34. Real-time Skype Translator
    Microsoft will release the first beta of real-time Skype Translator to Windows 8 before the end of 2014. They are currently implementing near real-time voice translation of multiple languages in a Skype call. Currently there is instant functional translation from English to German and Chinese.

35. Future Trends
    Voice recognition market to reach US$2.5 billion in revenue by 2015
    Typed passwords aren't going to work in the future

36. Class Discussion
    What do you think the future of speech recognition technology will look like?
    What are some other uses of this technology?
    Do you think the benefits outweigh the issues?

37. CLASS ACTIVITY
    VOLUNTEERS?

38. THANK YOU!
    QUESTIONS?
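
Supplementary sketch for slides 17 and 18: the Viterbi search named in step 4 can be illustrated with a few lines of Python. This is only a toy sketch under stated assumptions, not how any particular product works. The phoneme labels ("k", "ae", "t"), the transition probabilities, and the per-frame scores are all made-up illustrative values; in a real system the frame scores would come from the neural network in step 3.

```python
import math


def viterbi(frame_scores, transitions, initial):
    """Return the most likely phoneme path and its log-probability.

    frame_scores: list of dicts, one per frame, phoneme -> score for that frame
    transitions:  dict mapping (previous, current) -> transition probability
    initial:      dict mapping phoneme -> probability of starting there
    """
    states = list(initial)

    def log(p):
        # Treat impossible events as minus infinity instead of raising.
        return math.log(p) if p > 0 else float("-inf")

    # best[s] holds the best log-probability of any path ending in state s.
    best = {s: log(initial[s]) + log(frame_scores[0][s]) for s in states}
    backpointers = []

    for scores in frame_scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Pick the predecessor that makes arriving in s most likely.
            prev = max(
                states,
                key=lambda p: best[p] + log(transitions.get((p, s), 0.0)),
            )
            new_best[s] = (
                best[prev] + log(transitions.get((prev, s), 0.0)) + log(scores[s])
            )
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)

    # Walk the back-pointers from the best final state to recover the path.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy data (hypothetical values): four audio frames of the word "cat".
initial = {"k": 0.8, "ae": 0.15, "t": 0.05}
transitions = {
    ("k", "k"): 0.5, ("k", "ae"): 0.5,
    ("ae", "ae"): 0.5, ("ae", "t"): 0.5,
    ("t", "t"): 1.0,
}
frame_scores = [
    {"k": 0.7, "ae": 0.2, "t": 0.1},
    {"k": 0.4, "ae": 0.5, "t": 0.1},
    {"k": 0.1, "ae": 0.7, "t": 0.2},
    {"k": 0.1, "ae": 0.2, "t": 0.7},
]

path, log_prob = viterbi(frame_scores, transitions, initial)
print(path, math.exp(log_prob))
```

With these toy numbers the decoded path is k, ae, ae, t: each frame is assigned one phoneme, and collapsing the repeated "ae" gives the phoneme chain for "cat". Log-probabilities are used so that multiplying many small scores does not underflow.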

Date post:	20-Jan-2015
Category:	Technology
Upload:	aamir-sheriff