Post on 10-Apr-2018
8/8/2019 Instant Speech Translation - 10BM60080
INSTANT SPEECH TRANSLATION
By SATHIYASEELAN M
10BM60080
I Year M.B.A
VGSOM, IIT Kharagpur
Index
1. Abstract
2. Instant Speech Translation Eliminating Language Barriers
3. System Requirements
3.1. Speech Recognition
3.2. Language Parsing
3.3. Translation
4. Applications and their Business Potential
4.1. Mobile Applications and Services
4.2. Voice Interface Devices with Local Language support
4.3. Data Entry Applications in Multiple Languages
4.4. e-Learning
4.5. Business Applications
5. Key Players
6. Challenges Ahead
7. Conclusion
8. References
1. Abstract
With the current pace of globalization, every industry needs to look beyond geographical
borders. Indian IT firms provide services to Japanese, Korean and other foreign clients, and
they invest heavily in foreign language training programs. An application that provides instant
translation would not only cut these costs but also help gather requirements more precisely
and in less time. Instant speech translation (IST) has wide applications in other industries
as well. In a country like India, where numerous vernacular languages are in use, IST can be
used in many ways in day-to-day life. There is huge potential for IST applications on mobile
phones. Major players such as Google, Microsoft and IBM have already come up with some sort of
prototype for this kind of application; Google Translate is one such primitive example. Many
more such applications will be in our gadgets soon. This paper elaborates on a few such
applications and their business potential.
2. Instant Speech Translation Eliminating Language Barriers
Internet and mobile services have reached even remote villages, and rural markets are now
considered significant in countries like China and India. Breaking language barriers will
further open up these markets for international business. Knowledge, wherever and in whatever
form it exists, should be used for the growth of humanity. We should create opportunities for
those who want to learn and share knowledge in their own native languages, and instant speech
translation will create a platform for them. This could bring to light many things that are
not yet known to the world.
In The Hitchhiker's Guide to the Galaxy, the Babel fish, a fictitious animal, performs instant
translation when placed in the ear. Suppose such an application were available on a mobile
phone: when I call a person in Japan and speak in English, the application would translate my
speech into Japanese before it is transmitted through the telecom service provider. This would
eliminate language boundaries and create a truly connected world.
3. System Requirements
"We think speech-to-speech translation should be possible and work reasonably well in a few
years' time. Clearly, for it to work smoothly, you need a combination of high-accuracy machine
translation and high-accuracy voice recognition, and that's what we're working on. If you look
at the progress in machine translation and corresponding advances in voice recognition, there
has been huge progress recently."
- Franz Och, Google's head of translation services
To develop an instant speech translation application, we need robust speech recognition and
machine translation systems. The following figure depicts the basic blocks of an instant
speech translation system.
Fig. Basic Functional Blocks of Instant Speech Translation
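The chain of blocks described above (speech recognition, language parsing, translation) can be sketched as three functions applied in order. This is only an illustrative sketch, not the design of any real product: every function name is hypothetical, the "audio" is assumed to be an already transcribed string, and the tiny English-to-Spanish lexicon stands in for a trained translation model.

```python
from typing import Dict, List

# Hypothetical toy lexicon (English -> Spanish); a real system would use a trained model.
TOY_LEXICON: Dict[str, str] = {
    "where": "dónde",
    "is": "está",
    "the": "la",
    "station": "estación",
}


def recognize_speech(audio: str) -> str:
    """Stand-in recognizer: the 'audio' is assumed to be a transcript already."""
    return audio.strip().lower()


def parse_language(text: str) -> List[str]:
    """Stand-in parser: split the recognized text into word units."""
    return text.split()


def translate(tokens: List[str]) -> str:
    """Stand-in translator: word-for-word lookup; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(token, token) for token in tokens)


def instant_speech_translation(audio: str) -> str:
    """Chain the three blocks in the order recognition -> parsing -> translation."""
    return translate(parse_language(recognize_speech(audio)))


print(instant_speech_translation("Where is the station"))  # dónde está la estación
```

Keeping each block behind its own function means any one of them could be replaced by a better component without touching the other two.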
3.1. Speech Recognition
Speech recognition and dictation technology has made stunning leaps forward in recent
years, although it is not yet perfect. The Word Error Rate (WER) has come down drastically in
the recent past.
Fig. Word Error Rate of Speech Recognition Systems over Years
Source: Communications of the ACM, http://cacm.acm.org/ (WER definition: http://en.wikipedia.org/wiki/Word_error_rate)
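For readers unfamiliar with the metric, WER is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. The sketch below is a minimal illustration of that standard definition; the function name and the example sentences are our own and do not come from the figure's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One word ("the") is dropped out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many more words than were spoken.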
Speech recognition has achieved good usability, and there has been a sudden surge in
speech-controlled devices. Even Windows Vista had speech recognition capabilities, which
turned out to be a failure, although we did see basic commands working in it. A system that
merely listens and guesses is not going to take this forward.
Robust speech recognition technology is a crucial part of instant speech translation.
The main problem systems face is understanding the nuances of a user's enunciation and voice
patterns. A system used by the same person over a period of time can learn these patterns and
reduce its recognition error rate. Mobile phones have the upper hand over other gadgets here,
since a mobile phone is mostly used by a single user and is used constantly. Mobiles could
also soon recognise a user's natural free-style speech. Speech recognition systems can be
customized to a particular user by having the user utter a predefined set of commands or
words, which helps the system recognize its master's voice patterns. In the early stages of
development, this sort of customization could be done with the help of a professional.
3.2. Language Parsing
Programs cannot parse human sentences as easily as they parse mathematical expressions,
because there is substantial ambiguity in the structure of human language. Some sort of
linguistic analysis is needed to extract the relevant information. The language parser splits
the raw text into understandable word units, selects the correct form and class for each word
that can have more than one interpretation, and identifies the head words of a sentence. The
information analysed by the language parser is passed to the machine translation engine for
further tasks.
A set of protocols should be defined for communication between different languages. For
example, Indian languages generally use a SUBJECT-OBJECT-VERB pattern, whereas English
generally uses a SUBJECT-VERB-OBJECT pattern. The language parser's role is to provide a
parsed language stream that can be easily interpreted by translators.
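The word-order difference mentioned above can be illustrated with a toy reordering rule. The sketch assumes the parser has already labelled each word with a role (S for subject, O for object, V for verb); the role labels, the function name and the glossed example clause are all hypothetical, and real parsers produce far richer structures.

```python
from typing import Dict, List, Tuple

# A parsed clause as (word, role) pairs, where role is "S", "O" or "V".
Clause = List[Tuple[str, str]]

# Constituent order for each pattern named in the text.
WORD_ORDERS: Dict[str, List[str]] = {
    "SOV": ["S", "O", "V"],
    "SVO": ["S", "V", "O"],
}


def reorder(clause: Clause, target_pattern: str) -> List[str]:
    """Regroup the words of a role-labelled clause into the target word order."""
    by_role: Dict[str, List[str]] = {"S": [], "O": [], "V": []}
    for word, role in clause:
        by_role[role].append(word)  # raises KeyError for an unknown role
    ordered: List[str] = []
    for role in WORD_ORDERS[target_pattern]:
        ordered.extend(by_role[role])
    return ordered


# English gloss of a subject-object-verb clause, reordered for English output.
sov_clause = [("Ram", "S"), ("an apple", "O"), ("eats", "V")]
print(" ".join(reorder(sov_clause, "SVO")))  # Ram eats an apple
```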
3.3. Translation
The machine translator translates a parsed input language stream into a well-defined output
language stream. The translation abides by the set of protocols defined for communication
between a set of languages.
Fig. A Model of IST Services on mobile
IST as a product:
These services can also be packaged into a product. However, an application that supports
almost perfect translation will be heavy, so in the initial stages the language packs a
user prefers can be bundled into a product and sold to that user.
Fig. Users interacting through an IST application on mobile
The service model will suit Indian languages, and the product model will suit international
languages like Japanese. The service model will facilitate the wide spread of these
applications and will also bring various players into the market.
IST applications can also be used on other types of gadgets, such as the iPod and iPad. A few
basic applications are already available in the App Store, for example Jibbigo Voice Translation.
Fig. Screenshot of Jibbigo Application on iPod
IST Development Standards
To facilitate easy development and learning, a set of standards needs to be established,
similar to HTML in web design. Just as XML and JSON are used for machine-readable data
sharing, VOXML (Voice XML) can be used for these types of applications.
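To give a feel for what such a voice markup standard looks like, the sketch below builds a minimal dialog document, assuming that "Voice XML" here refers to the W3C VoiceXML 2.0 format. The form id, field name and prompt text are hypothetical, and a deployable document would also need a grammar for the field, which is omitted here.

```python
import xml.etree.ElementTree as ET

VXML_NAMESPACE = "http://www.w3.org/2001/vxml"


def build_query_form(field_name: str, prompt_text: str, lang: str) -> str:
    """Return a minimal VoiceXML 2.0 document with one form, one field and one prompt."""
    vxml = ET.Element(
        "vxml",
        {"version": "2.0", "xmlns": VXML_NAMESPACE, "xml:lang": lang},
    )
    form = ET.SubElement(vxml, "form", {"id": "enquiry"})  # hypothetical form id
    field = ET.SubElement(form, "field", {"name": field_name})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


print(build_query_form("destination", "Which station are you travelling to?", "en-IN"))
```

In such a document, the prompt is what the device speaks to the user and the field is the slot that the user's spoken answer fills.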
4.2. Voice Interface Devices with Local Language support
Voice interface devices that support local languages will soon be in use. For example,
local residents could interact with a railway information kiosk in their local language
through speech. Instant speech translation will play a vital role in these types of
interfaces, and IST applications can sit at the front end of such devices. This will also take
less time to resolve a query than a traditional key-entry enquiry system. Most voice-driven
applications currently support English, and the same is the case with the Windows 7 operating
system. An IST application at the front end can translate local language speech input into
English, which can then be processed by the speech recognition systems supported by various
operating systems.
Fig. Various blocks in a Railway Information Kiosk that supports regional languages through speech
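The kiosk flow described in this section (local language input, IST front end, English text, then a command for the kiosk's normal processing) can be sketched as follows. This is a toy illustration under stated assumptions: the phrase table of romanized Hindi utterances stands in for a real speech translation engine, and the command names and keyword rules are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical phrase table standing in for the IST front end (romanized Hindi -> English).
IST_PHRASES: Dict[str, str] = {
    "agli train kab hai": "when is the next train",
    "kiraya kitna hai": "how much is the fare",
}

# Hypothetical keyword -> command rules for the command/query generator.
COMMAND_KEYWORDS: Dict[str, str] = {
    "next train": "NEXT_DEPARTURE",
    "fare": "FARE_ENQUIRY",
}


def ist_translate(local_utterance: str) -> Optional[str]:
    """IST front end: map a local language utterance to English, if known."""
    return IST_PHRASES.get(local_utterance.strip().lower())


def generate_command(english_text: str) -> Optional[str]:
    """Command/query generator: pick the first command whose keyword appears."""
    for keyword, command in COMMAND_KEYWORDS.items():
        if keyword in english_text:
            return command
    return None


def handle_utterance(local_utterance: str) -> str:
    """Full front end: translate, then hand a command to the kiosk's normal processing."""
    english = ist_translate(local_utterance)
    if english is None:
        return "UNRECOGNIZED"
    return generate_command(english) or "UNRECOGNIZED"


print(handle_utterance("Agli train kab hai"))  # NEXT_DEPARTURE
```

The existing English-only kiosk logic stays unchanged in this arrangement; only the front end needs to know about the local language.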
4.3. Data Entry Applications in Multiple Languages
IST applications can help with data entry in multiple languages. This could assist in
translating legal documents into various languages. We have witnessed many court proceedings
getting delayed due to a lack of documents in regional languages, and our Government also
invests a lot in translating various documents into regional languages. In the years to come,
Microsoft Word will have options to view translated versions while typing. This could cut
down the costs and time involved in such activities.
4.4. e-Learning
Advances in computing and bandwidth have brought the benefits of traditional classroom
education into a distance learning environment. IST will take this a step forward by removing
language barriers that impede the sharing of ideas and knowledge. The figure below depicts the
schema of an e-classroom that uses IST.
(Blocks of the railway information kiosk figure in Section 4.2: local language speech input -> IST application -> English -> command / query generator -> normal processing done in a railway information kiosk)
Fig. IST Applications supporting Distance Learning in Various Languages
IST applications could also be used in webcasting in a similar way.
4.5. Business Applications
IST applications could also help business enterprises interact with customers located
across different geographies. IST will help in understanding customer requirements in a short
span of time.
Users' contribution to IST applications is crucial. They can provide suggestions to improve
the translation provided by the application. Some credits can be given to regular users who
provide valuable suggestions. This will encourage local participation, which would ultimately
help improve the quality of service provided by IST applications.
The applications of IST discussed here are just the tip of the iceberg. We will see many more
such applications in future, once IST applications are usable in real time. IST applications
could then be extended to many sensitive areas, such as health care and defence.
5. Key Players
Google was the first company to announce that it was working on speech-to-speech
translation for mobile phones. Among the latest Android apps that support translation is
Babylon, which gives dictionary results in 75 different languages as well as full text
translations in over 12 languages. Apple is working with IBM to roll out a speech-to-speech
translator for iPhones. IBM and Apple are already working closely on a few applications that
will run on the iPhone and iPad.
IBM has been working on translation software and machine translation for years. In fact,
it created MASTOR and the SMT (Statistical Machine Translation) technology that many other
translation applications use.
Microsoft has built-in speech recognition support in its operating systems. It has
recently demonstrated German-English translation of a conversation between two Microsoft
employees, but it has made no official announcements on projects pertaining to instant speech
translation.
Videos of instant speech translation applications by other major players, such as AT&T,
NEC and ATR, are available on YouTube. Nespole, Babylon, Verbmobil and MATRIX are a few
well-known speech translation systems developed by players in this field. Extensive research
projects are under way to improve the usability of speech translation systems. PDA
manufacturers could collaborate with these application developers to accelerate these
projects, which would also help them gain an upper hand over their competitors.
6. Challenges Ahead
Only a system that works well in a real-time environment will be successful in the long run.
Numerous hurdles need to be crossed to reach a perfect real-time IST. One of them is
high-accuracy speech recognition, which depends heavily on the quality of the input speech.
Acoustic degradation produced by additive noise is an obstacle to reaching the desired
accuracy. In real-world use, the user is not going to run IST applications in a noise-free
environment. Hence an IST application should be intelligent enough to separate the user's
voice from the noise in the environment.
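One common way to quantify how badly additive noise has degraded an input is the signal-to-noise ratio, SNR = 10 log10(P_signal / P_noise) in decibels, where P denotes mean power. The sketch below is only an illustration of that standard formula under the additive model noisy = clean + noise; it assumes the clean signal is known, which is true in lab measurements but not in live use, and the sample values are made up.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR in decibels, assuming noisy[i] = clean[i] + noise[i]."""
    if len(clean) != len(noisy) or len(clean) == 0:
        raise ValueError("clean and noisy must be non-empty and of equal length")
    noise = [y - s for s, y in zip(clean, noisy)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        return math.inf  # no noise at all
    return 10.0 * math.log10(mean_power(clean) / noise_power)


# Made-up samples: unit-amplitude signal with noise of amplitude 0.1 (about 20 dB).
clean_signal = [1.0, -1.0, 1.0, -1.0]
noisy_signal = [1.1, -0.9, 0.9, -1.1]
print(round(snr_db(clean_signal, noisy_signal), 2))
```

The lower the SNR of the input, the harder the recognition task described above becomes.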
IST applications are also expected to be intelligent enough to capture the user's mood
in the future. A monotonous voice from an IST application will soon bore the user. A
customisable voice would make IST applications more expressive and friendly. Adding
phonemes to the computerised voice will bring it nearer to a human voice.
Industry should work in collaboration with research communities to resolve these
hurdles and achieve human-like performance.
7. Conclusion
Speech and text translation applications are being used in a variety of forms on a number
of devices. To attain human-like performance, we must continue to invest in research. Along
with speech, other sensory user inputs can also be integrated with IST applications to attain
human-like performance. Once that is achieved, instant speech translation will soon spread to
devices like the TV. It wouldn't be a surprise if text on the web gets replaced by audio and
video in the future glocal world.
8. References
1. Enhancing Global and Synchronous Distance Learning and Teaching by Using Instant Transcript and Translation, by Ivan Ho, Hajime Kiyohara, Akira Sugimoto, and Kazuo Yana. Hosei University Research Institute, California.
2. http://mashable.com/2010/02/08/speech-to-speech/
3. http://domino.research.ibm.com/comm/research.nsf/pages/r.uit.innovation.html
4. http://technology.timesonline.co.uk/tol/news/tech_and_web/personal_tech/article7017831.ece
5. http://blog.gts-translation.com/2010/03/02/microsoft-demos-speech-to-speech-translator/
6. http://www.jibbigo.com/website/index.php
7. http://cacm.acm.org/magazines/2004/1/6588-challenges-in-adopting-speech-recognition