Page 1: Nurturing Living Languages © C-DAC Mahesh D. Kulkarni C-DAC GIST Group mdk@cdac.in Electronics and Information Technology Exposition - ELITEX 2005 India.

Nurturing Living Languages

© C-DAC

Mahesh D. Kulkarni, C-DAC GIST Group, mdk@cdac.in

Electronics and Information Technology Exposition - ELITEX 2005, India Habitat Centre, Lodhi Road, New Delhi. 25 - 26 April 2005

Page 2

Multimodal System (Human Computer Interface) for Indian languages

Issues - Solutions

Page 3

Multimodal System

• Enables users to communicate with computers via several modes such as Keyboard, OCR, Speech, Gesture, Gaze, Visual, etc.

• A major challenge for computer system designers lies in simplifying the human-machine interface.

• Researchers all over the world are inventing different modes of interaction, some of them with little or no success.

• No single mode is sufficient for effective communication with the machine.

• Some of the popular interaction mechanisms are

• Keyboard

• Unistroke

• Graffiti

• Predictive writing

• OCR

• Speech (limited vocabulary)

Page 4

Multimodal System for Indian languages

Challenges

• Multilingual: 22 scheduled languages.

• Scripts are complex compared to English, which especially poses problems for OCR.

• While inputting, there are many-to-one and many-to-many relationships, unlike English.

• Limited availability of linguistic resources.

• Layman terminology versus pure linguistics terminology.

• Various dialects pose a challenge for speech input.

Impact

• The lack of an efficient Indian-language multimodal system has restricted content creation.

Possible solution

• Development of expert / smart writing systems backed up with multimodal inputs and linguistic resources such as spellcheckers, grammar checkers, synonyms, antonyms, thesauri, domain-based dictionaries, phrases and references.

Page 5

Hindi Language

Base characters - 80

Half characters - 43

Vowel characters - 12

Matra characters - 12

English Language

English base characters - 26

A B C D ………

Page 6

Indian language keyboard layout(s)

• Because processing power was unavailable, mechanical typewriter layouts were devised, based on the principle “the way you see is the way you write”.

• INSCRIPT: popular and widely used; it has become the de facto standard.

• It is based on the phonetic structure of Indian languages: “the way you speak is the way you write”.

• Phonetic English layouts serve urban users.

Limitations in mobile world

• The keyboard is very bulky, difficult to carry and large compared to the target device itself.

• It needs both hands, so it is not suitable for portable, mobile devices.

• It cannot be used without training.

• More than 80 keys are required, with UNSHIFT / SHIFT operations.

Page 7

Virtual / LASER keyboard

• PDAs, cellular telephones.

• Tablet PCs, Laptops.

• Industrial, sterile & medical environments.

• Test Equipment.

• Transport (Air, Rail, Automotive).

Limitations

• Needs a proper surface on which to display the keyboard image.

• Typing is cumbersome, since the finger positions and movements are restricted.

• Speed limitations.

Page 8

KITTY – Keyboard Independent Touch Typing

• KITTY, a finger-mounted keyboard for data entry into PDAs, Pocket PCs and wearable computers, has been developed at the University of California, Irvine.

• Two hand-mounted devices connect to the target computing device using Bluetooth wireless networking technology.

• The user can type on a hard surface like a desk or table, or in the air.

© University of California and Senseboard respectively.

Page 9

Unistroke Inputting

• Each character is represented by a single stroke, hence there is no segmentation problem.

• The system does not need to spend resources figuring out where one character ends and another begins.

• There is no need to write characters within bounding boxes; characters can be recognized even when they are written one on top of the other.

• It can even be used by a blind person.

Limitations

• Requires the user to spend some time learning the characters.

• Complex implementation for Asian languages.

• More oriented towards English.

Page 10

Graffiti inputting

• Requires minimal time for learning the alphabet.

• Though Unistrokes is a faster mode for inputting text than Graffiti, hardly anyone uses Unistrokes.

• This is because Graffiti is easy to learn, while Unistrokes is comparatively harder.

Page 11

Non-Predictive & Predictive Inputting mechanism for Handheld / Mobile Devices

By C-DAC GIST Group

Page 12

Multitap text entry mechanism

English

• English has only 26 letters.

• These are spread over 8 keys (2 to 9), i.e. 3 to 4 letters on a single key.

• To get the desired character, the user needs to press a key up to 4 times.

• Since more key presses are needed to reach each character, typing longer text becomes tedious.

Hindi / Indian languages

• Hindi has around 80 basic characters, 43 half characters, 12 vowels and 12 matras, about 147 characters in all.

• Spreading these 80 characters plus the half characters, vowels and matras over the 10 keys comes to around 9 to 10 characters on one key.

• Multitap input therefore becomes very cumbersome.

• Inputting a longer message this way in Indian languages is next to impossible.
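
The cost of multitap can be made concrete with a small sketch. It assumes the standard Latin phone keypad layout and ignores timeouts between letters on the same key; the function names are illustrative only. The per-key figure for Hindi depends on which characters are counted and how many keys carry them, so the second helper is just the even-spread arithmetic.

```python
import math

# Standard Latin letter groups on a phone keypad (keys 2 to 9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Multitap: the n-th letter on a key costs n presses of that key.
PRESSES = {
    ch: position + 1
    for letters in KEYPAD.values()
    for position, ch in enumerate(letters)
}


def multitap_presses(word: str) -> int:
    """Total key presses needed to enter `word` with multitap."""
    return sum(PRESSES[ch] for ch in word.lower())


def max_chars_per_key(n_chars: int, n_keys: int) -> int:
    """Characters on the fullest key when n_chars are spread evenly over n_keys."""
    return math.ceil(n_chars / n_keys)


print(multitap_presses("hello"))   # h=2, e=2, l=3, l=3, o=3 presses
print(max_chars_per_key(26, 8))    # 26 English letters over 8 keys
print(max_chars_per_key(80, 10))   # 80 Hindi base characters over 10 keys
```

Because the worst-case number of presses per character equals the number of characters on the key, every extra character placed on a key makes multitap slower.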

Page 13

Comparative study of English & Hindi

English

• Single character: 26 combinations

• Two characters: 676 combinations

• Three characters: 17,576 combinations

Hindi

• Single character: 80 combinations

• Two characters: 6,889 combinations

• Three characters: 571,787 combinations
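
A short sketch of the counting behind this comparison, assuming that "combinations" means ordered character sequences with repetition allowed, so an alphabet of n symbols gives n**k sequences of length k. The function name is illustrative. Under that assumption the Hindi figures 6,889 and 571,787 correspond to 83 symbols (83**2 and 83**3) rather than 80.

```python
def sequences(alphabet_size: int, length: int) -> int:
    """Number of ordered sequences of `length` symbols, repetition allowed."""
    return alphabet_size ** length


for name, size in (("English", 26), ("Hindi (80 base characters)", 80)):
    print(name, [sequences(size, k) for k in (1, 2, 3)])

# The slide's Hindi figures match an alphabet of 83 symbols.
print(sequences(83, 2), sequences(83, 3))
```

Either way, the search space a predictive or recognition system has to cover grows far faster for Hindi than for English as the sequence gets longer.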

Page 14

Multitap

• 12 keys are required to input characters.

• If a character is missed, you need to start all over again.

• Ideally suited to at most 3-4 characters per key.

• Not suitable for Indian language input, since almost 7-8 characters must be placed on each key (basic characters 80, half characters 43, vowels 12, matras 12).

Page 15

Two key non-predictive

• 4 keys are required to input characters.

• Any character can be entered in just two key presses.

• Key mapping is based on vargas and is hence easy to remember.

• Very short learning time (3-5 minutes).

• No need to remember the keys.

• Guidance reduces mistakes.

• All Indian languages can be input with the same keyboard layout, so there is no need to relearn it for another language.
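
The two-key idea can be sketched as follows. This is only an illustration: the digit-to-varga assignment and the name `decode_two_key` are hypothetical and are not taken from the C-DAC layout. Only the five consonant vargas themselves are standard Devanagari groupings. The first key press selects a varga, the second selects a character within it.

```python
# Hypothetical mapping: first key selects a varga (consonant group),
# second key selects the position of the character inside that group.
VARGAS = {
    "1": "कखगघङ",  # ka-varga
    "2": "चछजझञ",  # cha-varga
    "3": "टठडढण",  # retroflex ta-varga
    "4": "तथदधन",  # dental ta-varga
    "5": "पफबभम",  # pa-varga
}


def decode_two_key(keys: str) -> str:
    """Decode a string of digit pairs into Devanagari consonants."""
    if len(keys) % 2:
        raise ValueError("expected an even number of key presses")
    out = []
    for group_key, pos_key in zip(keys[0::2], keys[1::2]):
        group = VARGAS[group_key]
        position = int(pos_key)
        if not 1 <= position <= len(group):
            raise ValueError(f"no character at position {pos_key} in group {group_key}")
        out.append(group[position - 1])
    return "".join(out)


print(decode_two_key("1143"))  # "11" selects क, "43" selects द
```

Because every character costs exactly two presses, typing effort grows with the length of the text and not with the number of characters that share a key.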

Page 16

Two key non-predictive

• 13 keys are required to input characters.

• The * key is the mode key used for selecting a halant / half character.

Technology given to MNCs

Page 17

Predictive writing

• This should address the need for fast input using limited keys.

• It should not take more than one key press per character.

• It should help auto-complete words, so that a word needs fewer key presses than its length.

• Fast searching, backed by a dictionary of the most commonly used words.

• It can also manage user-defined words.

• C-DAC GIST has developed predictive writing for Hindi; work is in progress for other languages.

Page 18

• Because of the nature of the script, it is more complex to implement than for any other language.

• “Accuracy increase” is a function of a continuous development process.

• Stepwise approach to achieve a good level of prediction.

• Approaches for predictive inputting for devices:

• Pure dictionary based.

• Dictionary plus rule based approach.

• Addition of domain-specific dictionaries.

• Increase in accuracy by analyzing live data and enhancing the built-in dictionaries accordingly.
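
The pure dictionary based approach can be illustrated with a minimal sketch. It is not C-DAC's implementation: it assumes romanized words typed on a standard Latin phone keypad, a made-up word list with made-up frequency counts, and hypothetical names (`encode`, `predict`). Each letter costs one key press, and the candidates are the dictionary words whose key encoding starts with the keys typed so far, most frequent first.

```python
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Toy dictionary: romanized words with invented frequency counts.
WORDS = {"dhanyavad": 50, "dhan": 20, "ghar": 35, "namaste": 80}


def encode(word: str) -> str:
    """Key sequence that types `word` with one press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def predict(keys: str, words: dict = WORDS) -> list:
    """Words whose key encoding starts with `keys`, most frequent first."""
    matches = [w for w in words if encode(w).startswith(keys)]
    return sorted(matches, key=lambda w: (-words[w], w))


print(predict("34"))     # both "dh..." words are still candidates
print(predict("34269"))  # five presses leave a single candidate
```

A rule based layer or a domain-specific dictionary would then act on the candidate list, for example by filtering or re-ranking it.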

Page 19

Predictive writing Demo

5 key presses are required to complete the word Dhanyavad

Page 21

Features:

• Highly efficient algorithm and automatic prediction of the words frequently used by the user.

• Auto tracking of the words frequently used by the user, giving them priority.

• Currently 25,000 common “spoken Hindi” words.

• Words not available in the dictionary can be added by the user with the help of the non-predictive mechanism.

• Current memory requirements:

• 180 KB for 25,000 words (uncompressed)

• 8 KB for code

• 3 KB scratch memory
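
Two of these features, usage tracking and user-added words, can be sketched together. The class `UserDictionary` and its methods are hypothetical names, and the words are romanized placeholders. The last lines work out the storage arithmetic implied by the figures above, assuming 1 KB = 1024 bytes.

```python
from collections import Counter


class UserDictionary:
    """Word list that ranks completions by how often the user picked them."""

    def __init__(self, base_words):
        # Every built-in word starts with a usage count of zero.
        self.counts = Counter({word: 0 for word in base_words})

    def record_use(self, word: str) -> None:
        """Count one use; words not yet in the dictionary are added."""
        self.counts[word] += 1

    def complete(self, prefix: str, limit: int = 3) -> list:
        """Up to `limit` words starting with `prefix`, most used first."""
        matches = [w for w in self.counts if w.startswith(prefix)]
        matches.sort(key=lambda w: (-self.counts[w], w))
        return matches[:limit]


words = UserDictionary(["dhan", "dhanyavad", "dharma"])
words.record_use("dharma")
words.record_use("dharma")
words.record_use("dhoop")  # a user-defined word, added on first use
print(words.complete("dh"))

# Storage arithmetic from the stated figures.
bytes_per_word = 180 * 1024 / 25_000
total_kb = 180 + 8 + 3
print(round(bytes_per_word, 1), total_kb)
```

At roughly 7 bytes per word uncompressed, the word list dominates the total footprint of about 191 KB.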

Page 22

Conclusions

• Urgent need for development of expert / smart writing systems backed up with multimodal inputs and linguistic resources such as spellcheckers, grammar checkers, synonyms, antonyms, thesauri, domain-based dictionaries, phrases and references.

• Standardization for inputting Indian languages through limited keys.

Page 23

THANK YOU

Nurturing living languages

