Development of NE Wordnet: An Integrated Wordnet for Languages of the North-East India Assamese & Bodo

by

Utpal Saikia Biswajit Brahma Dibyajyoti Sarmah Dept. of Computer Science & Information Technology Gauhati University

INTRODUCTION

The NE Wordnet Project for Assamese & Bodo started in 2009.

The Assamese and Bodo wordnets have been developed using the expansion approach: they follow the structure of the original Hindi Wordnet, and their synsets are created against the IDs and concepts of the Hindi Wordnet.
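The expansion approach can be pictured as filling in target-language entries against fixed source synset IDs. The sketch below is illustrative only: the `Synset` fields, the `expand` helper, and the placeholder IDs and words are assumptions made for this example, not the project's actual database schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    synset_id: int                      # ID taken over unchanged from the source wordnet
    pos: str                            # part of speech, e.g. "noun"
    words: List[str] = field(default_factory=list)
    gloss: str = ""


def expand(source: Dict[int, Synset],
           translations: Dict[int, Tuple[List[str], str]]
           ) -> Tuple[Dict[int, Synset], List[int]]:
    """Create target-language synsets that reuse the source synset IDs.

    Returns the linked synsets and the IDs that still lack an equivalent.
    """
    linked: Dict[int, Synset] = {}
    pending: List[int] = []
    for synset_id, src in source.items():
        words, gloss = translations.get(synset_id, ([], ""))
        if not words:
            pending.append(synset_id)   # no target word yet: leave unlinked
            continue
        linked[synset_id] = Synset(synset_id, src.pos, list(words), gloss)
    return linked, pending


# Placeholder data, invented purely for illustration.
hindi = {
    101: Synset(101, "noun", ["hin_word_a"], "source gloss a"),
    102: Synset(102, "verb", ["hin_word_b"], "source gloss b"),
}
assamese_entries = {101: (["asm_word_a"], "target gloss a")}

linked, pending = expand(hindi, assamese_entries)
print(sorted(linked), pending)
```

Keeping the source ID as the key means every linked synset in one language can be matched to its counterpart in any other language built the same way.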

NE Wordnet Development Project outcomes till now

Validation

All the Assamese and Bodo Wordnet activities have been reviewed by the Professors of the Department of Assamese, Modern Indian Language & Bodo of Gauhati University as well as other invited resource persons.

The NE Wordnet, structured as a database and integrated with an interactive interface, is ready for use in NLP research and development. Several NLP applications and research projects have already started using it.

Automatic bilingual dictionary construction (Assamese-Bodo): prototype developed at Gauhati University.

Web-based automatic multilingual dictionary construction (Assamese-Bodo-Nepali-Hindi-English): full web-based system ready, built by the Gauhati University team.

Intelligent document categorizing system: prototype developed and tested at Gauhati University; research paper already accepted for GWA-2010.
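Because both wordnets are built against the same Hindi synset IDs, one plausible way to derive a bilingual word list is to join the two wordnets on that ID. The following is a minimal sketch under that assumption. It is not the Gauhati University prototype, and the function name and placeholder entries are hypothetical.

```python
from typing import Dict, List


def build_bilingual_dictionary(source: Dict[int, List[str]],
                               target: Dict[int, List[str]]) -> Dict[str, List[str]]:
    """Pair the words of two wordnets that share a synset ID."""
    dictionary: Dict[str, List[str]] = {}
    for synset_id, source_words in source.items():
        target_words = target.get(synset_id, [])
        if not target_words:
            continue                    # synset not yet linked in the target language
        for word in source_words:
            equivalents = dictionary.setdefault(word, [])
            for candidate in target_words:
                if candidate not in equivalents:   # keep order, avoid duplicates
                    equivalents.append(candidate)
    return dictionary


# Placeholder synset ID -> word lists, invented for illustration.
assamese = {101: ["asm_word_a", "asm_word_b"], 102: ["asm_word_c"], 103: ["asm_word_a"]}
bodo = {101: ["brx_word_a"], 103: ["brx_word_d"]}

assamese_bodo = build_bilingual_dictionary(assamese, bodo)
for word, equivalents in assamese_bodo.items():
    print(word, "->", ", ".join(equivalents))
```

A word that appears in several synsets collects the equivalents of each sense, so a real dictionary would likely also carry the gloss to tell the senses apart.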

The following glosses have been completed in Assamese so far:

Common Synset completed = 11579

Pan Indian Synset all completed

Universal Synset completed = 7147 (Total = 7168)

Adjective Synset completed = 2376 (Total = 3605)

Adverb Synset completed = 174 (Total = 209)

Verb Synset completed = 1588 (Total = 1798)

Language Specific completed = 127 (Total = 1000)

Total linked number = 24,338

The following glosses have been completed in Bodo so far:

Common Synset completed = 11522

Pan Indian Synset all completed

Universal Synset completed = 7143 (Total = 7168)

Adverb Synset completed = 192 (Total = 209)

Adjective Synset completed = 2473 (Total = 3605)

Verb Synset completed = 1752 (Total = 1798)

Synset Ranker = 34264 (34378)

Language Specific = 74

Total linked number = 24,493
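The completed and total counts reported for the two languages can be turned into completion percentages for easier comparison. The short script below uses only the figures stated above for categories where both a completed count and a total are given. The dictionary layout and function name are choices made for this sketch.

```python
from typing import Dict, Tuple

# (completed, total) pairs as reported in the two lists above.
ASSAMESE: Dict[str, Tuple[int, int]] = {
    "Universal": (7147, 7168),
    "Adjective": (2376, 3605),
    "Adverb": (174, 209),
    "Verb": (1588, 1798),
    "Language specific": (127, 1000),
}
BODO: Dict[str, Tuple[int, int]] = {
    "Universal": (7143, 7168),
    "Adjective": (2473, 3605),
    "Adverb": (192, 209),
    "Verb": (1752, 1798),
}


def completion_percentages(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return the percentage of synsets completed in each category."""
    return {name: 100.0 * done / total for name, (done, total) in counts.items()}


for language, counts in (("Assamese", ASSAMESE), ("Bodo", BODO)):
    for name, pct in completion_percentages(counts).items():
        print(f"{language:9s} {name:18s} {pct:5.1f}%")
```

Categories without a stated total (common, Pan Indian, and the Bodo language-specific count) are left out rather than guessed.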

Problems Faced During Development for Assamese

Synset related: Among the common synsets of Assamese and Bodo, a small number have no suitable Assamese word to represent them, so they have not been entered yet. These remaining synsets have been sent to the expert committee for review.

Expansion from Hindi/English: The main challenge in the expansion approach is achieving a one-to-one mapping.

Problems Faced During Development for Bodo

Challenges in Expansion: Bodo is a developing language. It does not have a strong base of linguistic resources, and its literary resources are also very limited. Its vocabulary is still growing, with new words continually being discovered, coined and added. As a result, the development of the Bodo Wordnet faces frequent problems, and overcoming them to accommodate expansion from the Hindi Wordnet with one-to-one mapping has been a big challenge.

Workshops and conferences organized or attended by the member groups:

1. Global Wordnet Conference, IIT Bombay, Mumbai, 31 Jan. to 4 Feb. 2010

2. Indo Wordnet Conference in Amrita University, Coimbatore, in June, 2010

3. NE Wordnet Workshop, Guwahati, Assam, 2010

4. Indo Wordnet Workshop, IIT Kharagpur, 2010

5. Spell checker training, C-DAC Pune, 2010

6. Indo Wordnet Workshop, Shillong, 2011

7. CLIA developers workshop, C-DAC Pune, 2011

8. Multiword Expression Workshop, University of Kashmir, Srinagar, 2011

Tools, Applications & Research

During this period, language specific tools have been developed.

Figure: Interface of the language-specific synset creation tool

Multilingual dictionary (online; Bodo, Assamese and Hindi):

Step 1: Select the language.

Step 2: Type a word in the selected language.

Step 3: When matching words appear automatically, select the word.

Step 4: Search for the selected word; the result is then displayed.
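The four steps above (choose a language, type, pick a suggestion, search) can be sketched as a small lookup class. This is only an illustration of the workflow, not the code behind the online tool: the class name, method names and placeholder entries are hypothetical, and the shared synset ID is assumed to be what connects the languages.

```python
from bisect import bisect_left
from typing import Dict, List


class MultilingualDictionary:
    """Toy lookup over entries of the form synset ID -> language -> words."""

    def __init__(self, entries: Dict[int, Dict[str, List[str]]]) -> None:
        self._entries = entries
        self._index: Dict[str, Dict[str, List[int]]] = {}
        for synset_id, by_language in entries.items():
            for language, words in by_language.items():
                for word in words:
                    self._index.setdefault(language, {}).setdefault(word, []).append(synset_id)
        # Sorted word lists allow prefix suggestions by binary search.
        self._sorted = {lang: sorted(words) for lang, words in self._index.items()}

    def languages(self) -> List[str]:
        """Step 1: the languages a user can select."""
        return sorted(self._index)

    def suggest(self, language: str, prefix: str, limit: int = 10) -> List[str]:
        """Steps 2 and 3: words in the chosen language starting with the typed prefix."""
        words = self._sorted.get(language, [])
        matches: List[str] = []
        for word in words[bisect_left(words, prefix):]:
            if len(matches) >= limit or not word.startswith(prefix):
                break
            matches.append(word)
        return matches

    def search(self, language: str, word: str) -> List[Dict[str, List[str]]]:
        """Step 4: equivalents in the other languages, one dict per matching synset."""
        results = []
        for synset_id in self._index.get(language, {}).get(word, []):
            entry = self._entries[synset_id]
            results.append({lang: list(ws) for lang, ws in entry.items() if lang != language})
        return results


# Placeholder entries, invented for illustration.
entries = {
    101: {"assamese": ["asm_word_a"], "bodo": ["brx_word_a"], "hindi": ["hin_word_a"]},
    102: {"assamese": ["asm_word_ab", "asm_other"], "bodo": ["brx_word_b"], "hindi": ["hin_word_b"]},
}
dictionary = MultilingualDictionary(entries)
print(dictionary.languages())
print(dictionary.suggest("assamese", "asm_word"))
print(dictionary.search("assamese", "asm_word_a"))
```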

Papers published in conferences, journals and workshops

1. A Novel Approach for Document Classification using Assamese WordNet, Jumi Sarmah, Navanath Saharia and Shikhar K. Sarma, Global Wordnet Conference (GWC), Japan, 2012

2. Assamese Vocabulary and Assamese Wordnet Building: An Analysis, Shikhar Kr. Sarma, Utpal Saikia, Mayashree Mahanta, Himadri Bharali, Global Wordnet Conference (GWC), Japan, 2012

3. Foundation and Structure of Developing an Assamese Wordnet, Shikhar Kr. Sarma, Moromi Gogoi, Rakesh Medhi, Utpal Saikia, Global Wordnet Conference, IIT Bombay, 2010

4. A Wordnet for Bodo Language: Structure and Development, Shikhar Kr. Sarma, Moromi Gogoi, Biswajit Brahma, Mane Bala Ramchiary, Global Wordnet Conference, IIT Bombay, 2010

5. Kinship Terms in Assamese Language, Shikhar Kumar Sarma, Utpal Saikia, Mayashree Mahanta, Indo Wordnet Workshop, IIT Kharagpur, 2010

6. Formation of Kinship Terms in Bodo Language, Shikhar Kr. Sarma, Biswajit Brahma, Mane Bala Ramchiary, Indo Wordnet Workshop, IIT Kharagpur, 2010

7. Architecture of a Spell Checker for an Indo-Aryan Language: Assamese, Ambeswar Gogoi, Shikhar Kr. Sarma and Kishore Baishya, International Journal of Computational Linguistics, Volume 1, Issue 1, 2009

8. A Case Study of Dictionary Annotation as a Pre-processing Task to Develop Assamese Spell Checker, Ambeswar Gogoi and Kishore Baishya, Making of Electronic Dictionary, Linguistic Data Consortium for Indian Languages, CIIL Mysore, 2009

Conclusion

The project has brought together personnel from linguistics and computing and fostered collaboration between them, developing trained manpower in NLP and local language technology.

Through this project, a new generation of researchers in language technologies has been trained in the necessary skills and knowledge. In these local languages, formal education in linguistics and literature has minimal linkage with computation and offers no training in, or exposure to, the interlinking of linguistics and computing, so the project has helped build a team of interdisciplinary researchers. It has also contributed to expertise development and awareness of the latest work in machine translation, lexical semantics and cross-lingual IR in particular.

THANK YOU

