Date post: 13-Jan-2016

Hindi - English Synset Linkage

Resource Center for Indian Language Technology Solutions

http://www.cfilt.iitb.ac.in

Computer Science and Engineering Department, IIT Bombay


Outline

• Introduction
• Issues in Linkage
• Linkage Problem due to POS Mismatch
• Linkage Problem due to Hindi Constructions such as the Causative Verb
• Linkage Problem due to Idiomaticity
• Linkage Problem due to Culture-Specific Words
• Linkage Problem due to Multiword Expressions (MW)


Introduction

• Linkage between HWN 1.2 (Hindi WordNet) and EWN 2.1 (English WordNet)

• Two types of linkage: (i) direct linkage and (ii) hypernymy linkage

• Direct linkage: synsets that have exact equivalents in the English WordNet are to be linked through direct linkage.

For example, the Hindi synset आम (aama), आम वृक्ष (aama vriksha) is to be linked to the English synset mango, mango tree (large evergreen tropical tree cultivated for its large oval fruit).
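The two linkage types can be pictured as a typed record that pairs a Hindi synset with an English synset. The sketch below is a minimal illustration under assumptions: the class names (`Synset`, `SynsetLink`, `LinkType`) and the synset IDs are hypothetical and do not describe the schema of the actual Hindi-English WordNet Linking tool.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    """The two kinds of Hindi-English linkage (names are hypothetical)."""
    DIRECT = "direct"        # an exact English equivalent exists
    HYPERNYMY = "hypernymy"  # linked to a more general English concept


@dataclass(frozen=True)
class Synset:
    synset_id: str           # hypothetical ID, not a real HWN/EWN offset
    language: str            # "hi" or "en"
    pos: str
    words: tuple[str, ...]
    gloss: str = ""


@dataclass(frozen=True)
class SynsetLink:
    hindi: Synset
    english: Synset
    link_type: LinkType

    def __post_init__(self) -> None:
        # Guard against linking the wrong way round.
        if self.hindi.language != "hi" or self.english.language != "en":
            raise ValueError("a link must go from a Hindi to an English synset")


# The mango example: a direct link between two noun synsets.
aama = Synset("hi:0001", "hi", "noun", ("आम", "आम वृक्ष"))
mango = Synset(
    "en:0001", "en", "noun", ("mango", "mango tree"),
    "large evergreen tropical tree cultivated for its large oval fruit",
)
link = SynsetLink(aama, mango, LinkType.DIRECT)
print(link.link_type.value, link.english.words)
```

Making the records immutable (`frozen=True`) keeps a link from being silently re-pointed after it has been reviewed.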


Hypernymy Linkage

• The synsets which cannot be linked directly to English concepts are to be linked through hypernymy linkage.

For example, the Hindi synsets of चाचा (chaachaa, paternal uncle) and मामा (maamaa, maternal uncle) would be linked to the English synset of uncle through hypernymy linkage.
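Taken together, the two rules amount to a fallback: use a direct link when an exact equivalent exists, otherwise link to the nearest more general English concept. A minimal sketch, assuming the lexicographer's decisions are already recorded in two lookup tables; the table names, the function `choose_link` and the IDs are hypothetical. Note that several Hindi synsets (here चाचा and मामा) can share one hypernymy target.

```python
from typing import Optional

# Hypothetical lexicographer decisions, keyed by hypothetical synset IDs.
# Hindi synsets with an exact English equivalent:
EXACT_EQUIVALENT = {
    "hi:aama": "en:mango",
}
# Hindi synsets without an exact equivalent, mapped to the nearest
# more general (hypernym) English synset:
NEAREST_HYPERNYM = {
    "hi:chaachaa": "en:uncle",  # paternal uncle
    "hi:maamaa": "en:uncle",    # maternal uncle
}


def choose_link(hindi_id: str) -> Optional[tuple[str, str]]:
    """Return (link_type, english_id), preferring a direct link.

    Returns None when neither an exact equivalent nor a suitable
    hypernym has been recorded, i.e. the synset stays unlinked.
    """
    if hindi_id in EXACT_EQUIVALENT:
        return ("direct", EXACT_EQUIVALENT[hindi_id])
    if hindi_id in NEAREST_HYPERNYM:
        return ("hypernymy", NEAREST_HYPERNYM[hindi_id])
    return None


for hid in ("hi:aama", "hi:chaachaa", "hi:maamaa", "hi:unlisted"):
    print(hid, choose_link(hid))
```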


Issues in Linkage

• Some synsets cannot be linked because no corresponding English synset is available in EWN.

• Nouns, for example: आल आउट, ऑल आउट, आल आऊट, ऑल आऊट - एक पक्ष के सभी खिलाड़ियों के आउट होने की क्रिया “आज भारतीय क्रिकेट टीम 195 पर ही आल आउट हो गई”

Aala aauta - eka paksha ke sabhii khilaaDiyon ke aauta hone kii kriyaa “aaja bhaaratiiya kriketa tiima 195 para hii aala aauta ho gaii”

All out: the act of all the players of a side being out “Today the Indian cricket team was all out for just 195.”


Issues in Verb Linkage

• Verbs: verbs such as the following cannot be linked.

For example: स्तब्ध होना, निश्चेष्ट होना, जड़ होना, बधिर होना, बहरा होना, बहिरा होना - संवेदनाशून्य होना “सब कुछ समाप्त हो चुका है यह खबर सुनकर वह पूर्ण रूप से स्तब्ध हो गया”

stabdha honaa, nishchesta honaa, jaDa honaa, badhira honaa, baharaa honaa, bahiraa honaa - sanvedanaa shoonya honaa “saba kuchha samaapta ho chukaa hai yaha khabara sunakara vaha poorNa roopa se stabdha ho gayaa”

To be shocked, to be numbed: “On hearing the news that everything was over, he was completely shocked.”


Issues in Adjective Linkage

• Adjectives: hypernymy linkage is not possible, since adjectives in the English WordNet are not organized into a hypernymy hierarchy.

For example: एकल - जिसमें एक पक्ष में केवल एक खिलाड़ी हो “नडाल ने एकल स्पर्धा के फाइनल में प्रवेश किया”

ekala - jisamen eka paksha men kevala eka khilaaDii ho “nadaala ne ekala spardhaa ke phainala men pravesha kiyaa”

Single: having only one player on each side “Nadal entered the final of the singles competition.”


Issues in Adverb Linkage

• Adverbs: these also cannot be linked through hypernymy.

For example:

के कारण, की वजह से, के चलते, के मारे - किसी कारण से “तेज़ बारिश के कारण मैं भीग गया”

Ke kaaraNa, kii vajaha se, ke chalate, ke maare- kisii kaaraNa se “teja baarisha ke kaaraNa main bhiiga gayaa”

Due to, because of, on account of (in English these are conjunctions or prepositions, so they cannot be linked) “Due to heavy rain I got wet.”


Linking Problem due to POS Mismatch 1/4

• बरखास्त, बरख़ास्त, बरख़्वास्त, बरख्वास्त, बर्खास्त, विसर्जित, समाप्त - (अधिवेशन, बैठक, सभा आदि के संबंध में) समाप्त किया हुआ या जिसका विसर्जन हो चुका हो “बरखास्त सभा कल सुबह दस बजे पुनः प्रारंभ होगी” - Adjective

barakhaasta, visarjita, samaapta - (adhiveshana, baithaka, sabhaa aadi ke sambandha men) samaapta kiyaa huaa yaa jisakaa visarjana ho chukaa ho “barakhaasta sabhaa kala subaha dasa baje punah praarambha hogii”

Adjourned: (of a session, meeting, assembly, etc.) that which has ended (not listed as an adjective in English dictionaries, but as the past participle of the verb) “The adjourned meeting will restart tomorrow at ten o’clock.”


Linking Problem due to POS Mismatch 2/4

• बनाम, के विरुद्ध, के ख़िलाफ़, के खिलाफ़, के खिलाफ - किसी के प्रति या विरुद्ध “यह दावा माधवसिंह बनाम बेनीसिंह दायर हुआ है” - Adverb

banaama, ke viruddha, ke khilaafa - kisii ke prati yaa viruddha “yaha daavaa maadhavasinha banaama beniisinha daayara huaa hai”

Versus, against: towards or against someone (translates as “versus”, a preposition in English) “This legal action has been lodged against Benisinha by Madhavasinha.”


Linking Problem due to POS Mismatch 3/4

• के माध्यम से, के द्वारा, के ज़रिए, के जरिए, के मार्फ़त, के मारफ़त, के मार्फत, के मारफत - किसी के द्वारा या किसी से “मैं आपको अपने मित्र के माध्यम से कुछ रुपए भेज देता हूँ” - Adverb

Ke maadhyama se, ke dvaaraa, ke zarie, ke maarfata - kisii ke dvaaraa yaa kisii se “main aapako apane mitra ke maadhyama se kuchha rupae bheja detaa hoon”

Through someone: translates as “through” in English, which is a preposition. “I will send you some money through my friend.”


Linking Problem due to POS Mismatch 4/4

• के लिहाज से - के आधार पर या को देखते हुए “जनसंख्या के लिहाज से विश्व में भारत का दूसरा स्थान है” - Adverb

ke lihaaja se - ke aadhaara para yaa ko dekhate hue “janasankhyaa ke lihaaja se vishva men bhaarata kaa doosaraa sthaana hai”

In terms of, as per: “In terms of population, India is second in the world.” (“In terms of” is not found in EWN.)

• ओर से, तरफ से - किसी की ओर या तरफ से “हरभजन मुंबई की ओर से खेलेंगे” - Adverb

ora se, tarafa se - kisii kii ora yaa tarafa se “harabhajana mumbaii kii ora se khelenge”

On behalf of: “Harabhajan will play for Mumbai.” (“On behalf of” is not found in EWN.)


Linking Problem due to Hindi Constructions such as the Causative

• बनवाना - दाढ़ी या बाल कटवाना या पूरी तरह से निकलवा देना “मैंने नाई से दाढ़ी बनवाई” - Verb

banavaanaa- daadhii yaa baala katavaanaa yaa poorii taraha se nikalavaa denaa “mainne naaii se daadhii banavaaii”

To get a trim or shave: to get one’s hair or beard cut, or have it shaved off completely. “I got my beard shaved by the barber.”

• तुड़वाना, तोड़वाना, तुड़ाना, टोरवाना, तोरवाना - कोई वस्तु आदि को या उसका भाग तोड़ने का काम दूसरे से कराना “माँ राधे से लकड़ियाँ तुड़वा रही है” - Verb

tuDavaanaa, toDavaanaa, tuDaanaa, toravaanaa - koii vastu aadi ko yaa usakaa bhaaga toDane kaa kaama doosare se karavaanaa “maan raadhe se lakaDiyaan tuDavaa rahii hai”

Cause to break: to get some object, or a part of it, broken by someone else. “Mother is getting the wood broken by Radhe.”


Linking Problem due to Idiomaticity

• हाथ पर हाथ धरे बैठना, हाथ पर हाथ रखकर बैठना, हाथ पर हाथ रखे बैठना, खाली बैठना - कुछ न करना, ऐसे ही पड़े रहना “आप हाथ पर हाथ धरे बैठे हैं इससे कुछ नहीं होनेवाला” - Verb

haatha para haatha dhare baithanaa, khaalii baithanaa - kuchha na karanaa, aise hii paDe rahanaa “aap haatha para haatha dhare baithe hain isase kuchha nahiin honevaalaa” (literally: “hand on hand hold sit”, “empty sit”)

To sit idle: to do nothing (this does not form a single concept in English and therefore cannot be linked) “You are sitting idle; it is not going to help.”

• पेट पालना - जैसे-तैसे गुजर-बसर करना “वह किसी तरह अपना पेट पाल रहा है” - Verb

peta paalanaa - jaise taise gujara-basara karanaa “vaha kisii taraha apanaa peta paala rahaa hai” (literally: “stomach nourish/nurture”)

To make ends meet: to survive somehow “He is somehow making ends meet.”


Linking Problem due to Culture-Specific Words

• केंद्र, केन्द्र - जन्मकुंडली में ग्रहों का पहला, चौथा, सातवाँ और दसवाँ स्थान “ज्योतिषी जी केंद्र को शुभ फलदायी बता रहे हैं” - Noun

kendra - janmakundalii men grahon kaa pahalaa, chauthaa, saatavaan aura dasavaan sthaana “jyotishii jii kendra ko shubha phaladaayii bataa rahe hain”

Centre: the first, fourth, seventh and tenth places of the planets in a horoscope “The astrologer is saying that the centre is auspicious and fruitful.”

• नागपुरी - नागपुर का या नागपुर से संबंधित “नागपुरी संतरे प्रसिद्ध हैं” - Adjective

naagapurii - naagapura kaa yaa naagapura se sambandhita “naagapurii santare prasiddha hain”

Nagpuri: of or related to Nagpur “Nagpuri oranges are famous.”


Linking Problem due to Multiword Expressions (MW)

• खातेदार, खाता धारक, अकाउंट होल्डर, एकाउंट होल्डर, अकाउन्ट होल्डर, एकाउन्ट होल्डर - खाता खोलने वाला व्यक्ति “खातेदार के खाते में कम से कम एक हज़ार रुपए अवश्य होने चाहिए” - Noun

khaatedaara, khaataa dhaaraka - khaataa kholane vaalaa vyakti “khaatedaara ke khaate men kama se kama eka hajaara rupae avashya hone chaahie”

Account holder: the person who opens an account “The account holder must have at least one thousand rupees in his/her account.”


Linking Problem due to Multiword Expressions (MW), contd.

• अविश्वास प्रस्ताव - सरकार को पराजित या कमजोर करने की उम्मीद से विपक्ष के द्वारा या शायद ही कभी तत्कालीन समर्थकों द्वारा संसद के सामने पारंपरिक रूप से रखा गया एक संसदीय प्रस्ताव “विपक्षियों ने सरकार के सामने अविश्वास प्रस्ताव रखा है” - Noun

avishvaasa prastaava- sarakaara ko paraajita yaa kamajora karane kii ummiida se vipaksha ke dvaaraa yaa shaayada hii kabhii tatkaaliina samarthakon dvaaraa sansada ke saamane paarampaarika roopa se rakhaa gayaa eka sansadiiya prastaava “vipakshiyon ne sarakaara ke saamane avishvaasa prastaava rakhaa hai”

No-confidence motion: a motion of non-confidence (alternatively vote of non-confidence, censure motion, no-confidence motion, or confidence motion) is a parliamentary motion traditionally put before a parliament by the opposition in the hope of defeating or weakening a government, or, rarely, by an erstwhile supporter who has lost confidence in the government. The motion is passed or rejected by means of a new parliamentary vote (a vote of non-confidence).
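Whether a multiword English equivalent such as “account holder” can serve as a link target depends on whether EWN lists it as a single lemma. WordNet writes multiword lemmas with underscores in place of spaces (for example `mango_tree`), and its index files store lemmas in lowercase. The sketch below normalises a phrase to that form and checks it against a lemma set; the function names and the lemma set are hypothetical stand-ins, not real EWN contents.

```python
import re
from typing import Optional


def to_wordnet_lemma(phrase: str) -> str:
    """Normalise a phrase to WordNet's multiword lemma convention:
    trimmed, lowercased, and with runs of whitespace replaced by '_'.
    Hyphens are kept, since WordNet lemmas may contain them."""
    return re.sub(r"\s+", "_", phrase.strip().lower())


def find_lemma(phrase: str, lemmas: set[str]) -> Optional[str]:
    """Return the normalised lemma if the lexicon lists it, else None."""
    lemma = to_wordnet_lemma(phrase)
    return lemma if lemma in lemmas else None


# Hypothetical lexicon snippet; a real check would load the EWN index files.
LEXICON = {"mango_tree", "account", "motion"}

for phrase in ("Mango tree", "account holder", "No-confidence  motion"):
    print(phrase, "->", to_wordnet_lemma(phrase), find_lemma(phrase, LEXICON))
```

A `None` result only flags the Hindi synset for manual review; it does not prove the concept is absent, since the English equivalent may be listed under a different wording.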


Computational issues


MIGRATING FROM PWN 2.1 TO PWN 3.0

Issue:

1. PWN 3.0 has a larger pool of concepts than PWN 2.1.

2. Hence, in several cases, linking to PWN 3.0 will be easier.

Proposed Solution:

1. Migration of linked synsets from PWN 2.1 to PWN 3.0

2. Upgrading the PWN database in the Hindi-English WordNet Linking tool from 2.1 to 3.0
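The migration step could look roughly like the following. The sketch assumes that a synset-to-synset mapping table from PWN 2.1 to PWN 3.0 is available; the `Link` record, the function `migrate_links`, the ID format and the mapping entries are all hypothetical. Because PWN 3.0 has a larger pool of concepts, the sketch also sets aside hypernymy links for re-checking, since a direct equivalent may now exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    hindi_id: str
    english_id: str  # PWN synset ID (hypothetical format)
    link_type: str   # "direct" or "hypernymy"


def migrate_links(links: list[Link], mapping: dict[str, str]):
    """Re-point links from PWN 2.1 synset IDs to PWN 3.0 synset IDs.

    `mapping` is assumed to map each 2.1 ID to its 3.0 counterpart.
    Returns three lists:
      migrated - links re-pointed to 3.0 IDs
      unmapped - links whose 2.1 target has no 3.0 counterpart
                 (kept for manual review, never silently dropped)
      recheck  - migrated hypernymy links, since the larger 3.0
                 inventory may now offer a direct equivalent
    """
    migrated, unmapped, recheck = [], [], []
    for link in links:
        new_id = mapping.get(link.english_id)
        if new_id is None:
            unmapped.append(link)
            continue
        new_link = Link(link.hindi_id, new_id, link.link_type)
        migrated.append(new_link)
        if new_link.link_type == "hypernymy":
            recheck.append(new_link)
    return migrated, unmapped, recheck


# Hypothetical IDs and mapping, for illustration only.
links = [
    Link("hi:aama", "en21:mango", "direct"),
    Link("hi:chaachaa", "en21:uncle", "hypernymy"),
    Link("hi:example", "en21:retired", "direct"),
]
mapping = {"en21:mango": "en30:mango", "en21:uncle": "en30:uncle"}
migrated, unmapped, recheck = migrate_links(links, mapping)
print(len(migrated), len(unmapped), len(recheck))
```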


PROBLEMS DUE TO POS MISMATCH AND SOME IDIOMATIC CASES

Issue:

• POS mismatch: the corresponding sense in English may have a different POS.

• Idioms: in particular cases, the corresponding sense in English may be available but may have a different POS.

Proposed Solution: If linking is allowed across POSs, the tool can be adjusted accordingly to link across POSs.
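If cross-POS linking is allowed, one way to adjust the tool is to keep the POS check but make it switchable, and to record on each link whether the two POSs differ so that such links stay easy to audit. A minimal sketch under that assumption; `make_link`, `allow_cross_pos`, `CrossPOSLinkError` and the IDs are hypothetical names, not the tool's actual interface.

```python
from dataclasses import dataclass

# The four parts of speech covered by WordNet.
WORDNET_POS = {"noun", "verb", "adjective", "adverb"}


class CrossPOSLinkError(ValueError):
    """Raised when a cross-POS link is attempted while it is disabled."""


@dataclass(frozen=True)
class Link:
    hindi_id: str
    hindi_pos: str
    english_id: str
    english_pos: str

    @property
    def is_cross_pos(self) -> bool:
        return self.hindi_pos != self.english_pos


def make_link(hindi_id: str, hindi_pos: str, english_id: str,
              english_pos: str, allow_cross_pos: bool = False) -> Link:
    """Create a link, rejecting POS mismatches unless explicitly allowed."""
    for pos in (hindi_pos, english_pos):
        if pos not in WORDNET_POS:
            raise ValueError(f"unknown POS: {pos!r}")
    link = Link(hindi_id, hindi_pos, english_id, english_pos)
    if link.is_cross_pos and not allow_cross_pos:
        raise CrossPOSLinkError(
            f"{hindi_id} ({hindi_pos}) -> {english_id} ({english_pos})"
        )
    return link


# Hypothetical IDs: the Hindi adjective बरखास्त linked to the English
# verb "adjourn", whose past participle carries the sense.
link = make_link("hi:barakhaasta", "adjective", "en:adjourn", "verb",
                 allow_cross_pos=True)
print(link.is_cross_pos)
```

Keeping the default at `allow_cross_pos=False` means a cross-POS link is always a deliberate choice rather than an accident.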


Thank You

