8/18/2019 A Pan Indian perspective in MT ver 5.0.doc
TITLE OF THE PAPER
EILMT: A Pan-Indian Perspective in Machine Translation
AUTHORS
Hemant Darbari, Executive Director, C-DAC, Pune, [email protected]
Anuradha Lele, Group Co-ordinator, C-DAC, Pune, lele@cdac.in
Aparupa Dasgupta, Team Co-ordinator, C-DAC, Pune, aparupa[email protected]
Priyanka Jain, Project Leader, C-DAC, Pune, priyankaj@cdac.in
Sarvanan, Amrita University, sarwanster@gmail.com
ABSTRACT
To cut across the language barrier and to encourage the language pluralism of morphologically complex languages [Sproat 199…], especially the South-Asian languages [Krishnamurti et al. 1986] of India, a consortium-mode, robust Machine Translation System (MTS) that is able to raise the accuracy of generation has been developed jointly by C-DAC, Pune and DIT, GOI. In Natural Language Processing (NLP) and Natural Language Understanding (NLU), Machine Translation plays a vital role in today's India for any sort of e-language processing and understanding by machine. In every quarter of the electronic era, machine translation, information retrieval or speech processing becomes obligatory for a multilingual community. This paper describes a hybrid machine translation system from English to Indian languages. It also proposes the TAG-based, memory-managed Machine Translation System [Joshi et al. 1981], aligned with other rule-based, example-based and statistical Machine Translation Systems, for English-Hindi, English-Urdu, English-Oriya, English-Bangla, English-Marathi and English-Tamil. EILMT has been designed specifically to translate in platform-independent modules. It is a proposed hybrid thin-client/thick-server design, where users (clients) of the system use a standard browser to access the translation services of the server. We call this a Pan-Indian perspective on Machine Translation. In this paper, we explain the challenges faced and the solutions drawn at the various levels of architecture, language and linguistic computation. While building the Machine Translation System, we have taken care of the speed and accuracy for syntactically and morphologically diversified languages in the modules and phases of the EILMT system.
1.0 Introduction to Machine Translation
In the present paper, we explain the challenges encountered in coping with the speed and accuracy of syntactically and morphologically diversified languages, as tested and developed for a consortium-mode Machine Translation system for English to Indian Languages, in collaboration with C-DAC, Pune and DIT, Govt. of India.
The idea of Machine Translation evolved in 1629, when Rene Descartes proposed a Universal Language. In 1954, the Georgetown experiment (1954) involved fully-automatic translation of sixty Russian sentences into English. In the late 1980s, machine translation inclined towards statistical models, and example-based models evolved gradually. Machine Translation systems such as Systran, used by the AltaVista search engine, METEO, used at the Canadian Meteorological Centre, the example-based machine translation proposed by Makoto Nagao, and several other hybrid Machine Translation systems came into existence. During the year 1990-91, DIT (Department of Information Technology) of the Government of India initiated the TDIL (Technology for Development of Indian Languages) project to encourage Indian language processing in the area of IT. The institutions, namely C-DAC, Pune (MANTRA); NCST (now C-DAC, Mumbai; MATRA); IIIT-Hyderabad (Anusaaraka and SHAKTI) and IIT-Kanpur (Anglabharati), have taken Machine Translation from English to Hindi to greater heights by developing applications using cutting-edge technology.
2.0 Introduction to EILMT
To overcome the language barrier and to encourage the language pluralism of morphologically complex languages [Sproat 199…], especially the South-Asian languages [Krishnamurti et al. 1986] of India, a consortium-mode, robust Machine Translation System (MTS) that is able to raise the accuracy of generation has been developed jointly by C-DAC, Pune and DIT, Govt. of India. It is a domain-specific Machine Translation system for the tourism domain. The project is developed by 10 consortium institutes: C-DAC, Mumbai, IIIT-Hyderabad, IISc-Bangalore, IIT-Bombay, Jadavpur University Kolkata, Amrita University Coimbatore, IIIT-Allahabad, Banasthali Vidyapeeth Banasthali, Utkal University Bhubaneshwar and C-DAC, Pune, the last being the consortium leader. EILMT is a hybrid Machine Translation system with the TAG formalism (Tree Adjoining Grammar based MT developed by C-DAC, Pune), SMT (statistical MT developed by C-DAC, Mumbai), ANALGEN (rule-based MT by IIIT-Hyderabad) and EBMT (example-based system developed by IISc, Bangalore). To measure the performance of these translation engines and to evaluate the language-pair-wise translation accuracy, we present here the internal testing carried out by the consortia and the feedback on the test report provided by the GIST Group, C-DAC, Pune on EILMT alpha version … [See EILMT Progress Report, 2009]. The translation output accuracy of each of these translation engines is given below. The following table gives the average score of each engine for each sentence structure type:
Sentence Structure Type   AnalGen (%)   EBMT (%)   SMT (%)   TAG (%)
Co"ua 22.F B?.> 33.? 2F.>
)im"e 2.F 3>.>> F.> /.>>
A""o#itiona F.> BF.> 3F.> /.?
eative Cau#e F.>> G.F .>> /.>>
That-Cau#e 3?.> .? 3>.>> /?.>
=h-Cau#e 3.33 G2.GG G.F 3G.GG
Co-ordinate 3.>> G?.> F2.F /.>>
Conditiona B2.> 3.? 3G.F FF.>
PP Initia 3>.>> BG.F F.> /G.F
Adverb Initia 2>.>> G.GG G.GG /.>>
!erundia G3.>> G.3> F.>> 2G.GG
Partici"e 2.? ?.>> 2.? /.?
Ininitive >.>> B3.? 3>.>> /G.F
Di#cour#e Connector F>.>> .>> F.>> F.>>
Table 1: Engine-wise translation output accuracy
Similarly, the language-pair-wise translation accuracy of the TAG translation engine was evaluated. The approximate translation accuracy is as follows: for the English-Hindi pair, approximately 8…; for English-Urdu, approximately 7…; for English-Oriya, approximately 80; for English-Bangla, approximately 70; for English-Marathi, approximately 6…; and for English-Tamil, approximately 70.
3.0 Introduction to EILMT Architecture: The Challenges
EILMT is a web-based Machine Translation system solution with a hybrid approach across six language pairs, from English to Hindi, Urdu, Oriya, Bangla, Marathi and Tamil. Along with four different machine translation engines, the Named Entity Recognizer [NER] and Word Sense Disambiguation [WSD] modules are developed by IIT, Mumbai. The EILMT system architecture is represented in the following diagram:
Diagram 1: EILMT system architecture
The basic system facilities of the EILMT consortium are: User Log module; Pre-Processing module; four Translation Engines (AnalGen, EBMT, SMT & TAG) for six language pairs; Post-Processing module; Collation and Ranking module; W3C compatibility; and browser compatibility for IE, Mozilla Firefox, Google Chrome, Apple Safari & Opera. (See Annexure I A for detailed EILMT system specifications.)
3.1 Overall Architecture of EILMT
EILMT is a web-based translation system accessed simultaneously by multiple users with multiple requests. JBoss is the application server, with robust database I/O file/exe; for a rapid, transactional, secure and portable application, EILMT is supported by EJB (Enterprise Java Bean). EILMT follows a centralized design in which Internet clients submit their documents to a multi-core server, where parsing and generation spawn multiple embedded threads. Significantly, the outer-layer thread connects to the ANALGEN engine (implemented in PERL on the Linux platform), another thread connects to the SMT server, and a third to the EBMT engine. The Ranking module then collates and ranks the translations from the above-mentioned translation engines.
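The thread-per-engine dispatch and the collate-and-rank step described above might be sketched as follows. This is only an illustrative sketch in Python (the system itself is described as Java/EJB with a Perl engine); the engine stubs, the `translate_all` and `rank` helpers, and the confidence scores are all hypothetical and are not part of EILMT.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical engine stubs: each returns (translation, confidence score).
# The scores are invented purely for illustration.
def tag_engine(sentence):
    return ("TAG(" + sentence + ")", 0.90)

def smt_engine(sentence):
    return ("SMT(" + sentence + ")", 0.70)

def ebmt_engine(sentence):
    return ("EBMT(" + sentence + ")", 0.45)

def analgen_engine(sentence):
    return ("AnalGen(" + sentence + ")", 0.80)

ENGINES = {
    "TAG": tag_engine,
    "SMT": smt_engine,
    "EBMT": ebmt_engine,
    "AnalGen": analgen_engine,
}

def translate_all(sentence, engines):
    """Run every engine on its own thread and collect (engine, text, score)."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, sentence) for name, fn in engines.items()}
        return [(name,) + future.result() for name, future in futures.items()]

def rank(candidates):
    """Collate the candidates and order them from best to worst score."""
    return sorted(candidates, key=lambda c: c[2], reverse=True)

if __name__ == "__main__":
    for name, text, score in rank(translate_all("Vrindavan is a pilgrimage.", ENGINES)):
        print(name, score, text)
```

A real ranking module would need a comparable score from each engine; how EILMT derives that score is not stated in the text.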
Initially, the NER used in the SMT system developed by C-DAC, Mumbai followed a Maximum Entropy based approach. This system had an accuracy of 81.33 on the CoNLL-2003 dataset (Precision: 83.8, Recall: 78.9, F-Measure: 81.33). The current system uses two stages: SVMs followed by MEMMs. Using the two phases improved the accuracy to 93 (Precision: 92.6, Recall: 93.48, F-Measure: 93).
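For readers who want to check how the reported F-Measure relates to precision and recall, the standard balanced F-measure (the harmonic mean of the two) can be computed as below. This is a minimal sketch; the function name `f_measure` is ours, and the inputs 92.6 and 93.48 are the second-stage precision and recall as they read in the text.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Second-stage NER figures quoted in the text (percentages).
    print(round(f_measure(92.6, 93.48), 2))
```

The result is close to 93, which agrees with the F-Measure quoted for the two-stage system.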
3.2 Implementation of TAG Formalism
Tree Adjoining Grammar [Kroch and Joshi, 1985] is implemented for all 6 language pairs in EILMT on the TAG translation engine. The Java-based TAG parser translates English documents to Hindi, Urdu, Oriya, Bangla, Marathi and Tamil. The significant feature of this parser is that it is incremental: (a) it identifies the clause or phrase on the basis of the probable declarative clause boundary, and (b) after identifying the clause boundary, the TAG tree derivation structure identifies the probable parent derivation for the nearest child derivation structure, to give the final integrated derivational tree to the TAG Generator. The TAG engine is enriched in such a way that it can handle parsing and generation for interrogative sentences, negation, gerundial constructions, relative clause constructions, past & progressive participles, etc. The pre-processing is controlled by supervised modules, such as the syntactic TAG tree disambiguator module, with optimized code and a database design written in regular expressions. Consider the following description of the incremental parser, which has given modularity, extensibility and speed to the translation process of the TAG engine. The probability of adjoining the parent derivations to the nearest probable child derivation is given by the following equation:
K c(N) O; where N = number of child derivations, K = number of parser derivations, c = combination.
Consider the sentence “The 18th century Bharatpur-Bird-Sanctuary, which is also known as the Keoladeo-Ghana-National Park, is famous as the most important bird breeding and feeding habitat of the world.”
Following is the parse derivation of a clause (one of the clauses):
Diagram 2: Parser derivation of clause
Following is the complete generated derivation (or derived tree):
Diagram 3: Complete generated derivation
The present paper presents our internal analysis and testing of the tourism corpus on the TAG translation engine through an accuracy graph based on a grade scale of 4 points, subject to syntactic and semantic well-formedness and the choice of proper lexical selection.
Diagram 4: Accuracy graph for TAG engine
4.0 Linguistic Diaspora of EILMT: Morphologically Diversified Languages
The English corpus of …,200 sentences from the tourism domain was collected, organized, vetted and aligned [Sinclair, J. 1991 and 2004] for all 6 language pairs.
India being a Linguistic Area [see Krishnamurti et al, 1986] in the South-Asian sub-continent, both the Indo-Aryan (eastern and western Indo-Aryan) and Dravidian language families, with their rich morphological heritage, have their separate, distinct linguistic identity in the source-target TAG grammar, transfer grammar (a source-target link grammar), rule-normalizer, rules for morphological analysis and synthesis, and transliteration and typing-tool rules. The stylistic trend observed in the EILMT tourism corpus is: simple sentences (4.94 frequency of occurrence), copulative constructions (3.49 frequency of occurrence), co-ordinate sentences (20.0 frequency of occurrence), appositional sentences (….33 frequency of occurrence), various declarative clause structures (22 frequency of occurrence), gerundial constructions (….3 frequency of occurrence), conditional sentences (… frequency of occurrence), discourse connectors (0.77 frequency of occurrence) and infinitival sentences (9.03 frequency of occurrence). Thus, the parallel corpus is created for all 6 language pairs, and features such as intelligibility, comprehensibility and fluency in translation are maintained to set a reference for the machine output; as E.M. Enquest has said very correctly, “Proper words in proper places creates styles”.
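The frequency-of-occurrence profile described above can be obtained by counting the structure label attached to each corpus sentence. The following is a minimal sketch under the assumption that every sentence has already been labelled with one structure type; the function name `structure_frequencies` and the toy labels are hypothetical and do not reproduce the EILMT corpus figures.

```python
from collections import Counter

def structure_frequencies(labels):
    """Return the percentage share of each sentence-structure label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}

if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    toy = ["simple"] * 4 + ["co-ordinate"] * 2 + ["copula", "conditional"]
    for label, share in structure_frequencies(toy).items():
        print(label, share)
```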
In Natural Language Processing (NLP) and Natural Language Understanding (NLU) [Terry Patten, 1985], Machine Translation plays a vital role in the Indian sub-context for e-language processing by machine. In the Eng-Hindi EILMT system, the localization of linguistic peculiarities of Hindi, such as oblique formation, ergativity, the marked-gender system, case-marking, direct-oblique pluralization etc., is handled in a controlled environment through the morph-synthesizer, finite and non-finite generators, POS conversion rules etc. Similarly, for the other Indo-Aryan language pairs, i.e., Eng-Urdu, Eng-Oriya, Eng-Bangla and Eng-Marathi, linguistic features such as the Perso-Arabic and Indic pluralization systems, lexico-semantic peculiarities, copula drop, dropping of the existential subject, post-position synthesis, synthesis of case-marking, emphatic clitic formation, usage of classifiers, verb root alteration, the strong and three-level gender system, gender-based noun synthesis, compounding etc. [Bhattacharya, T et. al 1996; Krishnamurti, Bh. et al 1986; Selkirk 1982; Williams 1981] are incorporated through the feature-based lexicon, ordered rule-based normalizer etc. Approximately 37,000 bilingual lexicon entries, 2000 phrasal lexicon entries, 97 TAG tree disambiguation rules, 2…-…0 source TAG trees, 2…-230 target TAG trees, 800 transfer grammar mappings and 70-7… morph-synthesis rules are developed for each Indo-Aryan language pair.
The following section explains the linguistic challenges faced and the language computing solutions drawn in the EILMT system for syntactically and morphologically diversified and complex languages, to raise the speed and translation accuracy:
4.1 Raising Translation Speed and Accuracy: Intermediate Solutions
a). Rectifying wrong POS tagging by the Stanford tagger (version 1.6) through rule-based POS tagging (see Computational Linguistics, volume 19, number 2, pp 313-330). Consider the following example, which shows the internal POS conversion rule rectifying the erroneous tagging output of the Stanford tagger:
“Visit the Sheesh-Mahal or the Hall of Victory glittering with mirrors and ascend the fort on elephant’s back”
Stanford Output: [Visit@@@@NN, the@@@@DT, Sheesh@@@@NNP, Mahal@@@@NNP, or@@@@CC, the@@@@DT, Hall@@@@NNP, of@@@@IN, Victory@@@@NNP, glittering@@@@VBG, with@@@@IN, mirrors@@@@VBZ, and@@@@CC, ascend@@@@…, the@@@@DT, fort@@@@NNP, on@@@@IN, elephant@@@@NN, zxtd@zxtdzxtd@zxdt@@@@NN, back@@@@…]
Internal POS Category String: [Visit@@VERB the-Sheesh-Mahal@@NOUN or@@CONJ the-Hall@@NOUN of@@PREP Victory@@NOUN glittering@@PrPART with@@PREP mirrors@@NOUN and@@CONJ back@@ADV ascend@@TYPEAPPOINT the-fort@@NOUN on@@PREP elephant@@NOUN zxtd@zxtdzxtd@zxdt@@AS]
Apart from POS tagging, emotion and sense tagging are necessary in Machine Translation to capture the semantic anomalies of natural language.
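The POS conversion step in item a) can be pictured as a tag mapping followed by a few context rules. The sketch below is a hedged illustration only: the mapping table, the two correction rules and the function name `convert` are our assumptions, chosen to reproduce the kind of repair shown in the example (a sentence-initial "Visit" tagged as a noun, "mirrors" tagged as a verb after a preposition). They are not the actual EILMT rule set.

```python
# Assumed mapping from Penn Treebank tags to internal categories.
PENN_TO_INTERNAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBZ": "VERB", "VBG": "PrPART",
    "IN": "PREP", "CC": "CONJ", "RB": "ADV", "DT": "DET",
}

def convert(tagged):
    """Map (word, penn_tag) pairs to (word, internal_category) pairs,
    applying two illustrative correction rules."""
    result = []
    for i, (word, tag) in enumerate(tagged):
        category = PENN_TO_INTERNAL.get(tag, tag)
        prev_tag = tagged[i - 1][1] if i > 0 else None
        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        # Rule 1 (assumed): a sentence-initial noun directly followed by a
        # determiner is treated as an imperative verb.
        if i == 0 and tag in ("NN", "NNP") and next_tag == "DT":
            category = "VERB"
        # Rule 2 (assumed): a finite-verb tag right after a preposition is
        # treated as a noun.
        elif tag == "VBZ" and prev_tag == "IN":
            category = "NOUN"
        result.append((word, category))
    return result

if __name__ == "__main__":
    sample = [("Visit", "NN"), ("the", "DT"), ("fort", "NNP"),
              ("with", "IN"), ("mirrors", "VBZ")]
    print(convert(sample))
```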
b). Chunking is an important part of the shallow parsing level. It minimizes the number of tokens to be sent to the core parser, thus reducing the number of possible adjunctions, which affects the translation time as well as the translation quality. We perform noun phrase chunking and verb group collation. Consider the following example at level-1 chunking:
[The-Prince/NNP of/IN Wales/NNP Museum]/NNP ,/, [the-Jahangir-art-Gallery]/NNP ,/, [the-various-churches]/NNS ,/, temples/NNS and/CC shrines/NNS including/VBG [the/DT one/CD]/NNP of/IN [Haji-Ali]/NNP out/IN on/IN [an-island]/NN linked/VBN by/IN [a-causeway]/NN ,/, are/VBP worth/JJ [a-glimpse]/NN
And at level-2 chunking:
[The-Prince-of-Wales-Museum]/NNP ,/, [the-Jahangir-Art-Gallery]/NNP ,/, [the-various-churches]/NNS ,/, temples/NNS and/CC shrines/NNS including/VBG [the-one]/NNP of/IN [Haji-Ali]/NNP out/IN on/IN [an-island]/NN linked/VBN by/IN [a-causeway]/NN ,/, are/VBP worth/JJ [a-glimpse]/NN
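The effect of noun phrase chunking on the token count in item b) can be illustrated with a very small chunker. This is a sketch under simplifying assumptions: it merges any run of determiner, adjective and noun tokens into one hyphenated token carrying the tag of the last word, which only approximates the level-1 behaviour shown above. The tag set `NP_TAGS` and the function name are ours.

```python
# Tags that may take part in a noun-phrase chunk (assumed set).
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP"}

def chunk_noun_phrases(tagged):
    """Merge runs of NP-internal tokens into single hyphenated tokens."""
    chunks, current = [], []

    def flush():
        # Emit the pending run as one token tagged like its last word.
        if current:
            chunks.append(("-".join(w for w, _ in current), current[-1][1]))
            current.clear()

    for word, tag in tagged:
        if tag in NP_TAGS:
            current.append((word, tag))
        else:
            flush()
            chunks.append((word, tag))
    flush()
    return chunks

if __name__ == "__main__":
    sample = [("the", "DT"), ("various", "JJ"), ("churches", "NNS"), (",", ","),
              ("temples", "NNS"), ("and", "CC"), ("shrines", "NNS")]
    print(chunk_noun_phrases(sample))
```

In this toy input seven tokens become five, which is the kind of reduction that shrinks the number of adjunctions the core parser has to consider.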
c). We use a TAG (Tree Adjoining Grammar) [Joshi et al. 1975] parser, and for that we have created a number of trees to represent the structure of the source and target languages. In this formalism each token is tagged with a POS tag/category, on the basis of which a set of possible tree tags is assigned to the token. This process is called tree tagging. A sentence, as a string of tree-tagged tokens, is then sent to the parser. When a token in a sentence is tagged with a number of trees, the parser is liable to produce multiple derivations, most of them inappropriate. This reduces accuracy and speed. To eliminate these spurious derivations, or at least minimize them, we adopted the technique of TAG tree pruning. Our pruning module further disambiguates according to the syntactic context and helps select the TAG tree more precisely. The accuracy and speed of the system were thus substantially improved.
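To make the benefit of tree pruning in item c) concrete, the sketch below counts how many tree combinations the parser would face before and after a simple context filter. Everything here is hypothetical: the tree names, the rule that a tree may require a particular category on the preceding token, and the helper names `prune` and `derivation_space` are illustrative assumptions, not the EILMT pruning module.

```python
from math import prod

def prune(tokens):
    """Keep only candidate trees whose required left context matches.

    Each token is (word, category, trees); each tree is
    (tree_name, required_previous_category or None).
    """
    pruned, previous = [], None
    for word, category, trees in tokens:
        kept = [t for t in trees if t[1] is None or t[1] == previous]
        # Never leave a token without any tree: fall back to all candidates.
        pruned.append((word, category, kept or trees))
        previous = category
    return pruned

def derivation_space(tokens):
    """Upper bound on tree combinations: product of candidates per token."""
    return prod(len(trees) for _, _, trees in tokens)

if __name__ == "__main__":
    sentence = [
        ("the", "DET", [("alpha_Det", None)]),
        ("fort", "NOUN", [("alpha_N_after_det", "DET"),
                          ("alpha_N_after_prep", "PREP"),
                          ("beta_N_modifier", "NOUN")]),
        ("glitters", "VERB", [("alpha_V_intransitive", "NOUN"),
                              ("alpha_V_transitive", "NOUN"),
                              ("beta_V_auxiliary", "VERB")]),
    ]
    print(derivation_space(sentence), derivation_space(prune(sentence)))
```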
d). To handle the synthesis of constructions in Indian languages, the morphological complexities of nouns and verbs and their inter-relationships, together with the kaaraka formalism [Gangopadhyay, M. 1990], play a major role in noun or verb synthesis in a defined context. Various categories at adjoining positions, such as adpositional words like adjectives, post-positions (parasargas) and various particles like the avyayas, are also within this defined context. Basically, post-positions and avyayas have a modified-modifier [Aronoff, M. 1976] function, adding more linguistic information for the end-users of the target language. The feature-embedded morphological rules (and sometimes also gender agreement) written for the synthesizer can be seen through the synthesized output. Verbs in the language demand the karaka identities, and the nouns fulfil the demands according to the yogyataa. And, in a defined context, nouns demand parasargas or post-positions on semantic grounds. The following diagram explains the synthesis process in the EILMT system:
Diagram 5: Morph-synthesis process of EILMT system
In points a) to d) above, the linguistic variations and complexities handled through the pre-processing or post-processing generative modules have considerably raised translation accuracy and speed. The following graph compares the translation speed of the old and new versions of the EILMT system (i.e., the speed of translation before and after pruning and context disambiguation of the POS tagsets, TAG tree tagging and noun-verb synthesis). The above developments at the parsing and generation stages have raised the speed of translation in the latter (new) version of EILMT:
Diagram 6: Comparison of speed of old and new version
5.0 English-Tamil EILMT System: an Overview
In the English-Tamil EILMT system, special attention has been given to the Tamil morphological system. As Tamil belongs to the Dravidian language family and is an agglutinative language, the synthesis of finite and non-finite forms, the synthesis of nouns or noun groups and the gender-based system are catered for through the feature-based lexicon and the noun and verb morph-synthesizer. In modern Tamil three types of words are found: noun, verb and itaiccol or particles. The noun indicates animate and inanimate categories (tiNai, classified into uyartiNai and akRiNai). There are three genders in Tamil: masculine, feminine and neuter, where masculine and feminine indicate singular number and the neuter gender indicates plural number. There are three persons in Tamil (first, second and third person). Case inflexion is prominent, with suffixes, in Tamil. Tamil, being agglutinative in nature [See Varadarajan, Mu. 1988], is found to be different to parse and generate from the manner in which Indo-Aryan languages are generated in the EILMT system. Approximately 3…,000 bilingual lexicon entries, 92 phrasal lexicon entries, 97 TAG tree disambiguation rules, 2… source TAG trees, 2…7 target TAG trees, 47 transfer grammar mappings and …00 morph-synthesis rules are developed for the English-Tamil version. Consider the following example from the tourism domain with EILMT TAG output in Tamil:
English: ‘Mother Earth’ is kind in return.
Tamil: அன பம தரல வகயக இரற
The following diagram represents the English-Tamil User Interface (with Tamil output):
Diagram 7: English-Tamil User Interface output
5.1 Translation Accuracy of English-Tamil EILMT System
To evaluate the translation accuracy of the English-Tamil system, the score was obtained through Subjective/Human Evaluation. The parameters for testing the translation accuracy of the EILMT system under subjective/human evaluation are: POS tagging, P-Syntax, G-Syntax, Morph-Synthesis, Lexicon availability and phrase marking. We present here the internal testing carried out by the consortia and the feedback on the test report provided by the GIST Group, C-DAC, Pune on EILMT alpha version … [See Appendix I B for English-Tamil output]. The human evaluators' assessment of Eng-Tamil is shown in the following bar chart (evaluation parameters as given above).
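A per-parameter summary of the kind plotted in such a bar chart can be produced by averaging the evaluators' scores for each parameter. The sketch below assumes each evaluated sentence carries one score per parameter on a 0-100 scale; the parameter keys, the `summarize` helper and the two sample rows are invented for illustration and are not the reported EILMT results.

```python
from statistics import mean

PARAMETERS = ["POS", "P-Syntax", "G-Syntax", "Lexicon", "Phrase-Marking", "Synthesizer"]

def summarize(rows):
    """Average each evaluation parameter over all sentences, plus an overall mean."""
    per_parameter = {p: mean(row[p] for row in rows) for p in PARAMETERS}
    overall = mean(list(per_parameter.values()))
    return per_parameter, overall

if __name__ == "__main__":
    # Two invented score rows (0-100), for illustration only.
    rows = [
        {"POS": 100, "P-Syntax": 100, "G-Syntax": 100, "Lexicon": 80,
         "Phrase-Marking": 100, "Synthesizer": 100},
        {"POS": 100, "P-Syntax": 80, "G-Syntax": 80, "Lexicon": 100,
         "Phrase-Marking": 80, "Synthesizer": 100},
    ]
    per_parameter, overall = summarize(rows)
    print(per_parameter, round(overall, 2))
```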
Diagram 8: Bar chart of Eng-Tamil translation output evaluation
5.2 Scope of Improvement for Eng-Tamil TAG EILMT System
It is evident from the above figure that further linguistic development of the English-Tamil system is required to increase the translation-level accuracy. The following points are to be considered for future improvement of the English-Tamil system:
a). Re-framing and enhancing the Noun Collation module on the basis of Phrase Tagging and the agglutinating character of Tamil.
b). The process of new tree set creation / existing tree set modification / deletion of redundant tree sets is to be improved, and target tree set pruning and purification of the transfer grammar are to be enhanced.
c). Enhancement of the feature-based linguistic rule-set in the verb generator and noun generator modules, and bilingual lexicon correction, purification and feature attachment, are to be completed.
6.0 Conclusion
All the above findings, research and implementations on the EILMT system give a more productive and evolutionary ground for e-corpus work in the Indian subcontinent. This ground will definitely raise some critical questioning and re-analysis of machine translation, text mining, data pruning, information extraction and retrieval, speech pathology and technology in IL-to-IL information exchange and access. Thus the research and study on EILMT for Indian languages should be guided and formalized as follows:
a). Standardization of the Indian tagset, considering the factor of morphologically rich language families, and formal tagging, sense tagging and emotion tagging of the e-corpora available in Indian languages.
b). Memory-based parsing management to organize multiple languages with multiple domains.
c). Further feature-based development of morphology-based modules for morphologically rich Indian languages.
d). Further, memory-managed MT will increase the system efficiency by …-20 more. The scope of this analysis and synthesis can be extended to reverse translation as well.
7.0 Reference
Aronoff, Mark. 1976. Word Formation in Generative Grammar. Cambridge, MA: MIT Press.
Bhattacharya, T. and P. Dasgupta. 1996. Classifiers, word order and definiteness in Bangla. In V.S. Lakshmi and A. Mukherjee, ed. Word order in Indian languages. 73-94. Hyderabad: Booklinks.
Gangopadhyay, Malaya. 1990. The Noun Phrase in Bengali: Assignment of Role and the Kaaraka Theory. Delhi: Motilal Banarsidass.
Joshi, Arvind, Bonnie Webber and Ivan Sag. 1981. Elements of Discourse Understanding. Cambridge University Press, New York.
Krishnamurti, Bh., C.P. Masica and A. K. Sinha (eds). 1986. South Asian Languages: Structure, Convergence and Diglossia. Delhi: Motilal Banarsidass.
Kroch, T. and A. Joshi. 1985. The Linguistic Relevance of Tree Adjoining Grammar. University of Pennsylvania. Department of Computer and Information.
Patten, Terry. 1985. A problem solving approach to generating text from systemic grammars. Proceedings of 2nd Conference on European chapter of Association for Computational Linguistics. Geneva, Switzerland.
Sinclair, J. 1991. Corpus, concordance, collocation. Tuscan Word Centre, Oxford: Oxford University Press.
Sinclair, J. 2004. Developing Linguistic Corpora: A Guide to good practice. Oxford: Oxford University Press.
Selkirk. 1982. The Syntax of Words. MIT Press.
Varadarajan, Mu. 1988. A History of Tamil Literature. Translated from Tamil by E. Sa. Viswanathan. Sahitya Academy, New Delhi.
ANNEXURE I A
(EILMT system specifications)
Server Machine(s)
Server Machine : 1 (Main EILMT Server)
Hardware:
HP DL rac& Mount )erver, 2 core xeon G.>!HS, G?!AM, B2!U3 )A) Di#&.
Operating System: Windows Server 2008
Software Required:
'ava "atorm: (d&..>> or (re..>>.
)erver ver#ion: (bo##-B.>.?4A""ication #erver5.
Databa#e ver#ion: M%)VL )erver 3.>, M%)VL Too# or .>
Server Machine : 2 (AnalGen Server)
Hardware:
HP DL rac& Mount )erver, 2 core xeon G.>!HS, G?!AM, B2!U3 )A) Di#&.
Operating System:
Linux Fedora 8 (for Consortia Engine Name: AnalGen).
Client Machine
Hardware:
PC with 486MHz or higher (Pentium processor recommended).
3 M AM minimum. 6"eratin$ )%#tem:
Windows 98 & Above
Linux Fedora 8
Browser support:
Internet Explorer IE6, IE7
Moia ireox G.>.-G,G..?-3
!oo$e Chrome ?.>.F?.?2, ?.>.F?.GF, G.>./.?B, G.>./.?FA""e )aari G.?.?, B.>.?, B.>.G4G./.5, B.>.B4G.?.>5
6"era /.3B
ANNEXURE I B
(Evaluation of EILMT system translation output for English-Tamil language pair)
Version 5.0
English Sentence   Translated output (E-T)   Analysis of Translation
Si$e ; Co$ua
rindavan i# a "i$rima$e. ரநவன ஒர யதயக இரற P6) >>P #%ntax >>
!-#%ntax >>
Lexicon F
Phra#e-Mar&in$ >>
)%nthe#ier >>
Si$e ; Co$ua &Possessi/e or(
The !an$a !oden 'ubieeMu#eum ha# a ar$e coection o
"otter%, "aintin$#, car"et#, coin#,
and armor%.
கஙக கடன !"#$ ம%&ய'()*)ட+, , -யஙக. , க*/ஙக.
, 01யஙக. (23 *டக45ன ஒர +*6ய7க6" இரனற
P6) >>P #%ntax >>
!-#%ntax >>
Lexicon >>Phra#e Mar&in$ >>
)%nthe#ier >>
Si$e ; Co$ua &co6ordinate(
The Mehran$arh ort ha# #even$ate# and "rovide# +onderu
vie+# o the cit%.
(கன89 :*9;>P #%ntax >>
!-#%ntax >>
Lexicon F>
Phra#e Mar&in$ >>
)%nthe#ier >>
&In( Si$e ; Co$ua
The be#t time to vi#it 'ai"ur i# +@"ப க1 மகA&ற"* 0 P6) >>
December to ebruar%December to F B bரஅ9யக இரற P #%ntax 2>
!-#%ntax 2>
Lexicon >>
Phra#e Mar&in$ 2>
)%nthe#ier >>
Si$e
!-#%ntax F>Lexicon 2>Phra#e Mar&in$ >>
)%nthe#ier />
Reati/e Cause &su)ordinate cause( : Hidden
The 'a-Maha i# a "icture#9ue "aace buit or ro%a duc&
#hootin$ "artie#
(8 அM'6ய வ வ;டய<க;&கIகக க;ட"*;ட ஒர க)கவ9அ)(யக இரற
P6) >>P #%ntax >>
!-#%ntax >>
Lexicon 3>
Phra#e Mar&in$ >>
)%nthe#ier >>
A$$ositiona &/er) $artici$ia ; initia(
Phetchaburi, a ver% od cit%, anim"ortant to+n, had been $iven
#evera name# #uch a#, Phri""hri,
Phri""hi or Phetcha"hi
+8;Sஅ b K6 , ஒர மக *?ய 0க , -ரJய( 0க , T6""T6 , T6""Tலஅ4 +8;Sஅ"Tல * *4 +*ய9க.அகயவ அ" *;Hரந
P6) >>P #%ntax >>
!-#%ntax >>
Lexicon 2>
Phra#e Mar&in$ >>
)%nthe#ier F>
A$$ositiona&co$eent ; initia(
a#amand La&e J Paace, anartiicia a&e, i# a #"endid #"ot
and +a# buit in / AD.
*7(ந U6 & அ)( , -ர +7ய2கயU6 , ஒர அ?க இட(க இரற (231159 V D இ க;ட"*;ட
P6) >>P #%ntax >>
!-#%ntax >>
Lexicon /
Phra#e Mar&in$ >>
)%nthe#ier >>
That Co$eent
Madurai i# the ode#t cit% inTami 7adu and +a# home to the
ancient Tami )an$am, the
iterar% concave that "roduced
the ir#t e"ic, )ia""athi&aram.
( மP 0;H மக"*?(ய 0க(கஇரந (23 J க"#ய ,W4"*;Xக *=9/ *)டய Tamil Sangam , இ4ய"*)டயY;ட' இர"#ட(க இரந
P6) >>P #%ntax F>
!-#%ntax F>
Lexicon F>
Phra#e Mar&in$ >>
)%nthe#ier >
&Co6ordinate( ; That Co$eent
The "icture#9ue 1an$ra vae%
ha# #evera #"ot# that oer maha#eer river car".
க)கவ9 கங *./'' (8Z9 V3'றYற அ$ற *4 இடஙக. இரனற
P6) >>
P #%ntax >>!-#%ntax >>
Lexicon 2>
Phra#e Mar&in$ >>
)%nthe#ier >>
Co6ordinates
*dai"ur i# #ituated in the
#outhern "art o a(a#than and i#
#urrounded b% the Aravai ran$e.
K@"ப9 C5ன +2' *'த=இட'[அ(க"*P #%ntax >>
!-#%ntax >>
Lexicon 2>
Phra#e Mar&in$ >>
)%nthe#ier >>
,iscourse connector
*dai"ur i# &no+n or it# beautiua&e#, +e #tructured "aace#,
u#h $reen $arden# and tem"e#
but the ma(or attraction# o thi#
"ace are the La&e Paace and theCit% Paace.
K@"ப9 அQடய அ?க U6கIககஅ[ய"*ட"*3 +க)Hர'
P6) 2>
P #%ntax F>
!-#%ntax F>Lexicon F>
Phra#e Mar&in$ >>
)%nthe#ier F>
Co$ua அ?க (யஙக$ன ஒனறக இரற
P6) /
P #%ntax /
!-#%ntax F>
Lexicon FPhra#e Mar&in$ >>
)%nthe#ier 2>
>