
TAUS MT SHOWCASE, MT for Southeast Asian Languages, Ai Ti Aw, Institute for Infocomm, 10 April 2013

Date post: 06-May-2015
Upload: taus-enabling-better-translation
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter - #MosesCore
Page 1: TAUS MT SHOWCASE, MT for Southeast Asian Languages, Ai Ti Aw, Institute for Infocomm, 10 April 2013


MT for Southeast Asian Languages 14:00 – 14:20 Wednesday, 10 April 2013 Ai Ti Aw Institute for Infocomm, Singapore

Page 2

Southeast Asian Language Machine Translation

Ms Ai Ti AW

Human Language Technology Department

Institute for Infocomm Research, Singapore

Page 3

Localization World, Singapore, 10-12 Apr 2013


1.  Machine Translation

2.  Southeast Asian Languages

3.  Institute for Infocomm Research (I2R)

4.  Challenges for Southeast Asian Language Translation

5.  Machine Translation Applications

Page 4

Pieter Brueghel the Elder (1563) (Wiki)

The Tower of Babel

Page 5

Languages of the World

Each dot represents the geographic center of the 6,912 living languages in the Ethnologue database. Gordon, Raymond G., Jr. (ed.), 2005. Ethnologue: Languages of the World, Fifteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/.

Page 6

Father of Translation

Xuanzang (玄奘, 602-664): First Translator in China


St. Jerome (347-420): Translation of the Bible into Latin


Page 7

Pioneer of Machine Translation

Warren Weaver (1894-1978): Decoding

When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” (1949)


Page 8

Translation Jokes

Page 9

Machine Translation

• Translation model (from expert knowledge or translation examples)
• Language model
• Translation unit: word, phrase, tree
• Linguistic complexity: lexical, POS, syntax
• Decoding algorithm
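The translation model can be learned from translation examples rather than written by experts. A minimal sketch of one classic way to do this, IBM Model 1 trained with EM, is shown below; the three-sentence Malay-English corpus and the function names are hypothetical, the NULL word is omitted, and this is not presented as the system described in the talk.

```python
from collections import defaultdict

# Hypothetical toy Malay-English "translation examples" (source, target).
PAIRS = [
    ("cerita menarik", "interesting story"),
    ("cerita panjang", "long story"),
    ("buku menarik", "interesting book"),
]


def ibm_model1(pairs, iterations=20):
    """Estimate word translation probabilities t(e | f) with EM (IBM Model 1, no NULL word)."""
    corpus = [(f.split(), e.split()) for f, e in pairs]
    e_vocab = {w for _, e_words in corpus for w in e_words}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: spread each target word's count over the source words in its sentence.
        for f_words, e_words in corpus:
            for e in e_words:
                norm = sum(t[(e, f)] for f in f_words)
                for f in f_words:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f_word):
    """Most probable target word for a source word under t(e | f)."""
    candidates = [(e, p) for (e, f), p in t.items() if f == f_word]
    return max(candidates, key=lambda ep: ep[1])[0]


t = ibm_model1(PAIRS)
for word in ("cerita", "menarik", "panjang"):
    print(word, "->", best_translation(t, word))
```

No word pair is given in advance: co-occurrence across the examples alone pulls "cerita" toward "story", "menarik" toward "interesting" and "panjang" toward "long".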

Page 10

The Vauquois Triangle

[Figure: levels of transfer, including syntactic transfer and semantic transfer]


Page 11

Translation Methodology

• Word-to-Word Translation
• Phrase-based Translation
• Syntax-based Translation
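The difference between the first two methodologies can be sketched with a toy Malay-English example, assuming hypothetical word and phrase tables: word-to-word lookup renders "terima kasih" ("thank you") as the literal "receive love", while a phrase table translates the multi-word unit as a whole. The sketch is monotone (no reordering) and does not cover syntax-based translation.

```python
# Hypothetical toy tables (Malay -> English).
WORD_TABLE = {"terima": "receive", "kasih": "love", "banyak": "many"}
PHRASE_TABLE = {
    ("terima", "kasih"): "thank you",
    ("terima", "kasih", "banyak"): "thank you very much",
}


def word_to_word(sentence):
    """Translate each word independently; unknown words are copied through."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.split())


def phrase_based(sentence, max_len=3):
    """Greedy, monotone longest-match phrase lookup, falling back to single words."""
    words = sentence.split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if n > 1 and chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                break
            if n == 1:
                out.append(WORD_TABLE.get(words[i], words[i]))
        i += n  # n is the length of the unit just translated
    return " ".join(out)


print(word_to_word("terima kasih banyak"))  # prints "receive love many"
print(phrase_based("terima kasih banyak"))  # prints "thank you very much"
```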

Page 12

Rule-based Approach

Input (Malay): Cerita menarik .

• Analysis (lexical and structural): morphological rules, parsing
• Transfer: bilingual dictionary, structural mapping rules
• Generation: structure generation, morph generation, language model
• Lingware interpreter

Output (English): The story is interesting .
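The analysis-transfer-generation flow can be sketched for the slide's own example, under heavy simplifying assumptions: the dictionary, the part-of-speech tags and the single structural mapping rule (Malay N ADJ to English "the N is ADJ") are hypothetical stand-ins for real lingware, and analysis is only a dictionary lookup rather than morphological analysis and parsing.

```python
# Hypothetical lingware: bilingual dictionary entries with part-of-speech tags.
BILINGUAL_DICT = {
    "cerita": ("story", "N"),
    "menarik": ("interesting", "ADJ"),
    ".": (".", "PUNCT"),
}


def analyse(sentence):
    """Analysis: tag each source token (stands in for morphology and parsing)."""
    return [(w, BILINGUAL_DICT[w][1]) for w in sentence.lower().split()]


def transfer(tagged):
    """Transfer: lexical lookup plus one structural mapping rule, N ADJ -> the N is ADJ."""
    words = [BILINGUAL_DICT[w][0] for w, _ in tagged]
    tags = [tag for _, tag in tagged]
    if tags[:2] == ["N", "ADJ"]:
        return ["the", words[0], "is", words[1]] + words[2:]
    return words


def generate(tokens):
    """Generation: capitalise the first word (stands in for structure/morph generation)."""
    return " ".join([tokens[0].capitalize()] + tokens[1:])


print(generate(transfer(analyse("Cerita menarik ."))))  # prints "The story is interesting ."
```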

Page 13

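Statistical-based Approach

• Training: word alignment and statistical modeling over a parallel corpus yield the translation model (TM) and the re-ordering model (RM); language modeling over a target-language corpus yields the language model (LM).
• Translation: the statistical system maps a source-language input f to a target-language output e using the TM, RM and LM.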
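Putting the models together, the system looks for the output e that maximises P(f | e) · P(e). A minimal sketch, assuming a hypothetical three-sentence target corpus, an add-one-smoothed bigram language model and made-up translation-model scores for a fixed candidate list (the re-ordering model and the real search over hypotheses are left out):

```python
import math
from collections import Counter

# Hypothetical target-language (English) corpus for the language model.
TARGET_CORPUS = [
    "the story is interesting",
    "the book is interesting",
    "the story is long",
]


def train_bigram_lm(sentences):
    """Return a function computing add-one-smoothed bigram log P(e)."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        history.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (history[a] + v))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


# Hypothetical translation-model scores log P(f | e) for f = "cerita menarik".
TM_LOG_PROB = {
    "the story is interesting": math.log(0.4),
    "story interesting": math.log(0.2),
    "interesting the story is": math.log(0.4),
}


def decode(candidates, tm, lm):
    """Noisy-channel choice over an explicit candidate list: argmax_e log P(f|e) + log P(e)."""
    return max(candidates, key=lambda e: tm[e] + lm(e))


lm = train_bigram_lm(TARGET_CORPUS)
print(decode(list(TM_LOG_PROB), TM_LOG_PROB, lm))  # prints "the story is interesting"
```

The LM heavily penalises the scrambled word order, which the TM alone scores as highly as the fluent candidate.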

Page 14

Southeast Asian Languages











Page 15

Characteristics of Southeast Asian Languages

Language     Tone  Affix  Inflection  Reduplication  Word Segmentation  Sentence Concept
Chinese      Yes   No     No          No             Yes                Yes
Filipino     No    Yes    Yes         Yes            No                 Yes
Indonesian   No    Yes    No          Yes            No                 Yes
Khmer        No    No     No          Yes            Yes                Yes
Lao          Yes   No     No          No             Yes                Yes
Malay        No    Yes    No          Yes            No                 Yes
Myanmar      Yes   No     Yes         No             Yes                Yes
Thai         Yes   No     No          No             Yes                No
Vietnamese   Yes   No     No          No             Yes                Yes

- Contributed by the ASEAN-MT Project
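Reduplication is one of the features marked above for Filipino, Indonesian, Khmer and Malay. As a hedged illustration of what a morphological analyser has to handle, the sketch below detects only full reduplication written with a hyphen, as in Malay "buku-buku" ("books"), and returns the base form; the function name and regular expression are assumptions, and partial reduplication is not handled at all.

```python
import re

# Full reduplication written with a hyphen: the same word on both sides.
FULL_REDUPLICATION = re.compile(r"^(\w+)-\1$")


def analyse_reduplication(token):
    """Return (base_form, is_reduplicated) for a single token.

    Any X-X token is split, including lexicalised forms such as "kanak-kanak"
    ("child"), which an MT system would need a lexicon to keep intact.
    """
    match = FULL_REDUPLICATION.match(token)
    if match:
        return match.group(1), True
    return token, False


for token in ("buku-buku", "buku", "kanak-kanak"):
    print(token, analyse_reduplication(token))
```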

Page 16

Language Processing Tools

Language                 Morphological Analysis  Word Segmentation  Sentence Boundary Detection
Chinese (Singapore)      NA                      Available          NA
Filipino (Philippines)   Available               NA                 NA
Indonesian (Indonesia)   Available               NA                 NA
Khmer (Cambodia)         NA                      Available          NA
Lao (Laos)               NA                      Available          NA
Malaysian (Malaysia)     Available               NA                 NA
Myanmar (Myanmar)        Available               Available          NA
Thai (Thailand)          NA                      Available          Available
Vietnamese (Vietnam)     NA                      Available          NA

- Contributed by the ASEAN-MT Project
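Word segmentation is the tool listed most often above, since several of these writing systems do not mark word boundaries with spaces. A common dictionary-based baseline is forward maximum matching; the sketch below assumes a hypothetical five-word dictionary, is not the segmenter referred to in the table, and segments 中国的经济发展 ("China's economic development").

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters); fall back to one character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in dictionary:
                words.append(text[i:i + n])
                i += n
                break
    return words


# Hypothetical dictionary.
DICTIONARY = {"中国", "的", "经济", "发展", "经济发展"}
print(forward_max_match("中国的经济发展", DICTIONARY))  # prints ['中国', '的', '经济发展']
```

Greedy matching always prefers the longest entry ("经济发展" over "经济" + "发展"), which is exactly the kind of ambiguity a statistical segmenter or the downstream MT system may need to revisit.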

Page 17

Research Institutes and Companies

Page 18

Page 19

Page 20

Machine Translation Research

1989: Initiated R&D in English→Chinese MT
1990: Awarded S$2m IBM English→Chinese MT project
1992: Developed in-house English↔Malay MT
1993: Set up MT Service Unit
1997: Spun off AsiaRain Automated Translation
2000: Commercialized MT technology: Chinese→English, Indonesian↔English and English→Thai MT
2004: Enhanced and constructed lexical resources; applied machine learning techniques to source text analysis
2005: Started statistical machine translation
2007: Vietnamese→English MT
2010: Hybrid MT
2012: Malay→Chinese MT, Vietnamese→Chinese MT

Page 21

Phrase-based SMT: Learning Heuristics

Deyi Xiong, Min Zhang and Haizhou Li. Learning Translation Boundaries for Phrase-Based Decoding. NAACL-HLT 2010.
Xiangyu Duan, Min Zhang and Haizhou Li. Pseudo-word for Phrase-based Machine Translation. ACL 2010.
Boxing Chen, Min Zhang and Aiti Aw. Two-Stage Hypotheses Generation for Spoken Language Translation. ACM TALIP 8(1) (2009).

1) Source phrase segmentation 2) Phrase translation 3) Target phrase reordering

• Discover effective heuristics from a limited dataset
• Phrase Segmentation Model
  - 中国的/经济/发展, 中国的/经济发展, 中国的经济/发展, … ("China's economic development")
• From Word to Pseudo-Word and then to Phrase
  - "想" and "would like to"; "多少钱" and "how much is it"
• Hypothesis Regeneration with System Combination
  - Generating new hypotheses from translation results (one or more systems)
  - Combining results and re-scoring
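The phrase segmentation model has to choose among alternative segmentations such as 中国的/经济/发展 and 中国的/经济发展. A short sketch of that search space, assuming the pre-segmented words 中国的 / 经济 / 发展, enumerates every contiguous split (2^(n-1) of them for n words); choosing among them is the job of the segmentation model and is not shown.

```python
def segmentations(words, joiner=""):
    """Yield every split of a word list into contiguous phrases (2 ** (n - 1) splits).

    joiner="" suits Chinese; use " " for space-delimited languages.
    """
    if not words:
        yield []
        return
    for k in range(1, len(words) + 1):
        head = joiner.join(words[:k])
        for rest in segmentations(words[k:], joiner):
            yield [head] + rest


for seg in segmentations(["中国的", "经济", "发展"]):
    print("/".join(seg))
# 中国的/经济/发展
# 中国的/经济发展
# 中国的经济/发展
# 中国的经济发展
```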

Page 22

Linguistic Syntax-based SMT

Tree Sequence-based SMT
Min Zhang, Hongfei Jiang, Aiti Aw and Haizhou Li. A Tree Sequence Alignment-based Tree-to-Tree Translation Model. ACL-08: HLT.
[Chart: BLEU-4 on NIST 05 (trained on FBIS corpus) for SCFG, Moses, Ours: STSG and Ours: STSSG]

Forest-based SMT
Hui Zhang, Min Zhang, Haizhou Li, Aiti Aw and Chew Lim Tan. Forest-based Tree Sequence to String Translation Model. ACL-IJCNLP 2009.
Hui Zhang, Min Zhang, Haizhou Li and Chew Lim Tan. Fast Translation Rule Matching for Syntax-based Statistical Machine Translation. EMNLP 2009.
[Chart: BLEU-4 on NIST 05 (trained on FBIS corpus) for Moses and Ours: TT2S]
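The charts report BLEU-4. As a reminder of what that metric computes, here is a minimal sentence-level sketch: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is unsmoothed and scores a single sentence, whereas NIST-style evaluations accumulate the counts over the whole test set before combining them, so its numbers are not comparable with the charted results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(hypothesis, references):
    """Unsmoothed sentence-level BLEU-4 against one or more references."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_precision = 0.0
    for n in range(1, 5):
        hyp_counts = ngrams(hyp, n)
        max_ref = Counter()
        for r in refs:
            max_ref |= ngrams(r, n)  # element-wise max count across references
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        if clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precision += math.log(clipped / sum(hyp_counts.values())) / 4
    # Brevity penalty against the closest reference length (ties go to the shorter).
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(log_precision)
```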




Page 23

Exploring Semantics in Phrase-based SMT: Predicate Translation & Argument Reordering

Deyi Xiong, Min Zhang and Haizhou Li. Modeling the Translation of Predicate-Argument Structure for SMT. ACL 2012.

Page 24

Discourse-based SMT (Topic Model)

Xinyan Xiao, Deyi Xiong, Min Zhang, Qun Liu and Shouxun Lin. A Topic Similarity Model for Hierarchical Phrase-based Translation. ACL 2012.
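The topic similarity idea can be sketched as comparing the topic distribution of the document being translated with a topic distribution attached to each translation rule, and preferring rules whose topics match. The sketch below assumes hypothetical three-topic distributions and uses 1 minus the Hellinger distance as the similarity; the cited paper defines its own features over hierarchical phrase-based rules, which may differ from this.

```python
import math


def topic_similarity(p, q):
    """1 - Hellinger distance between two discrete topic distributions of equal length."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return 1.0 - math.sqrt(s / 2)


# Hypothetical topic mixture (finance, health, geography) of the input document.
DOC_TOPICS = [0.7, 0.2, 0.1]

# Hypothetical topic distributions of two competing rules for the English word "bank".
RULE_TOPICS = {
    "bank -> 银行": [0.8, 0.1, 0.1],  # financial institution
    "bank -> 河岸": [0.1, 0.1, 0.8],  # river bank
}

best_rule = max(RULE_TOPICS, key=lambda r: topic_similarity(DOC_TOPICS, RULE_TOPICS[r]))
print(best_rule)  # prints "bank -> 银行"
```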

Page 25

Discourse-based SMT (Document Cache Model)

§ Use document-level information to choose translation candidates

Zhengxian Gong and Min Zhang. Cache-based Document-level Statistical Machine Translation. EMNLP 2011.
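A document-level cache can be sketched as a memory of the target phrases already chosen in earlier sentences of the same document, with a bonus for candidates that reuse them. The class name, the fixed bonus weight and the candidate scores below are hypothetical, and the cited paper's cache models are more elaborate; the example reuses the "Hospital Changi Baru" / "New Changi Hospital" pair from the document-alignment example.

```python
from collections import Counter


class DocumentCache:
    """Hypothetical cache of target phrases chosen earlier in the same document."""

    def __init__(self, weight=0.5):
        self.weight = weight  # bonus per earlier use of a phrase
        self.counts = Counter()

    def update(self, chosen_phrases):
        """Record the phrases selected for a finished sentence."""
        self.counts.update(chosen_phrases)

    def score(self, phrase, base_score):
        """Base model score plus a bonus for phrases already in the cache."""
        return base_score + self.weight * self.counts[phrase]


def choose(candidates, cache):
    """candidates: list of (target_phrase, base_log_score); return the best phrase."""
    return max(candidates, key=lambda c: cache.score(c[0], c[1]))[0]


# Hypothetical candidates for "Hospital Changi Baru" in a later sentence.
CANDIDATES = [("Changi New Hospital", -1.0), ("New Changi Hospital", -1.2)]

cache = DocumentCache()
print(choose(CANDIDATES, cache))       # no document history yet: "Changi New Hospital"
cache.update(["New Changi Hospital"])  # chosen in an earlier sentence
print(choose(CANDIDATES, cache))       # cache bonus tips the choice: "New Changi Hospital"
```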

Page 26

Challenge: Overcome Low Resources

1. How can we build a system with limited language resources?
2. How can we leverage human translation knowledge for SMT?
3. How can we improve the system when large language resources are available?

Page 27


1. Given limited statistics, consider using prior linguistic knowledge to improve the statistical model.

2. When rules can be crafted, consider using a statistical approach to improve productivity.

Page 28

Lexical Pattern: Term Translation

• Term
  - Phrase whose structure as a whole carries a specific meaning
• Term Identification and Translation
  - Domain specific
  - Tedious and time-consuming to acquire manually for a new

Example:
• English: Skills Upgrading and Resilience Programme (SPUR)
• Malay: Program Kemahiran bagi Peningkatan dan Ketahanan (SPUR)
• Chinese: 技能提升与应变计划 (策马扬鞭, literally "whip the horse onward")

Page 29

Mining Bilingual Terms

Lianhau Lee, Aiti Aw, Thuy Vu, Sharifah Aljunied Mahani, Min Zhang and Haizhou Li. "MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval". ACL-IJCNLP 2009.
Lianhau Lee, Aiti Aw, Min Zhang and Haizhou Li. "EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora". COLING 2010.

[Diagram: a monolingual corpus in each language → monolingual term extraction (mono terms) → document alignment → bilingual term alignment & extraction]

• Ways of acquiring bilingual terms
  - Alignment on parallel sentences
  - Using web data to search for translation candidates
  - Mining from comparable corpora
  - Manual coding/analysis of new MWEs
• Our approach
  - Automatic mining of bilingual terms from comparable corpora
  - Unavailability of large parallel text
  - Easy accessibility of monolingual corpus
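One widely used way to mine bilingual terms from comparable (non-parallel) corpora is to compare context vectors: translate the words that co-occur with a source term through a seed dictionary and look for the target term whose context is most similar. This is a generic sketch with hypothetical counts and a hypothetical two-entry seed dictionary, not the EM-based hybrid model of the cited COLING 2010 paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def translate_context(context, seed_dictionary):
    """Map a source-language context vector into the target language; unknown words are dropped."""
    translated = Counter()
    for word, count in context.items():
        if word in seed_dictionary:
            translated[seed_dictionary[word]] += count
    return translated


# Hypothetical co-occurrence counts from a Malay and an English news corpus.
SOURCE_CONTEXT = {"pinjaman": 3, "ekonomi": 2, "semalam": 1}  # context of "Bank Dunia"
SEED = {"pinjaman": "loan", "ekonomi": "economy"}
TARGET_CONTEXTS = {
    "World Bank": {"loan": 2, "economy": 2, "said": 1},
    "World Cup": {"football": 3, "team": 2, "said": 1},
}

query = translate_context(SOURCE_CONTEXT, SEED)
ranked = sorted(TARGET_CONTEXTS, key=lambda t: cosine(query, TARGET_CONTEXTS[t]), reverse=True)
print(ranked[0])  # prints "World Bank"
```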

Page 30

Parallel Sentence Extraction: Document Alignment

Thuy Vu, Ai Ti Aw and Min Zhang. 2009. Feature-based Method for Document Alignment in Comparable News Corpora. In 12th EACL 2009, Athens, Greece.

[Plots over an x-axis of 1 to 91 for the term triples "Bank Dunia" / "World Bank" / "世界银行" and "Dunia" / "World" / "世界"]
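The plots appear to pair a Malay term with its English and Chinese counterparts over the same axis. As a hedged sketch of that kind of evidence (the counts are invented, and Pearson correlation is an assumption here, not necessarily the feature used in the cited paper), translation-equivalent terms should show more strongly correlated occurrence profiles than unrelated ones:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length count sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical per-period occurrence counts in Malay and English news.
BANK_DUNIA = [0, 3, 1, 0, 5, 0]
WORLD_BANK = [1, 4, 1, 0, 6, 1]
WORLD_CUP = [2, 0, 2, 3, 0, 2]

print(round(pearson(BANK_DUNIA, WORLD_BANK), 3), round(pearson(BANK_DUNIA, WORLD_CUP), 3))
```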

Page 31

Document Alignment: Example

Hospital Changi Baru dibuka mulai bulan depan ("New Changi Hospital opens from next month")

§  Author: Nazry Mokhtar, 28/11/1996

§  [Kemudahan $312 juta dijangka jadi …] Selain kemudahan perubatan penuh, ia akan mempunyai wad bersalin dan klinik bagi merawat bayi - sama seperti Hospital Kandang Kerbau. Sebuah hospital masyarakat baru juga akan dibina berdekatan hospital tersebut untuk menjadikan NCH sebagai pusat perubatan terunggul di kawasan timur Singapura yang mampu memenuhi keperluan sekitar 750,000 penduduk di situ. Ini menjadikannya sebagai hospital daerah pertama di sini yang dibangunkan khusus bagi memenuhi pelbagai keperluan perubatan penduduk di sesuatu daerah. §  [Menteri Kesihatan, Brigedier-Jeneral (Kerahan) George Yeo, berkata demikian …] Antara kemudahannya termasuk kemudahan bersalin yang dikelolakan oleh Hospital Kandang Kerbau dan kemudahan bagi rawatan psikiatri dan pemulihan. Hospital baru itu menggantikan Hospital Toa Payoh dan Hospital Changi. §  BG Yeo, yang juga Menteri Penerangan dan Kesenian, berkata: "Rancangan hospital ini ialah menawarkan kemudahan perubatan lengkap sejajar dengan matlamat menjadikannya sebuah pusat perubatan terunggul di daerah timur Singapura.” Mengenai hospital masyarakat yang bakal dibina berdekatan hospital baru itu, beliau berkata ia akan melengkapi kemudahan NCH. Hospital masyarakat dengan 200 katil pesakit itu akan diuruskan oleh Hospital St Andrew's Mission dan dijangka siap menjelang tahun 2000. [BG Yeo selanjutnya berkata …] §  Dalam lawatan semalam, BG Yeo yang ditemani Menteri Negara Kanan (Pendidikan dan Kesihatan), Dr Aline Wong, masing-masing menanam sebatang pokok di luar lobi hospital itu.

New Changi Hospital will be health-care hub for eastern S'pore

§  Author: Allison Lim, 28/11/1996.

§  THE New Changi Hospital will be Singapore's first purpose-built regional hospital, said Health Minister George Yeo. It will cater for up to 750,000 people who live in the east and northeast regions. [To reach out to them, it has been designed to be a meeting place …] §  Brigadier-General (NS) Yeo, who is also Minister for Information and the Arts, said that the hospital will have a birthing centre for young couples living in the region. It will be run as a satellite of the Kandang Kerbau Women's and Children's Hospital. In addition, there will be satellite facilities for psychiatry, rehabilitation medicine and other medical specialities. "The whole idea is a whole range of medical facilities in a hospital that will also serve as a health-care hub for the entire region," he said of the $480-million hospital. [The regional hospital concept ….] §  The minister, who was accompanied by senior officials from the Health Ministry, later planted a Chengai sapling, near the hospital entrance. Senior Minister of State (Health and Education) Aline Wong planted a Tampines sapling. [Health care will remain affordable …] §  BG Yeo said that later on, a community hospital will be built next to the New Changi Hospital, between it and the Pan-Island Expressway. "In fact, plans are already being drawn up and the St Andrew's Mission Hospital will run this new community hospital which will have more than 200 beds. So in this way we will provide, close to the housing estates here, a full range of medical facilities," he said. It should be ready by 2000. [He said that the regional hospital ….] §  The new regional hospital will replace Toa Payoh Hospital, which will become a community hospital, and the existing Changi Hospital. [The latter's site will be returned ….]

Page 32

Document Alignment: Example

MAS profit falls 68% to $1.22b on higher rates, stronger S$

§  Author: Ericia Tay, 21/07/2006.

§  [Central bank's total assets up …] The futures market suggests that oil prices could stay at around US$80 a barrel, and while the world economy has so far been resilient, the risks of a sharper slowdown due to supply disruptions have gone up, noted Mr Heng. §  Nevertheless, inflationary pressures at home "should be fairly well contained", even though the indirect effects of higher oil prices on energy-related consumer items and business costs are expected to strengthen. The MAS stuck to its earlier prediction that Singapore's economic growth this year is likely to be between 5 per cent and 7 per cent, barring unexpected shocks in the rest of the year. §  "Although global IT demand growth may be capped somewhat by potentially slower growth in the United States in the second half of 2006, the prospects for continued economic growth in the quarters ahead appear intact," said Mr Heng of the outlook for Singapore. The MAS also kept its inflation forecast of between 1 per cent and 2 per cent for the whole of this year. These macroeconomic projections are based on the assumption that crude oil prices average US$68 to US$75 a barrel. §  In the first half of this year, Singapore's gross domestic product (GDP) grew by an estimated 9 per cent from the same period last year. Taking into account Singapore's GDP growth and inflation prospects, the central bank said its policy stance on the Singdollar - a modest and gradual strengthening of the currency - remains appropriate. [Unlike many central banks which use interest rates as a policy tool…]


§  Author: 罗文燕, 21/07/2006

§  [ 在中东紧张局势升温。。。] 金融管理局董事经理王瑞杰昨天在发表常年报告书的记者会上说,高油价转嫁到能源相关消费物品和商业营运成本的程度预料会提高,但整体国内通货膨胀压力应该会受到相当好的控制。尽管油价升高,金管局保持对我国今年的通胀率将介于1%到2%的预测。[ 王瑞杰说 。。。] §  根据贸工部上星期发表的预估数据,我国经济今年上半年强劲增长了9.1%。不过,下半年的。王瑞杰说:“美国经济增长可能在下半年放缓,这或许会抑制全球资讯科技需求的增长,但(新加坡)今后几个季度持续保持经济增长的前景似乎没变。” §  因此,排除地缘政治风险激增等无法预见的外来冲击,金管局预期全年的经济增长率多数会保持在5%到7%。然而,王瑞杰指出:“石油供应被中断以致经济更急速放缓的风险现在增加了。显然的,地缘政治跟油价。。。” §  中东紧张局势最近升温,已导致油价进一步升高。王瑞杰说,从期货市场的走势来看,油价预料会保持在每桶80美元左右的高水平。他说,金管局对通胀和经济的预测,有考虑到平均油价可能处于每桶65美元到78美元的价位。 §  在考虑到我国的增长和通胀前景后,王瑞杰表示,金管局认为当局目前让新元汇率继续适度及逐步增值的政策立场仍然适合。当局下一次将在10月发表半年一次的货币政策声明。

Page 33

Hybrid System

Source: Beliau juga berterima kasih kepada MAS dan AirAsia kerana menyediakan penerbangan terus ke Macau, yang memudahkan MGTO untuk mempromosikan bandar itu.

SMT: He was also grateful to mas and airasia for providing direct flights to macau, which facilitate promoting the MGTO to.

MEMT: He also is thankful for MAS and Airasia for preparing flight directly to Macau, which facilitates MGTO to promote the town.

SMT+MEMT: He was also grateful to MAS and Airasia for providing direct flights to macau, which facilitates the MGTO to promote the city.

Scores: SMT 0.4062; MEMT 0.2725; SMT+MEMT 0.4165

Page 34

Scientific Achievements

§ Papers in leading journals
  • IEEE Transactions on Audio, Speech and Language Processing
  • ACM Transactions on Asian Language Information Processing
  • Information Processing and Management
  • Computational Linguistics
§ Papers in leading conferences
  • The Annual Meeting of the Association for Computational Linguistics (ACL)
  • Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • International Conference on Computational Linguistics (COLING)

Page 35

Baidu-I2R Research Centre

Baidu's Box Computing: Beating Google At Its Own Game (Seeking Alpha, March 27, 2012)

“… According to Baidu, 60% of search results are produced by Box Computing, which delivers interactive, relevant, and intuitive search experience that makes Baidu a clear leader in China's online search market. Unfortunately, Google has yet to catch up with Baidu on semantic search.”

“…Recently, Baidu formed a partnership with Agency for Science, Technology and Research (A*STAR) to establish an R&D center in Singapore that focuses on developing South Asian language processing technology. The joint research lab will initially focus on Vietnamese and Thai.”

Page 36

Network-based Speech to Speech Translation Service

Page 37


Malay-English S2S Mobile Translation

- No existing commercial Malay speech recognition.
- Small footprint: compact models that can run on small devices.
- Usable in many contexts:
  - Humanitarian and disaster relief efforts
  - Tourist travel

Page 38

Document Translation

Page 39

Multilingual Chat & Messaging

[Diagram components: chat server; chat clients; translation bot; normalization bot; web service server; default dictionary; user-defined dictionaries]

1. Chat message normalized by normalization bot.
2. Chat message sent to chat server.
3. Chat message sent to the recipient.
4. Chat message translated by the translation bot.
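Steps 1 and 4 of the message flow can be sketched as two small functions, assuming a hypothetical shorthand table for the normalization bot and word-by-word English-to-Malay dictionaries for the translation bot, where an entry in a user-defined dictionary takes priority over the default dictionary. A real translation bot would call an MT engine rather than a word list, and the chat-server routing in steps 2 and 3 is not shown.

```python
# Hypothetical chat shorthand table used by the normalization bot.
SHORTHAND = {"u": "you", "2moro": "tomorrow", "pls": "please"}

# Hypothetical English -> Malay default dictionary used by the translation bot.
DEFAULT_DICTIONARY = {"see": "jumpa", "you": "awak", "tomorrow": "esok"}


def normalise(message):
    """Step 1: expand chat shorthand token by token."""
    return " ".join(SHORTHAND.get(token.lower(), token) for token in message.split())


def translate(message, user_dictionary=None):
    """Step 4: word-by-word lookup; user-defined entries override the default dictionary."""
    user_dictionary = user_dictionary or {}
    return " ".join(
        user_dictionary.get(word, DEFAULT_DICTIONARY.get(word, word)) for word in message.split()
    )


text = normalise("see u 2moro")
print(text)                              # prints "see you tomorrow"
print(translate(text))                   # prints "jumpa awak esok"
print(translate(text, {"you": "kamu"}))  # prints "jumpa kamu esok"
```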


Page 40
