Date posted: 06-Jan-2017
Category: Software
Uploaded by: srisatish-ambati
An MMORPG Mobile Game
- Game of War: Fire Age
  - A massively multiplayer online role-playing game
- Millions of downloads; top-grossing game on iOS and Android
Why Translation?
- A truly global ‘One World’
- Only 75% of players speak English
  - The other 25% speak Japanese, Russian, French, Spanish, German, Turkish, Portuguese and other languages
- Seamless communication drives player engagement
- Tighter community and long-term relationships
“Hi guys a cyclone is due to hit us in about 4 hours. Im unsure how electricity will be affected. Im peaced for 2 days. If you see my accounts unpeaced keep an eye on them as it means im unable to access the game. wish me luck”
Need for Nrmlizatn (Normalization)
- Mobile game players use a lot of slang
  - Hard to type on mobile devices
  - Players are busy playing the game
- Constantly evolving language
- Regional and linguistic influences
  - Maori slang
  - Emoticons and emoji
- Slang text affects machine translation accuracy
- Hence the MZ Transformer: a social media text normalization system
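As a rough illustration of what a normalization step does before machine translation, here is a minimal lookup-based sketch in Python. It is not the MZ Transformer itself: the lexicon entries, the `normalize` function, and the space-based re-joining of tokens are hypothetical choices made only for illustration.

```python
import re

# Hypothetical slang/abbreviation lexicon: lowercase slang token -> grammatical form.
# Entries are illustrative only; the real MZ Transformer lists are not shown.
SLANG_LEXICON = {
    "im": "I'm",
    "u": "you",
    "pls": "please",
    "wasup": "what's up",
    "wakey": "woke up",
}

# A token is either a run of word characters or a run of punctuation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def normalize(text: str, lexicon: dict[str, str] = SLANG_LEXICON) -> str:
    """Replace known slang tokens with grammatical forms, leaving others untouched.

    Tokens are re-joined with single spaces, so original spacing around
    punctuation is not preserved (a simplification of this sketch).
    """
    tokens = TOKEN_RE.findall(text)
    return " ".join(lexicon.get(tok.lower(), tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize("yo wasup zack .. i just wakey"))
    print(normalize("Im peaced for 2 days"))
```

A fixed dictionary like this cannot handle context-dependent or previously unseen slang, which is why learned, phrase-based normalization is attractive for fast-changing game chat.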
The Data Problem
- Hard to get a parallel corpus of slang-to-grammatical text
- Hard to compile abbreviation lists and spelling-error lists for non-English languages
- Expensive to create
- Domain differs from microblogs
Avg. tweet length: 73.51 characters
Avg. GoW chat length: 34.43 characters
Crowdsourcing + Game Economy
- In-game currency buys virtual goods
- Competitive game
- Crowdsource parallel corpus creation
- In-game items and/or currency as rewards
- Steady stream of normalization training data
- Across languages and over time
- Reward rates adjusted with the game economy
Crowdsourced Training Corpus
- 1-best hypothesis selected from the collected data
- Incentivized feedback loop
Source phrase: yo wasup zack .. i just wakey

Response received                        Num. players
Yo, what’s up Zack? I just woke up.      1013
Hi, what’s up Zack? I just woke up.      327
Hey, what’s up Zack? I just woke up.     133
What’s up Zack? I just woke up.          61
To what’s up Zack? I just work up.       12
Yo what’s up Zack. I just awoke          3
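A minimal sketch of 1-best selection over these crowdsourced responses, assuming the 1-best hypothesis is simply the response submitted by the most players; the slides do not spell out the exact selection rule. The `select_1best` name is hypothetical; the counts are transcribed from the table above.

```python
from collections import Counter

# Crowdsourced normalizations of "yo wasup zack .. i just wakey",
# mapped to the number of players who submitted each (from the table).
RESPONSES = Counter({
    "Yo, what’s up Zack? I just woke up.": 1013,
    "Hi, what’s up Zack? I just woke up.": 327,
    "Hey, what’s up Zack? I just woke up.": 133,
    "What’s up Zack? I just woke up.": 61,
    "To what’s up Zack? I just work up.": 12,
    "Yo what’s up Zack. I just awoke": 3,
})


def select_1best(responses: Counter) -> tuple[str, float]:
    """Return the most-submitted response and its share of all submissions.

    Assumption: 1-best = plurality vote; the slides do not state the exact rule.
    """
    if not responses:
        raise ValueError("no responses collected")
    best, votes = responses.most_common(1)[0]
    return best, votes / sum(responses.values())


if __name__ == "__main__":
    best, share = select_1best(RESPONSES)
    print(f"{best!r} chosen with {share:.1%} of submissions")
```

With these counts the sketch picks "Yo, what’s up Zack? I just woke up.", which received 1013 of the 1,549 submissions (about 65.4%). Keeping only the top response also discards low-agreement variants such as "I just work up."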
MZ Transformer
- Normalization as a pre-step before MT
  - Abbreviations
  - Spelling errors
  - Phrase pairs
  - HMM-based text normalization system
    - Word alignment
    - Phrase-based text normalization system built on a parallel corpus
    - P(gramm_phrase | slang_phrase)
    - HMM decoder to generate the target (grammatical) text
    - Language model
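To make the decoding step more concrete, here is a heavily simplified Python sketch: a word-level Viterbi search that combines a normalization probability P(g | s) with a bigram language model, using equal weights in log space. It is a toy under stated assumptions, not the MZ Transformer's phrase-based HMM. Following the slide's P(gramm_phrase | slang_phrase), it scores P(g | s) directly rather than a classic HMM emission P(s | g), so treat it as a log-linear approximation. The probability tables, the unseen-bigram floor, and all function names are hypothetical.

```python
import math

# Hypothetical word-level normalization table: slang token -> {grammatical token: P(g | s)}.
# A real system would estimate phrase-level probabilities from word-aligned parallel data.
NORMALIZATION_TABLE = {
    "u": {"you": 0.9, "u": 0.1},
    "r": {"are": 0.8, "r": 0.2},
    "gr8": {"great": 0.95, "gr8": 0.05},
}

# Hypothetical bigram language model P(word | previous word); "<s>" marks sentence start.
BIGRAM_LM = {
    ("<s>", "you"): 0.2,
    ("you", "are"): 0.3,
    ("are", "great"): 0.1,
}
UNSEEN_BIGRAM_PROB = 1e-4  # crude floor for bigrams missing from the toy LM


def lm_logprob(prev: str, word: str) -> float:
    return math.log(BIGRAM_LM.get((prev, word), UNSEEN_BIGRAM_PROB))


def candidates(slang: str) -> dict[str, float]:
    # Tokens not in the table pass through unchanged with probability 1.
    return NORMALIZATION_TABLE.get(slang, {slang: 1.0})


def viterbi_normalize(tokens: list[str]) -> list[str]:
    """Return the candidate sequence g_1..g_n maximizing
    sum_i [log P(g_i | s_i) + log P_LM(g_i | g_{i-1})], with equal weights."""
    # best maps the last chosen word to (score, path) of the best sequence ending there.
    best: dict[str, tuple[float, list[str]]] = {"<s>": (0.0, [])}
    for slang in tokens:
        step: dict[str, tuple[float, list[str]]] = {}
        for word, p_norm in candidates(slang).items():
            # Keep only the best-scoring predecessor for each candidate (Viterbi recursion).
            step[word] = max(
                (score + lm_logprob(prev, word) + math.log(p_norm), path + [word])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


if __name__ == "__main__":
    print(viterbi_normalize(["u", "r", "gr8"]))
```

The language model is what lets the decoder prefer "you are great" as a whole sequence rather than choosing each replacement in isolation; in a phrase-based system the same recursion runs over multi-word phrase candidates instead of single tokens.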
BLEU Score Improvements
Source language   Target language   w/o MZ Transformer    w/ MZ Transformer
Spanish           English           37.82                 39.77
English           Spanish           31.29                 32.87
French            English           46.30                 47.73
English           French            31.90                 33.19
German            English           41.02                 43.98
English           German            26.92                 26.96
Portuguese        English           50.94                 52.13
English           Portuguese        38.09                 38.12
Russian           English           38.64                 40.17
English           Russian           24.80                 25.43
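The per-pair gains are easy to miss in the raw scores, so this short Python snippet computes them from the reported BLEU values and averages them by translation direction. `RESULTS` simply transcribes the table; grouping pairs into and out of English is an illustrative choice, not an analysis from the deck.

```python
# BLEU scores transcribed from the table: (source, target, without MZ, with MZ).
RESULTS = [
    ("Spanish", "English", 37.82, 39.77),
    ("English", "Spanish", 31.29, 32.87),
    ("French", "English", 46.30, 47.73),
    ("English", "French", 31.90, 33.19),
    ("German", "English", 41.02, 43.98),
    ("English", "German", 26.92, 26.96),
    ("Portuguese", "English", 50.94, 52.13),
    ("English", "Portuguese", 38.09, 38.12),
    ("Russian", "English", 38.64, 40.17),
    ("English", "Russian", 24.80, 25.43),
]

# Gain per language pair, rounded to the table's two decimal places.
GAINS = {(src, tgt): round(with_mz - without_mz, 2)
         for src, tgt, without_mz, with_mz in RESULTS}


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


into_english = mean([g for (src, tgt), g in GAINS.items() if tgt == "English"])
out_of_english = mean([g for (src, tgt), g in GAINS.items() if src == "English"])

if __name__ == "__main__":
    for (src, tgt), gain in GAINS.items():
        print(f"{src:>10} -> {tgt:<10} +{gain:.2f}")
    print(f"mean gain into English:   {into_english:.2f}")
    print(f"mean gain out of English: {out_of_english:.2f}")
```

Every pair improves, but the mean gain is about 1.81 BLEU for pairs translating into English versus about 0.71 for pairs translating out of English, with English→German (+0.04) and English→Portuguese (+0.03) nearly unchanged.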
Conclusions
- Normalization helps! BLEU improved for all ten language pairs tested
- Crowdsourced data collection is low-cost and faster
- More normalization layers and more training data yield larger improvements
- lol → mdr (French "mort de rire")
- Using 10-best instead of 1-best caused overfitting
- Normalization across languages
Future Work
- The crowdsourcing system can be used for:
  - Text translation
  - Speech transcription
  - mTurk-style tasks
- Collect data in resource-poor languages
  - Bulgarian, Malay, Slovak, Ukrainian