Date post: | 20-May-2015 |
Category: |
Technology |
TAUS MACHINE TRANSLATION SHOWCASE
Creating Competitive Advantage with Rapid Customization & Deployment of Moses
10:20 – 10:30, Thursday, 10 October 2013
Tony O’Dowd, KantanMT
No Hardware. No Software. No Hassle MT.
Tony O’Dowd, Founder & Chief Architect
Localization World 2013
TAUS – MT Showcase
What we aim to cover today?
• User Scenario #1
  • Building Production MT Systems
  • Structured Approach
  • Build – Measure – Learn Process
• User Scenario #2
  • Retraining with Post-Edits
  • RoundTable Inc. – their story
• User Scenario #3
  • Selecting the best engine for the job
  • Milengo – their approach
  • Getting the Translator involved
• Q&A
20 Minutes
What is KantanMT.com?
• Statistical MT System
  • Cloud-based
  • Highly scalable
  • Inexpensive to operate
  • Quick to deploy
• Our Vision
  • To put Machine Translation customization, improvement and deployment into your hands
Active KantanMT Engines: 6,632
Training Words Uploaded: 23,653,605,925
Member Words Translated: 362,291,925
Fully Operational: 7 months
Measure – KantanMT engine calibration
• Track using KantanWatch™
• Compare engines quickly
• Monitor production data
• Use your own test/tune data sets
Learn – KantanMT Experimentation
• What to look out for?
  • BLEU: 24%
  • F-Measure: 50%
  • TER: 66%
  • Wordcount: 172K
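To make the scores above more concrete, here is a minimal sketch of how an F-measure and a TER-style score can be computed for one segment. It is not how KantanMT computes them: the function names are hypothetical, tokenisation is plain whitespace splitting, and the TER variant ignores block shifts, which real TER counts as single edits. Higher is better for BLEU and F-measure, lower is better for TER.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Unigram F1 between an MT output and a reference translation."""
    hyp, ref = hypothesis.split(), reference.split()
    # Clipped word overlap: each reference word can be matched only once.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def simple_ter(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Simplification (assumption): insertions, deletions and substitutions
    only; real TER also allows block shifts.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(ref) + 1))
    for i, h_word in enumerate(hyp, start=1):
        curr = [i]
        for j, r_word in enumerate(ref, start=1):
            cost = 0 if h_word == r_word else 1
            curr.append(min(prev[j] + 1,          # delete hypothesis word
                            curr[j - 1] + 1,      # insert reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)
```

A hypothesis that drops one word from a three-word reference gets an F-measure of 0.8 and a simplified TER of about 0.33 (one edit over three reference words).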
Learn – KantanMT Experimentation
• Learn from examining the output
  • Catalogue Errors
    • Untranslated text
    • Incorrect numeric formatting
    • Invalid characters
    • High level of post-editing required
  • Conclusions
    • Engine coverage is bad due to low wordcount
    • Post-editing is high due to low engine coverage
    • Training data doesn’t contain correct numeric formatting
    • Bad formatting in training data
BLEU: Low | F-Measure: OK | TER: High | Wordcount: Low
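The error catalogue above can be approximated with a few automatic checks over source/MT pairs. This is a rough sketch under stated assumptions, not a KantanMT feature: the function names and error labels are hypothetical, "untranslated" simply means the output equals the source, and the numeric check only compares digit sequences, so it catches changed numbers rather than locale-specific separator mistakes.

```python
import re
import unicodedata
from collections import Counter

# A number: digits optionally joined by '.' or ',' separators.
NUMBER = re.compile(r"\d[\d.,]*\d|\d")


def digits_only(text):
    """Sorted digit sequences of every number in the text, separators removed."""
    return sorted(re.sub(r"\D", "", n) for n in NUMBER.findall(text))


def catalogue_errors(source, target):
    """Return hypothetical error labels for one source/MT segment pair."""
    errors = []
    # Output identical to the source suggests the engine had no coverage.
    # Note: names and codes that must stay unchanged will also be flagged.
    if source.strip() and source.strip() == target.strip():
        errors.append("untranslated")
    # U+FFFD or control characters usually indicate encoding damage.
    if any(ch == "\ufffd"
           or (unicodedata.category(ch) == "Cc" and ch not in "\t\n\r")
           for ch in target):
        errors.append("invalid_character")
    # Numbers should survive translation with the same digits.
    if digits_only(source) != digits_only(target):
        errors.append("numeric_mismatch")
    return errors


def summarise(pairs):
    """Count each error label across a list of (source, target) pairs."""
    return Counter(e for s, t in pairs for e in catalogue_errors(s, t))
```

Counting labels over a test set gives a quick view of which issue dominates before deciding on an action plan.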
Learn – KantanMT Experimentation
• Learn from examining the output
  • Action Plan
    • Coverage – More relevant, high-quality training data is required. Also use a Glossary File to improve terminology consistency and accuracy.
    • Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
    • Invalid Character – Use a PEX rule to fix the invalid character issue
    • Post-Editing – With more training data, the KantanMT engine will perform better overall
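The slides do not show what a PEX rule looks like, so the following is only a Python regular-expression analogue of the idea: ordered search-and-replace rules applied to the MT output. Everything here is an assumption for illustration, including the rule list, the function names, and the choice of target format (English-style "1,234.50" rewritten to "1.234,50").

```python
import re

# English-style number: optional thousands groups, optional decimal part.
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")


def swap_separators(match):
    """Rewrite '1,234.50' as '1.234,50' by swapping ',' and '.'."""
    return match.group(0).translate(str.maketrans(",.", ".,"))


# Ordered (pattern, replacement) pairs; hypothetical stand-ins for PEX rules.
RULES = [
    (NUMBER, swap_separators),   # fix numeric formatting
    (re.compile("\ufffd"), ""),  # drop the invalid replacement character
]


def apply_rules(text):
    """Apply every rule in order to one translated segment."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

The numeric rule is not idempotent: it assumes the engine output still uses English-style separators, so it should run once and only on segments known to need it.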
Action Plan – focus on improving measurements
Build Measure Learn: The Results
• Analyse output
  • Untranslated text
  • Numeric Formatting
  • Invalid Character
User Scenario #2
• Long history of MT usage
  • In-house expertise
  • Large customer demand
  • Using MT since 2005
  • Now manage their own in-house system on KantanMT.com
• Goal
  • Faster project turnaround times
  • More service offerings to client base
  • More production capacity
  • Cost efficiencies
About RoundTable Studio
RoundTable Studio is a leading provider of translation and localization services for the Spanish and Brazilian Portuguese language markets.
Early Adopter
User Scenario #2
• Business Scenario
  • Continuous translation quality improvement
  • Reduced post-editing/turn-around times
User Scenario #2
• Results
  • Greater production capacity
  • Improvement in quality
  • Faster project turn-around times
“Since signing up with KantanMT, we have been able to take on more work and increase our capacity levels”
Laura Grossi – MT Specialist, RoundTable Studio
User Scenario #3
• Long history of MT usage
  • In-house expertise
  • Large customer demand
• Originally outsourced MT
  • 3rd party consultancy company
• Vendor Agnostic
  • Microsoft Translator Hub
  • KantanMT.com
• All systems are cloud based
  • Like hands-on approach to managing their own MT engines
About Milengo
Milengo provides translation, localization and related language services, specializing in software, website and documentation localization.
User Scenario #3
• Business Scenario
  • Select best engine for language combination
  • Client requests a job that involves an MT component
• Finding Training Data
  • Data is aggregated from the client’s previous translations
• Building Engines
  • Same training data is provided to each engine
  • Same language combinations
  • Iterative process until satisfied with system performance (internal process)
User Scenario #3
• Translation Quality Analysis
  • Sample of 1,000 segments selected
  • Tabulated & anonymised
  • Dispatched to Senior Translators
Evaluation sheet columns:
Source | MT Target | Adequacy (Score 1-5) | Fluency (Score 1-5) | Overall quality (1-4)
Error columns:
Wrong terminology | Wrong Spelling | Source not Translated/Omissions | Compliance with client specs | Literal translation | Text/Information added | Capitalization | Wrong Word Form | Wrong Part of Speech | Punctuation | Sentence Structure | Tags and Markup | Locale Adaptation | Spacing
Error column groups:
Style | Syntax and Grammar | Tech
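One way to collate the translators' scores from such a sheet is to average adequacy and fluency per anonymised engine. This is a sketch only: the row layout, the engine labels, the sample scores and the selection rule (highest mean adequacy plus mean fluency) are assumptions for illustration, not Milengo's documented process.

```python
from collections import defaultdict
from statistics import mean

# Invented example rows; engines are anonymised as "A" and "B".
ROWS = [
    {"engine": "A", "adequacy": 4, "fluency": 3},
    {"engine": "A", "adequacy": 5, "fluency": 4},
    {"engine": "B", "adequacy": 3, "fluency": 4},
    {"engine": "B", "adequacy": 3, "fluency": 4},
]


def summarise(rows):
    """Mean adequacy and fluency (1-5 scale) per anonymised engine."""
    by_engine = defaultdict(lambda: {"adequacy": [], "fluency": []})
    for row in rows:
        scores = by_engine[row["engine"]]
        scores["adequacy"].append(row["adequacy"])
        scores["fluency"].append(row["fluency"])
    return {
        engine: {name: mean(values) for name, values in scores.items()}
        for engine, scores in by_engine.items()
    }


def best_engine(summary):
    """Pick the engine with the highest combined mean score (assumed rule)."""
    return max(summary,
               key=lambda e: summary[e]["adequacy"] + summary[e]["fluency"])
```

Keeping the engine labels anonymous until after `best_engine` is called mirrors the "tabulated & anonymised" step, so reviewers score the output rather than the vendor.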
User Scenario #3
• Feedback collated from Senior Translators
  • Match best engine for language quality
    • Very unique – pseudo-crowd sourcing of most appropriate engine
  • Match engine to best language support
    • Translators always involved in engine selection process
  • Feedback to client
    • Match requirements and quality expectations
User Scenario #3
• Levels of post-editing services
  • Adequacy Review
    • All meaning expressed in the source segment appears in the translated segment
    • Structural integrity – tags, placeholders
    • Fit-for-purpose quality
  • Fluency Review
    • No grammar errors, excellent word selection and good syntax
    • Publishable quality
  • Client picks review
    • To fit budget, time-frame, audience, channel etc.
Tony O’Dowd [email protected]