Date posted: 11-May-2015
Category: Technology
Uploaded by: kantanmt
No Hardware. No Software. No Hassle MT.
New Breakthroughs in Machine Translation Technology
in association with #KantanWebinar
Tony O’Dowd, Founder & Chief Architect, KantanMT.com
What do we aim to cover today?
What is KantanMT.com?
Challenges of the L10N industry
Making the right project management decisions
Going beyond the baseline of MT quality
Conclusions
15 minutes
What is KantanMT.com?
Statistical MT system
Cloud-based =
Highly scalable
Inexpensive to operate
Quick to deploy
Our Vision
To put Machine Translation:
Customization
Improvement
Deployment
…into your hands
Active KantanMT Engines: 6,191
Training Words Uploaded: 28,243,234,615
Member Words Translated: 427,526,741
Fully Operational: 15 months
Initial steps of any project are:
Determine scope
How long will it take?
How much will it cost?
What is my margin?
Determine resources
How many translators will I need?
Introducing KantanAnalytics™ …think Fuzzy-Match report and you’ve got it in one!
Challenge #1
How can Project Managers ‘manage’ Post-Editing Projects?
KantanAnalytics™
Kantan TotalRecall – Advanced TM
% of TM hits in this job
KantanMT – automated translations
% of automated translations for this job
Range of QE Scores: QE range defined to match existing fuzzy match ranges used by the L10N industry
Quality Estimation Scores: Segment-level QE scores, akin to fuzzy match scores
Word Counts – Project Stats
Can be used to develop a project timeline and tiered pricing model for post-editing projects
Placeholder & Tag Counts: Used by PMs for complexity surcharges
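As a rough illustration of how segment-level QE scores could feed a tiered pricing model, the sketch below groups segments into fuzzy-match-style bands and prices each band separately. The band boundaries, the per-word rates, and the function names are all hypothetical; the slides do not state the ranges or rates KantanAnalytics uses.

```python
# Hypothetical bands (label, minimum QE score, per-word rate).
# Neither the boundaries nor the rates come from the slides.
BANDS = [
    ("95-100%", 95, 0.02),
    ("85-94%", 85, 0.04),
    ("75-84%", 75, 0.06),
    ("0-74%", 0, 0.10),
]


def band_for(score):
    """Return the label of the first band whose floor the score reaches."""
    for label, floor, _rate in BANDS:
        if score >= floor:
            return label
    raise ValueError(f"score out of range: {score}")


def summarise(segments):
    """Sum word counts per band; segments is an iterable of (qe_score, word_count)."""
    totals = {label: 0 for label, _floor, _rate in BANDS}
    for score, words in segments:
        totals[band_for(score)] += words
    return totals


def quote(totals):
    """Price a job by multiplying each band's word count by that band's rate."""
    rates = {label: rate for label, _floor, rate in BANDS}
    return round(sum(words * rates[label] for label, words in totals.items()), 2)


# Made-up job: four segments with QE scores and word counts.
segments = [(98, 120), (88, 300), (79, 250), (40, 500)]
totals = summarise(segments)
price = quote(totals)
```

The per-band word counts in `totals` are the kind of figure a PM would read off a fuzzy-match report; the same table could drive a timeline estimate by swapping the rate for an assumed words-per-hour throughput.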
KantanAnalytics embeds QE scores into
TRADOS Studio
MemoQ
XLIFF
KantanAnalytics™
Helping PMs make the right business decisions!
Challenge #2: Going beyond the baseline and developing production-ready MT!
Easy to build 1st baseline engine
Aggregate training data: TM, mono, stock, terminology
Use a cloud-based platform, like KantanMT.com
Real challenge: How do these platforms go beyond the baseline engine and achieve higher levels of production quality?
Introducing Kantan BuildAnalytics
Data analytics and visualisation providing insights into the customisation of SMT engines.
Kantan BuildAnalytics™: Rapidly develop production-ready engines
Summary Report
Training Rejects Reports
F-Measure Analysis
BLEU Analysis
TER Analysis
GAP Analysis
Timeline Report
Deep Tuning
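For readers unfamiliar with the TER report listed above, the sketch below computes a simplified word-level edit rate: the number of word insertions, deletions and substitutions needed to turn MT output into the reference, divided by the reference length. It is an assumption-laden simplification, since full TER also counts block shifts, which are omitted here, and nothing in the slides says how BuildAnalytics computes its score. The function name is hypothetical.

```python
def word_edit_rate(hypothesis, reference):
    """Word-level Levenshtein distance divided by reference length.

    Simplified stand-in for TER: block shifts are NOT modelled.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")

    # prev[j] holds the edit distance between the hypothesis words seen
    # so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


rate = word_edit_rate("the cat sat on mat", "the cat sat on the mat")
```

Lower is better: 0.0 means the output matches the reference exactly, and the value is a rough proxy for how much post-editing a segment needs.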
Kantan BuildAnalytics™
F-Measure Score: Measures word recall & precision of KantanMT engines
Distributions: Provides distribution of F-Measure scores across all reference translations
Kantan Insight™: Holistic analysis of the score and advice on how to improve it for KantanMT engines
Detailed Analysis: Segment-level F-Measure analysis to help SMT developers improve training material
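A minimal sketch of a segment-level F-measure, assuming a plain bag-of-words comparison between the MT output and one reference translation. The whitespace tokenisation, the lowercasing, and the function name are illustrative choices, not details taken from the slides.

```python
from collections import Counter


def f_measure(candidate, reference):
    """Return (precision, recall, F1) over word counts for one segment."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Words shared by both sides, each counted at most as often as it
    # appears on the side where it is rarer.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / sum(cand.values())  # share of MT words that are correct
    recall = overlap / sum(ref.values())      # share of reference words recovered
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = f_measure("the cat sat on mat", "the cat sat on the mat")
```

Running this over every reference segment and plotting the resulting scores would give the kind of distribution the Distributions view describes.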
Detailed Reports for: F-Measure, BLEU and TER
Gap Analysis – quickest way of improving fluency
Training Rejects Report – improve training data rapidly
Timeline – tracks history of KantanMT engines
Kantan BuildAnalytics™ - Rapid MT Customisation
bmmt GmbH and KantanMT:
The Real-World Use of Machine Translation
Maxim Khalilov, Technical Lead
bmmt GmbH
KantanMT webinar, April 10, 2014
MT in industry: context and rationale
The combination of these two technologies, well-established TM and cutting-edge MT, plus post-editing allows the creation of a high-quality translation that reads just as well as a “classically” produced translation.
MT in industry: what about cost?
The cost structure changes when machine translation is integrated into the translation pipeline: data preparation and quality assurance (editing) costs rise, whereas translation costs fall to as low as zero. Most importantly, the total cost of translation is reduced dramatically, as illustrated.
MT case study
Customer: a large German machine manufacturer
Project: 51,000 words of technical documentation, English into German.
Approach: hybrid MT/TM.
Settings: the files were processed through Trados Studio 2011.
Implementation: KantanMT
Description: Roughly 7,000 words came from TM as high matches. The remainder went through MT-based pretranslation, followed by a post-editing cycle, with the overall goal of producing the same level of quality as an all-human translation.
Training material: Our customer had not worked in this language combination before, so there was no TM to go on. But we knew that the English authors based their work on material that the customer had previously translated from German into English. We therefore reversed the language direction of that TM and trained a customer-specific engine with it.
Results: 44,000 words were post-edited to a final quality level that the customer was very happy with.
Cost savings > 30%.
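The two mechanical steps of the case study can be sketched as follows: reversing the direction of an existing German-to-English TM so it can train an English-to-German engine, and splitting the job's word count between TM matches and MT. The in-memory list of pairs, the sample sentences, and the function names are hypothetical; a real TM would be read from a TMX or similar file, and the 7,000-word TM figure is given in the case study only as approximate.

```python
def reverse_tm(units):
    """Swap source and target in each translation unit.

    units: list of (source_text, target_text) pairs, e.g. from a DE->EN TM.
    Returns a new list of pairs in the opposite direction (EN->DE).
    """
    return [(target, source) for source, target in units]


def mt_words(total_words, tm_words):
    """Words left for MT pretranslation and post-editing after TM matches."""
    return total_words - tm_words


# Made-up German->English units standing in for the customer's TM.
de_en = [
    ("Motor starten", "Start the engine"),
    ("Ventil schließen", "Close the valve"),
]
en_de = reverse_tm(de_en)

# Figures from the case study: 51,000 words in total, roughly 7,000 from TM.
post_edited = mt_words(51_000, 7_000)  # 44000
```

The reversed pairs are only as good as the original translations read backwards, which is why the post-editing cycle remains part of the workflow.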
MT: benefits of KantanMT solution
Fully automated system training
One-click system customization
Automatic data pre-processing
Fully automated translation
Automatic pre- and post-processing
Quality assessment
KantanWatch
Gap Analysis
Reject Report
No worry about maintenance and infrastructure
Transparent file format conversion
Training material conversion: TM conversion, monolingual material
Documents to translate: TMS format into MTable format
SDLXLIFF
Smooth terminology integration
Consistent terminology
Tag handling and mark-up transfer
Source: <x id="16480"/>SWord1 SWord2 SWord3 SWord4 <g id="16481">Number</g><g id="16480">SWord 8 SWord 9</g>
Target: <x id="16480"/>TWord1 TWord2 TWord3 TWord4 <g id="16480">TWord 8 TWord 9</g><g id="16481">Number</g>
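In the example above the two `<g>` pairs swap positions in the target while every tag id is preserved. A post-translation check for that property might look like the sketch below, which compares the inventory of opening and standalone tags on both sides. The regular expression covers only the simple `<x id="…"/>` and `<g id="…">…</g>` forms shown here, and the function names are hypothetical; it is not a description of how KantanMT handles mark-up internally.

```python
import re
from collections import Counter

# Matches <g id="1">, </g> and <x id="1"/>; other attribute layouts are ignored.
TAG_RE = re.compile(r'<(/?)(\w+)(?:\s+id="([^"]*)")?\s*(/?)>')


def tag_inventory(segment):
    """Count (tag name, id) pairs, counting each paired tag once via its opener."""
    inventory = Counter()
    for closing, name, tag_id, _self_closing in TAG_RE.findall(segment):
        if closing:
            continue  # closing tags carry no id; the opener already counted
        inventory[(name, tag_id)] += 1
    return inventory


def tags_preserved(source, target):
    """True if the target carries exactly the same tags as the source, in any order."""
    return tag_inventory(source) == tag_inventory(target)


source = ('<x id="16480"/>SWord1 SWord2 SWord3 SWord4 '
          '<g id="16481">Number</g><g id="16480">SWord 8 SWord 9</g>')
target = ('<x id="16480"/>TWord1 TWord2 TWord3 TWord4 '
          '<g id="16480">TWord 8 TWord 9</g><g id="16481">Number</g>')
ok = tags_preserved(source, target)
```

Comparing inventories rather than positions is deliberate: word order differs between languages, so a correct target may legitimately reorder tags, as the slide's example does.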
bmmt GmbH
Founded in 2013 by a group of language industry experts who wanted to offer innovative translation technology solutions
Three operations centers in Germany: Munich, Berlin and Stuttgart
bmmt GmbH has relied heavily on KantanMT services since 2013
Primary industries: Automotive and Trucks, Machine Engineering, Telecommunications, Construction, IT
Types of documents: workshop texts, product catalogues & other highly repetitive information documents
Primary source language: German
Integration: SDL Trados, SDL WorldServer and others
Find more: www.machine-translation.eu
Berlin: Alt-Moabit 92, 10559 Berlin. Phone: +49 30-3117505-15. Fax: +49 30-3117505-20
Munich: Bernhard-Wicki-Straße 5, 80636 Munich. Phone: +49 89 2000037-17. Fax: +49 89 2000037-11
Stuttgart: Ruppmannstraße 33b, 70565 Stuttgart. Phone: +49 711 16646-66. Fax: +49 711 16646-50
bmmt [email protected]
Thank you
Speakers
Tony O’Dowd, [email protected]
Maxim Khalilov, [email protected]