Date posted: 20-Mar-2017
Category: Technology
Uploaded by: kantanmt
Get Started with KantanNeural™
Tony O'Dowd, Dimitar Shterionov
Agenda
- Introduction to KantanMT
- Introduction to KantanNeural: SMT, NMT, what's the difference?
- KantanLQR: A/B testing to measure translation quality
- Q&A
- Statistical MT platform
- Cloud-based
- Highly scalable for high volume
- Fully Automated Useable Translation (FAUT)
- Fusion of TM, MT and rules
- Multiple deployment options
- MT factory for rapid development
Our vision: to put machine translation customization, improvement and deployment into your hands
What is KantanMT.com?
Active KantanMT engines: 10,394
Training words uploaded: 534,179,753,587
Member words translated: 8,571,269,925
www.kantanMT.com
Service Breakdown
[Chart: words translated (in billions), total number of words translated per quarter, FY 2016]
Sectoral Breakdown
- eCommerce: online retailing, product catalogues
- Technology: software, user assistance materials, web sites
- Others: support, legal, medical and other sectors
The KantanMT Community
Some clients are not named, to maintain confidentiality.
We are the fastest-growing MT provider, even though we are one of the young guns!
Academic Partner Program
Industry Affiliations & Memberships
KantanMT.com: A Complete MT Platform
- Build: Kantan Templates, Kantan NER, Kantan Library, Kantan Fleet, Kantan BuildAnalytics
- Improve: Kantan Analytics, KantanPEX, Kantan LQR, AdaptiveMT, Kantan GENTRY, KantanTotalRecall, KantanNeural
- Deploy: Kantan Translate, Kantan Swift, Kantan API, Kantan AutoScale, Kantan OfficeMT, Kantan Connectors, Kantan Snippets
Customisable, fast, high quality
Learning to translate from data: (Phrase-based) Statistical Machine Translation
Phrase table:
I did | habe ich
I did | ich habe
Unfortunately | leider
Unfortunately | unglücklich
Receive an answer | empfange eine Antwort
Receive an answer | Antwort empfange
Receive an answer | Antwort bekommen
Receive an answer | Antwort erhalten
Example sentence pair:
I did not unfortunately receive an answer to this question
Auf diese Frage habe ich leider keine Antwort bekommen
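The phrase table offers more than one German word order for "I did" ("habe ich" or "ich habe"). In phrase-based SMT, a language model estimated from target-language text helps choose between such candidates. The Python sketch below assumes a bigram model with plain relative-frequency (maximum-likelihood) estimates and no smoothing; the three-sentence German corpus and the candidate list are hypothetical toy data, not KantanMT's.

```python
from collections import Counter


def train_bigram_lm(corpus):
    """Maximum-likelihood bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])             # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent (prev, word) pairs

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


def sentence_probability(prob, sentence):
    """Multiply bigram probabilities; an unseen bigram gives 0 because there is no smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(prev, word)
    return p


# Hypothetical toy German corpus, for illustration only.
corpus = [
    "auf diese frage habe ich keine antwort",
    "ich habe leider keine zeit",
    "auf diese frage habe ich leider keine antwort bekommen",
]
lm = train_bigram_lm(corpus)

# Two word orders a decoder could build from the phrase table entries for "I did".
candidates = [
    "auf diese frage habe ich leider keine antwort bekommen",
    "auf diese frage ich habe leider keine antwort bekommen",
]
best = max(candidates, key=lambda c: sentence_probability(lm, c))
print(best)
```

A real system would smooth the estimates (so unseen bigrams get a small non-zero probability) and combine the language model score with phrase-table scores rather than using it alone.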
Learning to translate from data: (Phrase-based) Statistical Machine Translation
- A large phrase-based dictionary
- Based on smart mathematics and sophisticated tools
- Requires human intervention
High demand for better translation: SMT has reached a point where it cannot respond to the demand, hence Neural MT (NMT)
Learning to translate from data: NMT
- Recurrent neural networks: they have memory, compressing input sequences
- No more phrases: sequences of vector-represented words
- Encoder-decoder: a source sentence is encoded as a vector c; the decoder predicts a word based on c and the already predicted words
Learning to translate from data: NMT
- Parallel data to train the NMT model
- Jointly learns to encode and decode
- Words are represented as vectors (1-hot encoding)
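To make the one-hot idea concrete, here is a minimal Python sketch (the helper names are illustrative, not part of any KantanMT or OpenNMT API). Each word becomes a vector as long as the vocabulary, with a single 1 at the word's index, which is why a large vocabulary means large vectors.

```python
def build_vocabulary(sentences):
    """Map every distinct whitespace-separated word to an integer index."""
    words = sorted({word for sentence in sentences for word in sentence.split()})
    return {word: i for i, word in enumerate(words)}


def one_hot(word, vocab):
    """Return a vector of len(vocab) zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


vocab = build_vocabulary(["I did not receive an answer", "I did receive an answer"])
print(len(vocab))                # 6
print(one_hot("answer", vocab))  # [0, 0, 1, 0, 0, 0]
```

In NMT toolkits these sparse vectors are usually multiplied by a learned embedding matrix to obtain dense word vectors, but the one-hot form is what fixes the input dimension to the vocabulary size.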
[Diagram: source words x1, x2, x3 are encoded into the vector c, from which the decoder produces y1, y2, y3]
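The encoder-decoder flow in the diagram can be sketched in plain Python. This is an assumption-laden toy, not KantanNeural's model: it uses a simple tanh RNN, random untrained weights (so the German words it emits are arbitrary), greedy decoding and no attention. It only shows how x1..xn are compressed into c and how each y is predicted from c and the previously predicted word.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def tanh_of_sum(*vectors):
    """Element-wise tanh of the sum of several equally sized vectors."""
    return [math.tanh(sum(values)) for values in zip(*vectors)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def one_hot(index, size):
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


class ToyEncoderDecoder:
    """Untrained RNN encoder-decoder with random weights: shows the data flow only."""

    def __init__(self, source_vocab, target_vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.source_index = {w: i for i, w in enumerate(source_vocab)}
        self.target_vocab = list(target_vocab)  # must contain the end marker "</s>"
        n_src, n_tgt, h = len(source_vocab), len(self.target_vocab), hidden_size
        self.hidden_size = h
        self.enc_in = random_matrix(h, n_src, rng)   # x_t -> hidden
        self.enc_rec = random_matrix(h, h, rng)      # h_(t-1) -> hidden
        self.dec_in = random_matrix(h, n_tgt, rng)   # previous output word -> hidden
        self.dec_rec = random_matrix(h, h, rng)      # s_(t-1) -> hidden
        self.dec_ctx = random_matrix(h, h, rng)      # context vector c -> hidden
        self.out = random_matrix(n_tgt, h, rng)      # hidden -> one score per target word

    def encode(self, words):
        """Read x1..xn one at a time; the final hidden state is the context vector c."""
        state = [0.0] * self.hidden_size
        for word in words:
            x = one_hot(self.source_index[word], len(self.source_index))
            state = tanh_of_sum(matvec(self.enc_in, x), matvec(self.enc_rec, state))
        return state

    def decode(self, c, max_len=10):
        """Greedily predict y1, y2, ... from c and the previously predicted word."""
        state = [0.0] * self.hidden_size
        previous = [0.0] * len(self.target_vocab)  # no previous word at the first step
        output = []
        for _ in range(max_len):
            state = tanh_of_sum(matvec(self.dec_in, previous),
                                matvec(self.dec_rec, state),
                                matvec(self.dec_ctx, c))
            scores = matvec(self.out, state)
            best = max(range(len(scores)), key=scores.__getitem__)
            if self.target_vocab[best] == "</s>":
                break
            output.append(self.target_vocab[best])
            previous = one_hot(best, len(self.target_vocab))
        return output


model = ToyEncoderDecoder(["I", "did", "not", "receive", "an", "answer"],
                          ["</s>", "ich", "habe", "keine", "Antwort", "bekommen"])
c = model.encode(["I", "did", "not", "receive", "an", "answer"])
print(len(c), model.decode(c))
```

Training would adjust all six weight matrices jointly on parallel data, which is the "jointly learns to encode and decode" point above; that step is omitted here.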
Learning to translate from data: NMT
- Large vocabulary: large vectors to deal with
- Unknown words: SMT does not translate them; NMT shows unexpected behaviour
- Word segmentation:
  - Byte-pair encoding: lower → low + er, tallest → tall + est, almost → al + most; the learned units can be recombined into lowest, taller, allow
  - Single-character n-grams: English has hundreds of thousands of words but only 26 letters. What about Simplified Chinese? Japanese?
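The byte-pair encoding idea can be sketched in a few lines of Python. This is a minimal illustration of the usual merge-learning procedure, not KantanNeural's own segmenter: the toy word frequencies are hypothetical, "</w>" is an assumed end-of-word marker, and ties between equally frequent pairs are broken by first occurrence, which real implementations may handle differently.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in a symbol sequence by one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Repeatedly merge the most frequent adjacent symbol pair; return the merge list."""
    # Start from single characters; "</w>" marks the end of a word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: the first pair encountered wins
        merges.append(best)
        vocab = {apply_merge(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=10)
print(merges[0])                  # ('e', 's')
print(segment("lowest", merges))  # ['low', 'est</w>']
```

"lowest" never appears in the training counts, yet it is segmented into known units, which is how subword segmentation sidesteps the unknown-word problem.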
KantanNeural - under the hood
Vector manipulations are very computationally expensive: GPU machines with K520 Grid
Training, translation and API architecture:
- Modular: easy to upgrade and migrate
- Distributed: highly efficient
- GPU and CPU
Word segmentation:
- Single-character n-grams for double-byte character (DBC) languages
- Byte-pair encoding for single-byte character (SBC) languages
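One possible reading of the DBC/SBC split is sketched below in Python: characters from CJK blocks are emitted as single-character tokens, while runs of other characters are kept as words so that byte-pair encoding can be applied to them afterwards. The Unicode ranges are an assumption and are not exhaustive (for example, CJK Extension A and the supplementary planes are not covered).

```python
def is_double_byte_script(ch):
    """Rough check for CJK characters using a few Unicode block ranges (assumed, not exhaustive)."""
    cp = ord(ch)
    return (0x3000 <= cp <= 0x30FF      # CJK punctuation, Hiragana, Katakana
            or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
            or 0xFF00 <= cp <= 0xFFEF)  # half-width and full-width forms


def pre_segment(text):
    """Split CJK text into single characters; keep other text as whitespace-separated words."""
    tokens, word = [], []
    for ch in text:
        if ch.isspace() or is_double_byte_script(ch):
            if word:                      # close the word collected so far
                tokens.append("".join(word))
                word = []
            if not ch.isspace():
                tokens.append(ch)         # each CJK character is its own token
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


print(pre_segment("机器翻译 is fun"))  # ['机', '器', '翻', '译', 'is', 'fun']
```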
KantanNeural - under the hood
Considered tools:
- Theano-based (Nematus)
- Torch-based (OpenNMT)
OpenNMT: 5x faster than Nematus; large development community
KantanNeural - current state
- Yesterday: fast, customizable, high-quality SMT engines
- Today: KantanFleet charged with KantanNeural
- Tomorrow: customizable KantanNeural engines (TM, terminology, NER); fast training on prebuilt models
Expanded fleet of GPU-powered machines
KantanNeural - Deployment
3 language combinations: EN-DE, EN-ZH, EN-JA
- Now available for use on the KantanMT platform (Beta I release)
- Part of the KantanFleet collection of pre-built engines
- KantanMT account holders can now translate; all document formats are supported
- New language arcs will be added during Q1 2017

Arc   | Segments    | Words       | Domain
EN-DE | 8.8 million | 156 million | Legal
EN-ZH | 3.5 million | 53 million  | Legal
EN-JA | 8.1 million | 90 million  | Legal
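A quick arithmetic check on the training-data table: dividing words by segments gives the average segment length per arc. The slides do not say how words were counted for Chinese and Japanese (which are written without spaces) or whether the counts are source-side only, so the EN-ZH and EN-JA averages may not be directly comparable with EN-DE.

```python
# Segment and word counts for each language arc, as listed in the deployment table.
TRAINING_DATA = {
    "EN-DE": (8.8e6, 156e6),
    "EN-ZH": (3.5e6, 53e6),
    "EN-JA": (8.1e6, 90e6),
}


def words_per_segment(data):
    """Average number of words per training segment for each arc."""
    return {arc: words / segments for arc, (segments, words) in data.items()}


for arc, average in words_per_segment(TRAINING_DATA).items():
    print(f"{arc}: {average:.1f} words per segment")
```

This gives roughly 17.7, 15.1 and 11.1 words per segment for EN-DE, EN-ZH and EN-JA respectively.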
KantanNeural - identification
KantanNeural - select from Fleet
KantanNeural - set the engine type
But what about quality?
KantanNeural - Quality
Phase 1: automated test score comparisons

Arc   | Type | F-Measure | BLEU | TER
EN-DE | SMT  | 68%       | 59%  | 50%
EN-DE | NMT  | 67%       | 49%  | 51%
EN-ZH | SMT  | 76%       | 43%  | 45%
EN-ZH | NMT  | 73%       | 43%  | 44%
EN-JA | SMT  | 78%       | 53%  | 45%
EN-JA | NMT  | 68%       | 40%  | 53%
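The slides do not define how KantanMT computes F-Measure and TER. The Python sketch below assumes common textbook forms: F-Measure as the harmonic mean of unigram precision and recall, and TER approximated as word-level edit distance divided by reference length (real TER also counts block shifts as single edits). Note that F-Measure and BLEU are "higher is better" scores, while TER is an error rate where lower is better; by TER, the EN-JA NMT engine (53%) trails SMT (45%).

```python
from collections import Counter


def unigram_f_measure(hypothesis, reference):
    """Harmonic mean of unigram precision and recall (clipped word matches)."""
    matches = sum((Counter(hypothesis) & Counter(reference)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(hypothesis)
    recall = matches / len(reference)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis, reference):
    """Word-level Levenshtein distance divided by reference length.

    Real TER also allows block shifts, so this sketch can score higher
    than TER when words are reordered.
    """
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i] + [0] * len(reference)
        for j, ref_word in enumerate(reference, start=1):
            current[j] = min(previous[j] + 1,                           # extra hypothesis word
                             current[j - 1] + 1,                        # missing reference word
                             previous[j - 1] + (hyp_word != ref_word))  # match or substitution
        previous = current
    return previous[-1] / len(reference)


hyp = "the programme has been implemented".split()
ref = "and the programme has been implemented".split()
# F = 10/11 (precision 1, recall 5/6); edit rate = 1/6 (one missing word).
print(unigram_f_measure(hyp, ref), word_edit_rate(hyp, ref))
```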
KantanNeural - Quality
A reliable quality measurement is not available at present:
- SMT achieves higher BLEU scores
- Human translators prefer the NMT output
Example (EN-DE):
Ref.: Diese Bandbreite stellt somit eher eine Überbewertung der Produktion dar. (This range therefore represents an overestimation of production.)
NMT: Es handelt sich daher eher um eine zu hoch angesetzte Produktionsschätzung. (It is therefore a question of a too high estimation of production.)
SMT (English gloss): Therefore this range represents rather an overestimate of the production.
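BLEU rewards exact n-gram overlap with a single reference, which helps explain why a fluent paraphrase can score poorly. The sketch below is a plain, unsmoothed corpus-level BLEU-4 with the standard brevity penalty, written in Python for illustration; it is not KantanMT's scorer, and the whitespace tokenisation is an assumption. Applied to the reference and the "Es handelt sich daher ..." output from the example, it returns 0: the two sentences share three unigrams ("eher", "eine" and the full stop) but no bigrams.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU with one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref, n)
            for gram, count in ngram_counts(hyp, n).items():
                matches[n - 1] += min(count, ref_counts[gram])
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "Diese Bandbreite stellt somit eher eine Überbewertung der Produktion dar .".split()
paraphrase = "Es handelt sich daher eher um eine zu hoch angesetzte Produktionsschätzung .".split()
print(corpus_bleu([paraphrase], [reference]))  # 0.0
print(corpus_bleu([reference], [reference]))   # 1.0
```

Production scorers typically compute BLEU over a whole test set and may apply smoothing, so sentence-level zeros like this one are mainly a reminder that BLEU measures surface overlap, not meaning.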
Translation Quality Evaluation
KantanLQR: quick introduction
- Built into the KantanMT platform
- Integral step in KantanMT engine development
Project types now supported:
- Quality Evaluation: factored model; KPIs based on Simplified Factors, MQM, DQF and harmonised MQM-DQF; real-time dashboard
- A/B Test: comparison model; KPIs based on A-B, A-B-C and A-B-C-D; real-time dashboard
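KantanLQR's KPI formulas are not given in the slides. As a hedged illustration of how an MQM/DQF-style factored evaluation is often turned into a single number, the Python sketch below weights each reviewer-logged error by severity and normalises by the number of words reviewed. The severity weights, the 0-100 scale and the sample errors are assumptions, not KantanLQR settings.

```python
from collections import Counter

# Assumed severity weights; adjust to the evaluation template actually in use.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def quality_score(errors, words_reviewed, weights=SEVERITY_WEIGHTS):
    """Return a 0-100 score: 100 * (1 - weighted error penalty per reviewed word)."""
    penalty = sum(weights[severity] for _category, severity in errors)
    return max(0.0, 100.0 * (1 - penalty / words_reviewed))


def errors_by_category(errors):
    """Tally logged errors per category (e.g. accuracy, fluency)."""
    return Counter(category for category, _severity in errors)


# Hypothetical review of a 250-word sample: (category, severity) per logged error.
errors = [("accuracy", "major"), ("fluency", "minor"), ("terminology", "minor")]
print(quality_score(errors, 250))  # penalty 7 over 250 words, about 97.2
print(errors_by_category(errors))
```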
KantanLQR
Conclusions
KantanNeural:
- Available in three language arcs from KantanFleet
- Efficient training with OpenNMT and K520 Grid GPUs
- Inherits the merits of KantanMT pipelines
A/B Testing:
- A, B (C or D) testing now fully supported
- Real-time data analytics built into your LQR Dashboard
Both KantanNeural and A/B Testing are available to all KantanMT Account holders
Future work
KantanNeural:
- Build-your-own KantanNeural engine
- Incremental/adaptive NMT
- API support for NMT
- Further distribute the training and translation pipelines for even higher efficiency
- Intelligent data preprocessing for even faster and more efficient training
A/B Testing
Solving
Thank you
Learning to translate from data: (Phrase-based) Statistical Machine Translation
- Learn a phrase table from word alignment
- Learn a language model based on relative frequencies
Word alignment example: le programme a été mis en application ↔ And the programme has been implemented
"the program": always together
"has been": always together
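The "always together" observation is what phrase extraction from a word alignment formalises: a phrase pair is kept only if none of its words is aligned to a word outside the pair. Below is a minimal Python sketch of that consistency check. The alignment links for the example sentence are assumed for illustration, and the sketch omits the extra variants the full algorithm creates around unaligned source words.

```python
def extract_phrase_pairs(source, target, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    `alignment` is a set of (source_index, target_index) links. A pair is
    consistent when no word inside it is aligned to a word outside it.
    """
    pairs = set()
    for t_start in range(len(target)):
        for t_end in range(t_start, min(t_start + max_len, len(target))):
            # Source positions linked to the target span.
            linked = [s for s, t in alignment if t_start <= t <= t_end]
            if not linked:
                continue
            s_start, s_end = min(linked), max(linked)
            if s_end - s_start >= max_len:
                continue
            # Consistency: source words in the span must not link outside the target span.
            if all(t_start <= t <= t_end for s, t in alignment if s_start <= s <= s_end):
                pairs.add((" ".join(source[s_start:s_end + 1]),
                           " ".join(target[t_start:t_end + 1])))
    return pairs


source = "le programme a été mis en application".split()
target = "and the programme has been implemented".split()
# Assumed alignment (source index, target index); "and" is left unaligned.
alignment = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (6, 5)}

for pair in sorted(extract_phrase_pairs(source, target, alignment)):
    print(pair)
```

Pairs such as ("le programme", "the programme") and ("a été", "has been") come out, while a split like ("été mis", "been implemented") is rejected because "mis" is also aligned with the whole of "implemented" together with "en application".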
Translation Quality Evaluation
KantanLQR:
- Built into the KantanMT platform
- Integral step in KantanMT engine development
Translation Quality Evaluation:
- Factored model
- Templates based on Simplified Factors, MQM, DQF and MQM-DQF
A/B Testing:
- A, B (C or D) testing now fully supported
- Real-time data analytics built into your LQR Dashboard
- Available to all KantanMT account holders
KantanLQR A/B Testing
Comparative analysis of two, three or four different machine translation (MT) outputs, designed to compare SMT with NMT outputs.
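For a two-way A/B comparison, one simple way to ask whether reviewers' preference for one output is more than chance is an exact sign test on the non-tied judgements. The Python sketch below is an assumption about how such results could be analysed, not a description of the KantanLQR dashboard; the 20 judgements are hypothetical, and `math.comb` needs Python 3.8 or later. For A-B-C or A-B-C-D projects, the same test could be run pairwise, ideally with a correction for multiple comparisons.

```python
from collections import Counter
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided exact sign test; ties are excluded before calling.

    Null hypothesis: each non-tied segment is equally likely to favour A or B.
    """
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Binomial(n, 0.5)
    return min(1.0, 2 * tail)


# Hypothetical judgements for 20 segments (A = SMT output, B = NMT output).
judgements = ["B"] * 13 + ["A"] * 4 + ["tie"] * 3
counts = Counter(judgements)
p = sign_test_p_value(counts["A"], counts["B"])
print(counts, p)
```

With 13 preferences for B, 4 for A and 3 ties, the two-sided p-value is 3214/65536, about 0.049.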
KantanLQR A/B Testing - create
KantanLQR A/B Testing - add data
KantanLQR A/B Testing - scoring
KantanLQR A/B Testing - review