
Get Started with KantanNeural

Date posted: 20-Mar-2017
Upload: kantanmt
Tony O'Dowd and Dimitar Shterionov


Agenda
- Introduction to KantanMT
- Introduction to KantanNeural
- SMT vs NMT: what's the difference?
- KantanLQR
- A-B Testing to measure translation quality
- Q&A

What is KantanMT.com?
- Statistical MT platform
- Cloud-based
- Highly scalable for high volume
- Fully Automated Useable Translation (FAUT)
- Fusion of TM, MT and rules
- Multiple deployment options
- MT Factory for rapid development

Our vision: to put Machine Translation customization, improvement and deployment into your hands.

Active KantanMT engines: 10,394
Training words uploaded: 534,179,753,587
Member words translated: 8,571,269,925
www.kantanMT.com


Service Breakdown
[Chart: total number of words translated per quarter, FY 2016 (in billions)]

Sectoral Breakdown
- eCommerce: online retailing, product catalogues
- Technology: software, user assistance materials, websites
- Others: support, legal, medical and other sectors


The KantanMT Community

Some clients are unnamed to maintain confidentiality.

We are the fastest-growing MT provider, even though we are one of the young guns!

Academic Partner Program


Industry Affiliations & Memberships


KantanMT.com: A Complete MT Platform
- Build: Kantan Templates, Kantan NER, Kantan Library, Kantan Fleet, Kantan BuildAnalytics
- Improve: Kantan Analytics, KantanPEX, Kantan LQR, AdaptiveMT, Kantan GENTRY, KantanTotalRecall, KantanNeural
- Deploy: Kantan Translate, Kantan Swift, Kantan API, Kantan AutoScale, Kantan OfficeMT, Kantan Connectors, Kantan Snippets

KantanNeural: customisable, fast, high quality

Learning to translate from data: (Phrase-based) Statistical Machine Translation

Phrase table:
- I did → habe ich
- I did → ich habe
- Unfortunately → leider
- Unfortunately → unglücklich
- Receive an answer → empfange eine Antwort
- Receive an answer → Antwort empfange
- Receive an answer → Antwort bekommen
- Receive an answer → Antwort erhalten

Source: I did not unfortunately receive an answer to this question
Target: Auf diese Frage habe ich leider keine Antwort bekommen
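To make the phrase table concrete, here is a minimal Python sketch assuming made-up phrase-pair counts (the numbers are hypothetical, not KantanMT data). It estimates each translation probability by relative frequency, phi(target | source) = count(source, target) / count(source), and then translates left to right, always taking the longest matching source phrase and its most probable translation. Real phrase-based decoders such as Moses also apply a language model and a reordering model, which this sketch deliberately omits.

```python
from collections import Counter, defaultdict

# Hypothetical phrase-pair counts, as if extracted from a word-aligned corpus.
PAIR_COUNTS = Counter({
    ("I did", "habe ich"): 6,
    ("I did", "ich habe"): 4,
    ("unfortunately", "leider"): 9,
    ("unfortunately", "unglücklich"): 1,
    ("receive an answer", "empfange eine Antwort"): 1,
    ("receive an answer", "Antwort empfange"): 1,
    ("receive an answer", "Antwort bekommen"): 5,
    ("receive an answer", "Antwort erhalten"): 3,
})


def phrase_probabilities(pair_counts: Counter) -> dict[str, dict[str, float]]:
    """phi(target | source) = count(source, target) / count(source)."""
    source_totals: Counter = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_totals[src] += n
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_totals[src]
    return dict(table)


def translate_monotone(sentence: str, table, max_phrase_len: int = 3) -> str:
    """Greedy left-to-right decoding: no reordering, no language model."""
    words = sentence.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest source phrase starting at position i first.
        for length in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in table:
                options = table[phrase]
                output.append(max(options, key=options.get))
                i += length
                break
        else:
            # No phrase matched: copy the unknown word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


table = phrase_probabilities(PAIR_COUNTS)
print(translate_monotone("I did not unfortunately receive an answer", table))
# habe ich not leider Antwort bekommen
```

The output shows two classic SMT limitations: "not" is missing from this toy phrase table, so it passes through untranslated, and the German word order still follows the English, which is the job of the reordering and language models in a full decoder.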

Learning to translate from data: (Phrase-based) Statistical Machine Translation
- A large phrase-based dictionary
- Based on smart mathematics and sophisticated tools
- Requires human intervention
- High demand for better translation
- SMT has reached a point where it cannot respond to that demand → Neural MT (NMT)

Learning to translate from data: NMT
- Recurrent neural networks
  - Have memory, compressing input sequences
  - No more phrases: sequences of vector-represented words
- Encoder-decoder
  - A source sentence is encoded as a vector c
  - The decoder predicts each word based on c and the words already predicted

Learning to translate from data: NMT
- Parallel data to train the NMT model
- Jointly learns to encode and decode
- Words are represented as vectors (1-hot encoding)

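[Diagram: the encoder reads source words x1 x2 x3 into the vector c; the decoder generates target words y1 y2 y3 from c]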
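The encoder-decoder data flow can be illustrated with a minimal, standard-library Python sketch. Everything here is an illustrative assumption: a plain tanh RNN with small random, untrained weights and greedy decoding. Production toolkits such as OpenNMT use trained LSTM or GRU layers with attention, so the words this toy model emits are arbitrary. What the sketch does show is the mechanism on the slide: 1-hot source words x1..xn are folded into a single vector c, and each target word is predicted from c and the previously predicted word.

```python
import math
import random


def one_hot(index: int, size: int) -> list[float]:
    """1-hot encoding: all zeros except a 1.0 at the word's index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def matvec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def add(*vecs: list[float]) -> list[float]:
    return [sum(values) for values in zip(*vecs)]


def tanh(vec: list[float]) -> list[float]:
    return [math.tanh(x) for x in vec]


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyEncoderDecoder:
    """Untrained tanh-RNN encoder-decoder; random weights, illustration only."""

    def __init__(self, src_vocab, tgt_vocab, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.src_index = {w: i for i, w in enumerate(src_vocab)}
        self.tgt_vocab = list(tgt_vocab)  # assumed: index 0 is "<s>" and "</s>" is present
        self.hidden = hidden
        self.W_x = random_matrix(hidden, len(src_vocab), rng)  # source word -> encoder state
        self.W_h = random_matrix(hidden, hidden, rng)          # encoder recurrence
        self.U_y = random_matrix(hidden, len(tgt_vocab), rng)  # previous word -> decoder state
        self.U_s = random_matrix(hidden, hidden, rng)          # decoder recurrence
        self.U_c = random_matrix(hidden, hidden, rng)          # context vector c -> decoder state
        self.V = random_matrix(len(tgt_vocab), hidden, rng)    # decoder state -> word scores

    def encode(self, words: list[str]) -> list[float]:
        """Fold the source sentence into one vector c (the last hidden state)."""
        h = [0.0] * self.hidden
        for word in words:
            x = one_hot(self.src_index[word], len(self.src_index))
            h = tanh(add(matvec(self.W_x, x), matvec(self.W_h, h)))
        return h

    def decode(self, c: list[float], max_len: int = 10) -> list[str]:
        """Greedily predict each word from c, the decoder state and the previous word."""
        s = [0.0] * self.hidden
        prev = 0  # start from "<s>"
        output = []
        for _ in range(max_len):
            y = one_hot(prev, len(self.tgt_vocab))
            s = tanh(add(matvec(self.U_y, y), matvec(self.U_s, s), matvec(self.U_c, c)))
            scores = matvec(self.V, s)
            prev = max(range(len(scores)), key=scores.__getitem__)
            if self.tgt_vocab[prev] == "</s>":
                break
            output.append(self.tgt_vocab[prev])
        return output


model = ToyEncoderDecoder(
    src_vocab=["I", "did", "not", "receive", "an", "answer"],
    tgt_vocab=["<s>", "</s>", "ich", "habe", "keine", "Antwort", "bekommen"],
)
c = model.encode(["I", "did", "not", "receive", "an", "answer"])
print(len(c), model.decode(c, max_len=5))
```

Training on parallel data would adjust all six weight matrices jointly, which is what "jointly learns to encode and decode" refers to. Note also that each 1-hot vector is as long as the vocabulary, which is why large vocabularies become a problem.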

Learning to translate from data: NMT
- Large vocabulary → large vectors to deal with
- Unknown words
  - SMT: does not translate them
  - NMT: unexpected behaviour
- Word segmentation
  - Byte-pair encoding: lower → low er, tallest → tall est, almost → al most
  - Single character n-gram: there are hundreds of thousands of words in English but only 26 characters. How about Simplified Chinese? Japanese?

Unseen words built from the same subword units: lowest, taller, allow
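Byte-pair encoding can be sketched in a few lines of Python. The learning loop follows the procedure popularised for NMT by Sennrich et al. (2016): repeatedly merge the most frequent adjacent symbol pair. This is a simplified illustration, not KantanNeural's actual segmenter, and the merge list in the usage example is hand-picked (hypothetical) to mirror the slide's units (low, er, tall, est, al). The point is that unseen words such as lowest, taller and allow are still covered by known subwords.

```python
from collections import Counter

END = "</w>"  # end-of-word marker appended to every word before merging


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn merge operations by repeatedly merging the most frequent symbol pair."""
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # on ties, the first pair seen wins
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a (possibly unseen) word by replaying the merges in learned order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    # Drop the end-of-word marker for display.
    return [s.replace(END, "") for s in symbols if s != END]


# Hand-picked, hypothetical merges mirroring the slide's subword units.
merges = [("l", "o"), ("lo", "w"), ("e", "s"), ("es", "t"), ("t", "a"),
          ("ta", "l"), ("tal", "l"), ("a", "l"), ("e", "r")]
for word in ["lowest", "taller", "allow"]:
    print(word, "->", " ".join(apply_bpe(word, merges)))
# lowest -> low est
# taller -> tall er
# allow -> al low
```

Merge order matters: because ("t", "a") is applied before ("a", "l"), "taller" becomes "tall er" rather than "t all er".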

KantanNeural - under the hood

- Vector manipulations are very computationally expensive → GPU machines with K520 Grid
- Training, translation and API architecture
  - Modular: easy to upgrade and migrate
  - Distributed: highly efficient
  - GPU and CPU
- Word segmentation
  - Single character n-gram for DBC (double-byte character) languages
  - Byte-pair encoding for SBC (single-byte character) languages
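The segmentation split above (character units for double-byte scripts, byte-pair encoding for single-byte ones) can be sketched as a simple dispatcher. This is an assumption-laden illustration rather than KantanNeural's pipeline: a token counts as "double-byte" if it contains a character from the Hiragana, Katakana or CJK Unified Ideographs Unicode blocks, such tokens are split into single characters, and all other tokens are left whole for a subsequent BPE step. Because Chinese and Japanese text is usually written without spaces, a whole sentence often arrives as one token and is split character by character.

```python
# Unicode blocks treated here as "double-byte" scripts (an illustrative choice).
DBC_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def is_dbc(ch: str) -> bool:
    """True if the character falls in one of the assumed double-byte blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DBC_RANGES)


def segment(text: str) -> list[str]:
    """Split CJK tokens into single characters; keep other tokens whole for BPE."""
    units: list[str] = []
    for token in text.split():
        if any(is_dbc(ch) for ch in token):
            units.extend(token)  # single-character units
        else:
            units.append(token)  # would be passed on to byte-pair encoding
    return units


print(segment("机器翻译"))              # ['机', '器', '翻', '译']
print(segment("machine translation"))  # ['machine', 'translation']
```

A character inventory of a few thousand ideographs is far smaller than a word vocabulary, which is what makes character units practical for Chinese and Japanese.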

KantanNeural - under the hood

Considered tools
- Theano-based (Nematus)
- Torch-based (OpenNMT)
  - OpenNMT is 5x faster than Nematus
  - Large development community

KantanNeural - current state
- Yesterday: fast, customizable, high-quality SMT engines
- Today: KantanFleet charged with KantanNeural
- Tomorrow:
  - Customizable KantanNeural engines (TM, terminology, NER)
  - Fast training on prebuilt models
  - Expanded fleet of GPU-powered machines

KantanNeural - Deployment
3 language combinations: EN-DE, EN-ZH, EN-JA

- Now available for use on the KantanMT Platform
- Beta I Release
- Part of the KantanFleet collection of pre-built engines
- KantanMT account holders can now translate
- All document formats are supported
- New language arcs will be added during Q1 2017

Arc     Segments      Words         Domain
EN-DE   8.8 million   156 million   Legal
EN-ZH   3.5 million   53 million    Legal
EN-JA   8.1 million   90 million    Legal

KantanNeural - identification

KantanNeural - select from Fleet

KantanNeural - set the engine type

But what about quality?

KantanNeural - Quality
Phase 1: automated test score comparisons

Arc     Type   F-Measure   BLEU   TER
EN-DE   SMT    68%         59%    50%
EN-DE   NMT    67%         49%    51%
EN-ZH   SMT    76%         43%    45%
EN-ZH   NMT    73%         43%    44%
EN-JA   SMT    78%         53%    45%
EN-JA   NMT    68%         40%    53%

Higher F-Measure and BLEU scores are better; lower TER is better.
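To make the three columns concrete, here is a simplified Python sketch of each metric at sentence level. These are textbook approximations, not the implementations KantanMT used: the F-measure is assumed to be the harmonic mean of unigram precision and recall; BLEU is an unsmoothed sentence-level geometric mean of 1- to 4-gram precisions with a brevity penalty (published BLEU scores are computed over a whole test corpus); and the TER stand-in is word-level edit distance divided by reference length, without the block shifts that real TER allows.

```python
import math
from collections import Counter


def f_measure(hyp: list[str], ref: list[str]) -> float:
    """Harmonic mean of unigram precision and recall (clipped word matches)."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: list[str], ref: list[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU for one sentence pair: 0.0 if any n-gram order has no match."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        overlap = sum((hyp_ngrams & ngrams(ref, n)).values())
        total = sum(hyp_ngrams.values())
        if overlap == 0:
            return 0.0
        log_precision += math.log(overlap / total) / max_n
    brevity_penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision)


def edit_rate(hyp: list[str], ref: list[str]) -> float:
    """Word-level Levenshtein distance / reference length (TER without shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(prev[j] + 1,             # delete h
                          curr[j - 1] + 1,         # insert r
                          prev[j - 1] + (h != r))  # substitute (free if equal)
        prev = curr
    return prev[-1] / len(ref)


ref = "the cat sat on the mat".split()
hyp = "the cat sat".split()
print(f_measure(hyp, ref), sentence_bleu(hyp, ref), edit_rate(hyp, ref))
```

For this hypothesis every word is correct, so precision is perfect and the F-measure is about 0.67; the edit rate is 0.5 (three missing words out of six); yet unsmoothed BLEU is 0.0 because a three-word output has no 4-grams. This illustrates how strongly BLEU depends on exact n-gram overlap with a single reference.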

KantanNeural - Quality

- A reliable quality measurement is not available at present
- SMT scores higher BLEU (EN-DE, EN-JA) or equal BLEU (EN-ZH)
- Human translators prefer NMT output

NMT

SMT

Ref.: Diese Bandbreite stellt somit eher eine Überbewertung der Produktion dar.

Es handelt sich daher eher um eine zu hoch angesetzte Produktionsschätzung.

Es handelt sich daher eher um eine zu hoch angesetzte Produktionsschätzung.

This range therefore represents an overestimation of production.

It is therefore a question of a too high estimation of production.

It is therefore a question of a too high estimation of production.


Therefore this range represents rather an overestimate of the production.

Translation Quality Evaluation

KantanLQR
Quick introduction to KantanLQR
- Built into the KantanMT platform
- Integral step in KantanMT engine development

KantanLQR project types now supported:

Quality Evaluation
- Factored model
- KPIs based on Simplified Factors, MQM, DQF and Harmonised MQM-DQF
- Real-time dashboard

A/B Test
- Comparison model
- KPIs based on A-B, A-B-C and A-B-C-D
- Real-time dashboard
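For the factored Quality Evaluation model, one common way to turn error annotations into a single KPI is a penalty-per-word score in the style of MQM-DQF. The sketch below assumes severity weights of 1 (minor), 5 (major) and 10 (critical), which are commonly used MQM-DQF defaults but not necessarily what KantanLQR applies; the `ErrorAnnotation` type and the category names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed severity weights (commonly used MQM-DQF defaults; configurable in practice).
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class ErrorAnnotation:
    """One reviewer-flagged error (hypothetical structure)."""
    category: str   # e.g. "accuracy", "fluency", "terminology"
    severity: str   # key into SEVERITY_WEIGHTS


def quality_score(errors: list[ErrorAnnotation], word_count: int,
                  weights: dict[str, int] = SEVERITY_WEIGHTS) -> float:
    """1 - weighted error points per evaluated word.

    1.0 means no errors were found; the score can drop below 0 when errors are dense.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(weights[e.severity] for e in errors)
    return 1.0 - penalty / word_count


def errors_by_category(errors: list[ErrorAnnotation]) -> Counter:
    """Per-category error counts, the kind of breakdown a dashboard would chart."""
    return Counter(e.category for e in errors)


errors = [ErrorAnnotation("accuracy", "minor"), ErrorAnnotation("fluency", "major")]
print(quality_score(errors, word_count=100), errors_by_category(errors))
```

With one minor and one major error in 100 words the penalty is 6 points, giving a score of about 0.94. Normalising by word count keeps scores comparable across samples of different sizes.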


Conclusions

KantanNeural
- Available in three language arcs from KantanFleet
- Efficient training with OpenNMT and K520 Grid GPUs
- Inherits the merits of KantanMT pipelines

A/B Testing
- A, B (C or D) testing now fully supported
- Real-time data analytics built into your LQR Dashboard

Both KantanNeural and A/B Testing are available to all KantanMT Account holders

Future work

KantanNeural
- Build-your-own KantanNeural engine
- Incremental/adaptive NMT
- API support for NMT
- Further distribute the training and translation pipelines for even higher efficiency
- Intelligent data preprocessing for even faster and more efficient training

A/B Testing

Solving

Thank you


Learning to translate from data: (Phrase-based) Statistical Machine Translation
- Learn a phrase table from word alignment
- Learn a language model based on relative frequencies

[Word-alignment diagram: "le programme a été mis en application" ↔ "And the programme has been implemented"]


Always together: "the program"; "has been"
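Phrase extraction from a word alignment can be sketched in Python as follows. A source span and a target span form a phrase pair if they share at least one alignment link and no link connects a word inside one span to a word outside the other (the usual consistency criterion in phrase-based SMT). The alignment links below are an assumed reading of the diagram, with "And" left unaligned, and the brute-force search over all span pairs is fine for one sentence but is not how production tools scale.

```python
def extract_phrase_pairs(src: list[str], tgt: list[str],
                         alignment: set[tuple[int, int]],
                         max_len: int = 4) -> set[tuple[str, str]]:
    """Return all phrase pairs (up to max_len words each) consistent with the alignment.

    alignment holds (source_index, target_index) links.
    """
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(t1 + max_len, len(tgt))):
                    has_link = any(s1 <= s <= s2 and t1 <= t <= t2 for s, t in alignment)
                    # Consistent: each link has both ends inside the box or both outside.
                    consistent = all((s1 <= s <= s2) == (t1 <= t <= t2) for s, t in alignment)
                    if has_link and consistent:
                        pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


src = "le programme a été mis en application".split()
tgt = "And the programme has been implemented".split()
# Assumed links (source index, target index); "And" (target 0) is unaligned.
alignment = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (6, 5)}

pairs = extract_phrase_pairs(src, tgt, alignment)
for pair in [("le programme", "the programme"), ("a été", "has been"),
             ("mis en application", "implemented"), ("mis", "implemented")]:
    print(pair, pair in pairs)
```

For example, ("le programme", "the programme") and ("a été", "has been") are extracted, while ("mis", "implemented") is rejected because "implemented" is also linked to "en" and "application". Counting such pairs over a whole corpus and dividing by source-phrase counts gives the relative-frequency probabilities stored in the phrase table.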

Learning to translate from data

Translation Quality Evaluation

KantanLQR
- Built into the KantanMT platform
- Integral step in KantanMT engine development

Translation Quality Evaluation
- Factored model
- Templates based on Simplified Factors, MQM, DQF and MQM-DQF

A/B Testing
- A, B (C or D) testing now fully supported
- Real-time data analytics built into your LQR Dashboard
- Available to all KantanMT account holders

KantanLQR A/B Testing: comparative analysis of two, three or four different machine translation (MT) outputs.

Designed to compare SMT with NMT outputs
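A/B results are usually summarised as preference shares plus a check that the difference is not just noise. The Python sketch below assumes each evaluated segment yields one judgment: the label of the preferred output ("A", "B", and optionally "C" or "D") or "tie". It then runs an exact two-sided sign test between two systems, ignoring ties. The judgment format and function names are hypothetical, not KantanLQR's API.

```python
from collections import Counter
from math import comb


def preference_shares(judgments: list[str], systems: list[str]) -> dict[str, float]:
    """Fraction of segments on which each system (or a tie) was preferred."""
    if not judgments:
        raise ValueError("no judgments")
    unknown = set(judgments) - (set(systems) | {"tie"})
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(judgments)
    total = len(judgments)
    return {label: counts[label] / total for label in [*systems, "tie"]}


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test: how surprising is this split if A and B were equally good?"""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


judgments = ["B", "B", "A", "B", "tie", "B", "B", "A", "B", "B"]
print(preference_shares(judgments, ["A", "B"]))  # {'A': 0.2, 'B': 0.7, 'tie': 0.1}
counts = Counter(judgments)
print(sign_test_p_value(counts["A"], counts["B"]))  # 0.1796875
```

Even though B is preferred on 7 of 10 segments, the p-value is about 0.18, so ten segments are not enough to call B better; a larger evaluation set with the same preference rate would be.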

KantanLQR A/B Testing - create

KantanLQR A/B Testing - add data

KantanLQR A/B Testing - scoring

KantanLQR A/B Testing - review

