Translation Technology Platform
Kirti Vashee
VP Sales, Asia Online
[email protected]@asiaonline.net
The Consumer Market
Large Buyer & Publisher Perspective
Revolutionize the Internet experience for non-English speakers in Asia
Provide 1 billion+ local-language pages online using mostly translated open license content, combined with compelling portal and social networking style services in Thailand, Indonesia, India, Malaysia, Philippines, Vietnam and China, Japan & Korea
The Enterprise Market
Translation Tools Vendor Perspective
Revolutionize the enterprise translation process with a comprehensive, continuous learning SMT platform
SaaS environment that allows data cleaning and preparation, develop SMT engines on demand and enable ongoing comprehensive post editing and correction to continuously improve engines
Copyright © 2008, All Rights Reserved
• The only SMT technology provider that is also a major user of ALT technology on one of the largest translation projects in the world: English Wikipedia (1B+ words) into 11 Asian languages using SMT and crowdsourcing
• The translation tools and technology platform used to accomplish this is also being made available as a SaaS product for the enterprise translation market
• Battlefield of words
• Fusion with customer support
• Continuous translation
• Community translation
• Industry-shared language data
• Massive online collaboration
• Translation automation
Knowledge Base
Interactive Support
User Manuals
Support Documentation
Knowledge Base Data
User Generated Content
Instant Messaging
Voice
Blogs
• Web 2.0 is much more interactive and dynamic
• Globalization will be further driven by internet penetration into Asia
• Word-of-mouth marketing is gaining prominence all over the world
• Unstructured content in blogs and review sites is becoming critical
• The dialogue with the global customer needs to be more interactive
Knowledge Base
Interactive Support
User Manual
Continuous Improvement HDSMT Engines
Sales / Marketing
Product Management
Blogs
CRM
Biz Intelligence
Content Management
Human Resources
ECM
BPM
• Highly adaptive human driven process for continuous output quality improvement in SMT engines and translation automation
• Intensive Collaboration with human translators to raise quality of SMT
• Integration with content creation and content refinement tools to enhance speed and improve business process management
• Continued evolution in standards to facilitate sharing linguistic assets
The Global Customer
BPM
Customer Support
CRM
IM
• Comprehensive SaaS Platform that facilitates the translation and continued refinement of any large high value translatable corpus using HDSMT
• Existing Feature Set
– Data Cleaning & Preparation Tools
– On Demand SMT engine development
– Support for both user created and online dictionaries and glossaries
– Ability to pool data for greater leverage
– Multiple level domain support
– Seamless integration with collaborative post-editing environment
– Real time updates of translated assets
– Web Services based APIs for integration
• System and process foundation for managed online community collaboration
Data Management
• Bilingual Data Preparation & Cleaning
• Bilingual Data Normalization & Optimization
• Source Cleanup and Preparation
• Grammar and Spelling validation
• Monolingual Data Extraction & Analysis
SMT Engine
• SMT System Training & Development
• Monolingual Data Training
• Ongoing Corpus Refinement and Tuning
• Analysis and Evaluation of Ngrams
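The n-gram analysis step can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, which is not how the platform is documented to work and would not suit unsegmented languages such as Thai or Chinese; the function name `ngram_counts` is hypothetical.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count word n-grams of order n across whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = [
    "the engine translates the page",
    "the engine learns from corrections",
]
bigrams = ngram_counts(corpus, 2)
print(bigrams.most_common(3))
```

Frequent n-grams point at phrases the engine has seen often; rare or missing ones suggest where more monolingual data may help.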
Output Proofing & Editing
• Error Pattern Identification & Correction
• Automated error correction tools
• Continuing Cycle of Exception Identification and Correction
• Development of small sets of new data to correct errors
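One way to picture error pattern identification is to compare raw engine output with its post-edited version and count the substitutions that recur. The sketch below uses Python's standard `difflib` for the comparison; the function names, the `min_count` threshold, and the sample sentences are illustrative assumptions, not the platform's actual method.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_patterns(raw, edited):
    """Yield (wrong, corrected) token spans that a post-editor replaced."""
    a, b = raw.split(), edited.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            yield " ".join(a[i1:i2]), " ".join(b[j1:j2])


def recurring_patterns(pairs, min_count=2):
    """Return replacements seen at least min_count times, most frequent first."""
    counts = Counter()
    for raw, edited in pairs:
        counts.update(edit_patterns(raw, edited))
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


pairs = [
    ("the colour is red", "the color is red"),
    ("pick a colour", "pick a color"),
    ("save the file", "save the document"),
]
print(recurring_patterns(pairs))
```

Patterns that recur are candidates for an automated correction rule or for a small set of new training data; one-off edits are left for human review.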
• Data Cleaning Utilities to normalize and standardize data prior to consolidation to provide maximum leverage
• Recent study for TAUS proves conclusively that sharing clean data provides leverage
– A smaller amount of clean data can produce better results than datasets even 2X larger
– Consistent terminology matters and provides real leverage
– Data optimized for TM tools can be “dirty data” for SMT
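As a rough illustration of normalizing and standardizing bilingual data before consolidation, the sketch below applies Unicode normalization, collapses whitespace, and drops empty, duplicate, or badly length-mismatched segment pairs. The function names and the character-length ratio threshold of 3.0 are assumptions chosen for the example; real cleaning utilities would need language-specific rules.

```python
import unicodedata


def normalize(text):
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def clean_pairs(pairs, max_ratio=3.0):
    """Return normalized, de-duplicated segment pairs with plausible lengths."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = normalize(source), normalize(target)
        if not source or not target:
            continue  # drop pairs with an empty side
        if max(len(source), len(target)) / min(len(source), len(target)) > max_ratio:
            continue  # drop likely misalignments (illustrative threshold)
        if (source, target) in seen:
            continue  # drop exact duplicates after normalization
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


raw = [
    ("Hello   world", "Bonjour le monde"),
    ("Hello world", "Bonjour le monde"),
    ("Hi", ""),
    ("OK", "Ceci est une phrase beaucoup trop longue"),
]
print(clean_pairs(raw))
```

Of the four sample pairs only one survives: the second is a duplicate once whitespace is normalized, the third has an empty target, and the fourth fails the length-ratio check.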
Initial system put into production
Trained internal experts begin initial clean-up and correction process
Changes are collected and added to initial corpus to drive continuous retraining
Expert users also allowed to make changes
All users allowed to suggest changes, which go through a vetting process
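The correction process described above can be sketched as a small queue: corrections from trained experts are accepted directly, while suggestions from all other users wait for vetting, and everything accepted is added to the corpus for the next retraining run. The role names, class name, and the rule that rejected suggestions are discarded are assumptions made for this example.

```python
from dataclasses import dataclass, field

TRUSTED_ROLES = {"internal_expert", "expert_user"}  # hypothetical role names


@dataclass
class CorrectionQueue:
    accepted: list = field(default_factory=list)  # feeds the next retraining run
    pending: list = field(default_factory=list)   # awaits vetting

    def submit(self, role, source, corrected):
        """Accept trusted corrections directly; queue all others for vetting."""
        item = (source, corrected)
        if role in TRUSTED_ROLES:
            self.accepted.append(item)
        else:
            self.pending.append(item)

    def vet(self, approve):
        """Accept pending suggestions that pass `approve`; discard the rest."""
        approved = [item for item in self.pending if approve(item)]
        self.accepted.extend(approved)
        self.pending = []
        return approved


def training_corpus(initial_corpus, queue):
    """Combine the initial corpus with accepted corrections for retraining."""
    return list(initial_corpus) + queue.accepted


queue = CorrectionQueue()
queue.submit("internal_expert", "source A", "fixed A")
queue.submit("community", "source B", "fixed B")
queue.submit("community", "source C", "")
queue.vet(lambda item: bool(item[1]))  # toy rule: reject empty corrections
print(training_corpus([("seed source", "seed target")], queue))
```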
Community
Initial System
Targeted Corrections of Bad Learning
Spelling & Terminology
Correct
Mistranslation
Syntax/Grammar
Terminology
Spelling
Punctuation
Human feedback can raise the raw output to previously unseen quality levels
Information Requests
GetAccountInformation
GetAccountUsageHistory
GetAvailableDomainCombinationsForLanguagePair
GetAvailableDomainsForLanguagePair
GetAvailableLanguagePairs
GetCustomDomainsForLanguagePair
Data Storage
CreateDataset
DeleteDataset
DeleteDataFromDataset
DownloadDataset
DownloadDatasetItem
GetDatasetList
GetDatasetItemList
LinkDataToDataset
MergeDatasets
UploadData
UploadGlossary
UploadImage
UploadLanguageModel
UploadMonolingualText
UploadOCRPageLayout
UploadPhrasePairs
UploadTranslationMemory
UploadZIP
Data Training
CancelTrainingJob
GetTrainingJobList
GetTrainingJobStatus
SubmitDatasetForTraining
Data Preparation
CleanText
ExtractText
NormalizeText
OCRImage
ParagraphAlignLanguagePairText
SentenceAlignLanguagePairText
SentenceSegmentText
SpellCheckText
WordSegmentText
Translation
CancelTranslationJob
GetTranslationJobList
GetTranslationJobStatus
SubmitDatasetForTranslation
SubmitSinglePhraseForTranslation
Parameter          Type     Description
sUsername          String   The username of the person making the request.
sPassword          String   The password of the person making the request.
iAccountNo         Integer  The account number that this request is associated with.
iDepartmentNo      Integer  The department number that this request is associated with.
iLanguagePairCode  Integer  The code for the language pair that is being looked up.
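To make the parameter list concrete, here is a minimal sketch that bundles the five parameters and checks their String/Integer types before a call is made. The slides do not say which operation these parameters belong to or how requests are transported, so pairing them with GetAvailableDomainsForLanguagePair, the `build_call` helper, the payload shape, and all sample values are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LanguagePairRequest:
    sUsername: str
    sPassword: str
    iAccountNo: int
    iDepartmentNo: int
    iLanguagePairCode: int

    def __post_init__(self):
        # Enforce the String/Integer types listed in the parameter list.
        for name, value in asdict(self).items():
            expected = str if name.startswith("s") else int
            if not isinstance(value, expected) or isinstance(value, bool):
                raise TypeError(f"{name} must be {expected.__name__}")


def build_call(operation, request):
    """Pair an operation name with its parameters (hypothetical payload shape)."""
    return {"operation": operation, "parameters": asdict(request)}


# Placeholder credentials and codes, for illustration only.
request = LanguagePairRequest("demo_user", "demo_password", 1001, 1, 42)
call = build_call("GetAvailableDomainsForLanguagePair", request)
print(call)
```

Validating types on the client side surfaces mistakes such as passing an account number as text before any request is sent.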
Publishers
User
Social Networks / Community
Translation Systems
Asia Online Portal
Translation SaaS
Human Proof Readers
Original Content
Translated Content
Constant Improvement
Leverage ASP Translation service for translation of new material
Provide existing human translated content for training language engines
Original Content translated to local language
Translated content proof read using community principles and paid proof readers using Asia Online proof reading system
Translations are proof read via ASP proof reading system
Proof reading still required whether human or machine translation
New translations sent back to publisher
Translated content made available to users
User accesses online content in local language
• Integrated data cleaning, data preparation, SMT systems development and post-editing environment
• Comprehensive proof-reading and post-editing environment that is integrated with core SMT engines to enable instant updates
• Greater transparency of many key SMT building blocks to enable users to see and modify what the system has learnt, resulting in greater control and better systems
• A richer and deeper taxonomy for domains to ensure the best quality (better systems)
• Incremental additions of new training data to any existing system to enable rapid updates (faster updates)
• Easy handling of terminology, glossaries, dictionaries
Kirti Vashee
VP Sales, Asia Online
[email protected]@asiaonline.net