DeepMiner: Integrating Translation Memories and Machine Translation
TEKOM
October 25th, 2012
Presenter: Daniel Benito
Introduction
• History
• Limitations of Translation Memory
• Beyond Segment-Level Reuse – Machine Translation
– Fuzzy Match Repair
– Advanced Leveraging
– Combining TM and MT
• Current Limitations
• Future Developments
• Conclusion
History
• Past:
– 1950s – Early Machine Translation (MT) experiments
– 1960s – General awareness that MT was not going to replace human translators
– 1970s – First proposals for Translator Workstations
– 1990s – Translation Memory (TM) became viable
• Present:
– TM technology has barely advanced in the last ten years
– MT has advanced to the point where its applications in the translation industry are incontrovertible
Limitations of Translation Memory
• Segment-level translation reuse is only useful in limited cases
• Even in highly repetitive texts, most of the repetitions happen at the sub-segment level:
– Terms and phrases
– Sentence structure
• Most Translation Memory systems can only provide fuzzy matches and cannot exploit sub-segment repetition
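Sub-segment repetition of this kind can be detected with a simple n-gram comparison. The sketch below is only an illustration under simplifying assumptions (whitespace tokenization, lowercasing, only a trailing full stop stripped); the function names are hypothetical and not taken from any product. Applied to the cat sentences used in this talk, it finds the shared phrase even though the segments are only a fuzzy match.

```python
def ngrams(tokens, n):
    """Return the set of contiguous word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _contains(big, small):
    """True if `small` occurs contiguously inside the strictly longer n-gram `big`."""
    k = len(small)
    return len(big) > k and any(big[i:i + k] == small for i in range(len(big) - k + 1))


def shared_subsegments(new_segment, tm_segment, min_len=2):
    """Return the maximal word sequences (>= min_len words) shared by both segments."""
    # Simplifying assumption: lowercase, drop a trailing full stop, split on whitespace.
    a = new_segment.lower().rstrip(".").split()
    b = tm_segment.lower().rstrip(".").split()
    shared = set()
    for n in range(min_len, min(len(a), len(b)) + 1):
        shared |= ngrams(a, n) & ngrams(b, n)
    # Drop n-grams that are merely part of a longer shared n-gram.
    maximal = {g for g in shared if not any(_contains(h, g) for h in shared)}
    return sorted(" ".join(g) for g in maximal)


print(shared_subsegments("The black cat usually sleeps in the hallway.",
                         "The grey cat usually sleeps in the living room."))
# ['cat usually sleeps in the']
```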
Beyond Segment-level Reuse
• We need to translate: EN: The black cat usually sleeps in the hallway.
• Our TM contains: EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• What can we do to reduce the time spent editing fuzzy matches?
– Ignore the fuzzy matches and use MT
– Automatically repair the fuzzy matches
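Fuzzy matches are ranked by a similarity score. CAT tools use their own formulas, which are generally not published; the sketch below assumes a simple word-level Levenshtein distance normalized by the length of the longer segment. It is a common textbook choice, not Déjà Vu's actual scoring. For the example above it needs three edits (grey → black, living → hallway, delete room).

```python
import re


def tokenize(text):
    """Lowercased word tokens; punctuation is ignored (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_score(new_segment, tm_segment):
    """Similarity in percent: 100 * (1 - distance / length of the longer segment)."""
    a, b = tokenize(new_segment), tokenize(tm_segment)
    longest = max(len(a), len(b)) or 1
    return round(100 * (1 - word_edit_distance(a, b) / longest))


print(fuzzy_score("The black cat usually sleeps in the hallway.",
                  "The grey cat usually sleeps in the living room."))  # 67
```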
Machine Translation
• We need to translate: EN: The black cat usually sleeps in the hallway.
• Results returned by various MT systems: DE: Die schwarze Katze in der Regel schläft im Flur.
DE: Die schwarze Katze schläft normalerweise im Flur.
• Achieving consistency and using specific terminology (e.g. Gang instead of Flur) will require some degree of training or post-editing
Machine Translation
• General-purpose MT engines such as Google Translate or Microsoft Translator usually require extensive post-editing, but can be used for inspiration
• Rule-based and statistical MT engines customized for specific domains offer much higher quality but require expensive tuning or retraining
• It is usually more expensive to use MT than to manually edit a fuzzy match
Fuzzy Match Repair
• Inspired by the concept of translation by analogy from Example-Based Machine Translation (EBMT)
• Attempts to maintain the quality and consistency of existing translations in the TM while increasing productivity
Fuzzy Match Repair
• We need to translate: EN: The black cat usually sleeps in the hallway.
• Our TM contains: EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• We can replace graue with schwarze and Wohnzimmer with Gang to produce an exact match.
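Before a fuzzy match can be repaired, the words that differ between the new source and the TM source have to be located. A minimal sketch, assuming whitespace tokenization and using Python's standard `difflib.SequenceMatcher` as the alignment method; this is a stand-in and not necessarily how DeepMiner aligns segments. The `changed_spans` function is hypothetical.

```python
from difflib import SequenceMatcher


def changed_spans(new_source, tm_source):
    """Return (tm_span, new_span) pairs for the source words that differ.

    Each pair says which words of the TM source must be replaced, and by what,
    to turn the TM source into the new source.
    """
    new_toks = new_source.rstrip(".").split()
    tm_toks = tm_source.rstrip(".").split()
    matcher = SequenceMatcher(a=tm_toks, b=new_toks, autojunk=False)
    pairs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # "replace", "delete" or "insert"
            pairs.append((" ".join(tm_toks[i1:i2]), " ".join(new_toks[j1:j2])))
    return pairs


print(changed_spans("The black cat usually sleeps in the hallway.",
                    "The grey cat usually sleeps in the living room."))
# [('grey', 'black'), ('living room', 'hallway')]
```

The resulting pairs are exactly the source-side replacements whose translations the next step needs.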
Fuzzy Match Repair
• Requires knowing the following translations: grey → graue
black → schwarze
living room → Wohnzimmer
hallway → Gang
• What do we do if those translations are not explicitly in our TMs or termbases?
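Once the differing spans and their translations are known, the repair itself is a substitution in the TM target. The sketch below assumes the translations are supplied as a plain dictionary (a hypothetical stand-in for a TM or termbase lookup) and uses simple substring replacement; a real implementation would also have to handle word boundaries and inflection. Pairs that cannot be resolved are returned so that the translator, or another resource, can supply them.

```python
def repair_fuzzy_match(tm_target, substitutions, translations):
    """Patch a TM target using known sub-segment translations.

    substitutions: (tm_source_span, new_source_span) pairs, e.g. ("grey", "black").
    translations:  hypothetical dict of source -> target sub-segment translations.
    Returns the repaired target and the list of pairs that could not be applied.
    """
    repaired = tm_target
    unresolved = []
    for old_src, new_src in substitutions:
        old_tgt = translations.get(old_src)
        new_tgt = translations.get(new_src)
        # Skip the pair if either translation is unknown, or if the old
        # translation does not actually appear in the TM target.
        if old_tgt is None or new_tgt is None or old_tgt not in repaired:
            unresolved.append((old_src, new_src))
            continue
        repaired = repaired.replace(old_tgt, new_tgt, 1)
    return repaired, unresolved


glossary = {"grey": "graue", "black": "schwarze",
            "living room": "Wohnzimmer", "hallway": "Gang"}
print(repair_fuzzy_match("Die graue Katze schläft gewöhnlich im Wohnzimmer.",
                         [("grey", "black"), ("living room", "hallway")],
                         glossary))
# ('Die schwarze Katze schläft gewöhnlich im Gang.', [])
```

If living room is missing from the dictionary, Wohnzimmer stays in place and ("living room", "hallway") is reported as unresolved, which is the situation the question on this slide raises.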
Advanced Leveraging
• Bilingual concordance search:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
EN: Mary has bought a new pair of grey running shoes.
DE: Maria hat ein neues Paar graue Laufschuhe gekauft.
EN: This article is also available in grey.
DE: Dieser Artikel ist auch in grau erhältlich.
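A bilingual concordance search returns every TM unit whose source contains the search term. A minimal sketch over the three units above, assuming whole-word, case-insensitive matching; the `concordance` function is hypothetical.

```python
import re

# (source, target) TM units from the slide above.
TM = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
]


def concordance(tm, term):
    """Return the TM units whose source contains `term` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in tm if pattern.search(src)]


for src, tgt in concordance(TM, "grey"):
    print(tgt)
```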
Advanced Leveraging
• Statistically infer translations from the TM
• Compare all of the German translations and suggest one or more probable translations (e.g. graue, grau)
• Requires:
– Large TMs with many examples
– Consistent translations in the TM
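One well-known way to infer sub-segment translations statistically is to score how often a source word and a target word occur in the same TM units, for example with the Dice coefficient. This is a swapped-in textbook technique, not necessarily what DeepMiner does. The sketch uses the three concordance units plus two hypothetical units that do not contain grey. With so little data, graue wins clearly but grau ties with several unrelated words, which is exactly why large and consistent TMs are required.

```python
import re
from collections import Counter


def word_set(text):
    """Lowercased set of word tokens in a segment."""
    return set(re.findall(r"\w+", text.lower()))


def dice_candidates(tm, source_word, top_n=3):
    """Rank target words by Dice coefficient with `source_word`.

    Dice(s, t) = 2 * c(s, t) / (c(s) + c(t)), where c(...) is the number of TM
    units containing the word(s) on the relevant side.
    """
    source_word = source_word.lower()
    src_count = 0            # c(s): units whose source contains source_word
    tgt_count = Counter()    # c(t): units whose target contains t
    joint = Counter()        # c(s, t): units containing both
    for src, tgt in tm:
        tgt_words = word_set(tgt)
        tgt_count.update(tgt_words)
        if source_word in word_set(src):
            src_count += 1
            joint.update(tgt_words)
    scores = {t: 2 * joint[t] / (src_count + tgt_count[t]) for t in joint}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


TM = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
    # Two hypothetical extra units, added only so that common words get penalized.
    ("The cat sleeps in the garden.", "Die Katze schläft im Garten."),
    ("This article is available in black.", "Dieser Artikel ist in Schwarz erhältlich."),
]

print(dice_candidates(TM, "grey"))
# [('graue', 0.8), ('auch', 0.5), ('ein', 0.5)]
```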
Combining TM and MT
• We can use MT as an additional resource for finding the translations needed to repair fuzzy matches
• MT systems often give better results for terms and short phrases than for long sentences
• We base this combination on the following premises:
– A client’s own data is considered to be of higher quality and always has priority over Machine Translation results
– A fuzzy match repaired with Machine Translation will usually be better than a plain fuzzy match, and better than an MT result for the entire segment
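The first premise maps naturally onto a lookup chain: client resources are consulted first, and MT is only a fallback. A minimal sketch; the resource names, their ordering, and the stub MT function are assumptions for illustration, and a real integration would call an MT service instead of a dictionary.

```python
def resolve_term(term, client_resources, machine_translate):
    """Find a translation for `term`, preferring the client's own data.

    client_resources: list of (name, dict) pairs, consulted in order
                      (e.g. TM-derived terms, termbase).
    machine_translate: callable returning a translation or None (MT fallback).
    Returns (translation, provenance), or (None, None) if nothing is found.
    """
    for name, resource in client_resources:
        if term in resource:
            return resource[term], name
    mt_result = machine_translate(term)
    if mt_result:
        return mt_result, "mt"
    return None, None


termbase = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}
# Hypothetical stub standing in for an MT engine; it would translate hallway as "Flur".
fake_mt = {"living room": "Wohnzimmer", "hallway": "Flur"}.get

print(resolve_term("hallway", [("termbase", termbase)], fake_mt))      # ('Gang', 'termbase')
print(resolve_term("living room", [("termbase", termbase)], fake_mt))  # ('Wohnzimmer', 'mt')
```

Because the termbase is checked first, the client's Gang wins over the MT engine's Flur, which is the consistency problem noted earlier for general-purpose MT.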
Combining TM and MT
• We need to translate: EN: The black cat usually sleeps in the hallway.
• Our TM contains: EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• Our termbase contains: EN: grey
DE: graue
EN: black
DE: schwarze
EN: hallway
DE: Gang
Combining TM and MT
• We do not have the translation for living room in our TM or our termbase, so we can request it from the MT system:
EN: living room
DE: Wohnzimmer
• Combining material from our TM, termbase and MT system allows us to perform the appropriate replacements and obtain:
EN: The black cat usually sleeps in the hallway.
DE: Die schwarze Katze schläft gewöhnlich im Gang.
Current Limitations
• We need to translate: EN: The white dog usually sleeps in the living room.
• Our TM contains: EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• Our termbase contains: EN: grey cat
DE: graue Katze
Current Limitations
• Asking the MT system for the missing translation, we get:
EN: white dog
DE: weißer Hund
• The result of fixing the fuzzy match is: EN: The white dog usually sleeps in the living room.
DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.
• The article and adjective no longer agree with the new noun (correct: Der weiße Hund), so some post-editing is still required
Current Limitations
• We need to translate: EN: The grey cat often sleeps in the living room.
• Our TM contains: EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• The translations we get from the MT system are: EN: usually
DE: normalerweise
EN: often
DE: oft
• We cannot repair the fuzzy match: the MT system translates usually as normalerweise, but the TM translation uses gewöhnlich, so we do not know which word in the target to replace
Future Developments
• Greater integration with the MT engines
– Access to internal translation candidates:
EN: usually → DE: normalerweise, gewöhnlich, sonst, ...
– Access to internal language models:
DE: Die weißer Hund → never observed
DE: Der weiße Hund → observed often
– Automatic upload of new TM material to the MT engine so that it can be used for future retraining
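The first two kinds of access can be sketched as follows. Both parts rest on assumptions: the candidate list is hard-coded instead of coming from an MT engine, and a toy add-one-smoothed bigram model trained on four made-up sentences stands in for the engine's internal language model. The first part resolves the usually/often case by picking the candidate that actually occurs in the TM target; the second ranks the two noun phrases from the white dog example.

```python
import math
from collections import Counter


def pick_known_candidate(candidates, tm_target):
    """Return the first MT candidate that occurs as a word in the TM target, else None."""
    target_words = tm_target.rstrip(".").split()
    return next((c for c in candidates if c in target_words), None)


class BigramModel:
    """Toy add-one-smoothed bigram language model (stand-in for an MT engine's LM)."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.histories = Counter()
        vocab = set()
        for sentence in sentences:
            words = sentence.lower().split()
            vocab.update(words)
            for w1, w2 in zip(words, words[1:]):
                self.bigrams[(w1, w2)] += 1
                self.histories[w1] += 1
        self.vocab_size = len(vocab)

    def log_prob(self, phrase):
        """Sum of log P(w2 | w1) over the phrase's bigrams, with add-one smoothing."""
        words = phrase.lower().split()
        return sum(
            math.log((self.bigrams[(w1, w2)] + 1) / (self.histories[w1] + self.vocab_size))
            for w1, w2 in zip(words, words[1:])
        )


# 1) Internal translation candidates: find which one the TM target really uses.
tm_target = "Die graue Katze schläft gewöhnlich im Wohnzimmer."
old = pick_known_candidate(["normalerweise", "gewöhnlich", "sonst"], tm_target)
if old is not None:
    print(tm_target.replace(old, "oft"))  # Die graue Katze schläft oft im Wohnzimmer.

# 2) Internal language model: prefer the grammatical noun phrase.
corpus = [  # hypothetical training sentences
    "der weiße hund schläft im gang",
    "der weiße hund bellt",
    "die graue katze schläft im wohnzimmer",
    "ein weißer hund bellt",
]
lm = BigramModel(corpus)
print(max(["Die weißer Hund", "Der weiße Hund"], key=lm.log_prob))  # Der weiße Hund
```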
Conclusion
• Traditional segment-level translation reuse has reached its full potential
• ATRIL’s Déjà Vu X2 already includes DeepMiner technology that improves productivity by cleverly combining all the approaches we described:
– (Statistical) Machine Translation
– Example-Based Machine Translation
– Advanced Leveraging (sub-segment matching)
Questions?
Additional Topics
Predictive Typing
• Find all sub-segment matches and offer them to the translator as he or she types
• Suggestions are context-sensitive, so the list of results to choose from stays short
• Translations are constructed piece by piece from previous texts, guided by the translator
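A minimal sketch of context-sensitive predictive typing, assuming the sub-segment translations are available as a dictionary (hypothetical data below) and that suggestions are triggered by the partial word at the end of what the translator has typed. Only fragments whose source side occurs in the current source segment are considered, which is what keeps the list short: Wohnzimmer is never offered here because living room is not in the segment being translated.

```python
import re


def relevant_fragments(source_segment, subsegment_translations):
    """Target fragments whose source side occurs (as whole words) in the source segment."""
    padded = " " + " ".join(re.findall(r"\w+", source_segment.lower())) + " "
    return [tgt for src, tgt in subsegment_translations.items()
            if " " + src.lower() + " " in padded]


def suggest(typed_text, fragments, max_results=3):
    """Offer fragments that complete the word currently being typed."""
    if not typed_text or typed_text.endswith(" "):
        return []  # no partial word to complete
    prefix = typed_text.split()[-1].lower()
    hits = [f for f in fragments if f.lower().startswith(prefix)]
    return sorted(hits)[:max_results]


translations = {  # hypothetical sub-segment translations mined from the TM
    "black cat": "schwarze Katze",
    "usually sleeps": "schläft gewöhnlich",
    "hallway": "Gang",
    "living room": "Wohnzimmer",
}
frags = relevant_fragments("The black cat usually sleeps in the hallway.", translations)
print(suggest("Die schw", frags))                                    # ['schwarze Katze']
print(suggest("Die schwarze Katze schläft gewöhnlich im G", frags))  # ['Gang']
print(suggest("Die schwarze Katze schläft gewöhnlich im W", frags))  # []
```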
Advanced Predictive Typing
• Advanced Leveraging techniques for statistically inferring sub-segment translations from the TM can be adapted to provide additional predictive typing suggestions
• Translations from MT can be added to the predictive typing mechanism to offer additional suggestions for terms and phrases
MT integrations in Déjà Vu X2
• Systran Enterprise Server
• Google Translate
• Microsoft Translator
• PROMT Translation Server
• itranslate4eu