DeepMiner - Advanced Leveraging: Integrating Translation Memories and Machine Translation

DeepMiner Integrating Translation Memories and Machine Translation

TEKOM

October 25th, 2012

Presenter: Daniel Benito

Introduction

• History

• Limitations of Translation Memory

• Beyond Segment-Level Reuse – Machine Translation

– Fuzzy Match Repair

– Advanced Leveraging

– Combining TM and MT

• Current Limitations

• Perspectives

• Conclusion

History

• Past:

– 1950s – Early Machine Translation (MT) experiments

– 1960s – Growing recognition that MT was not going to replace human translators

– 1970s – First proposals for Translator Workstations

– 1990s – Translation Memory (TM) became viable

• Present:

– TM technology has barely advanced in the last ten years

– MT has advanced to the point where its usefulness in the translation industry is undeniable

Limitations of Translation Memory

• Segment-level translation reuse is only useful in limited cases

• Even in highly repetitive texts, most of the repetitions happen at the sub-segment level:

– Terms and phrases

– Sentence structure

• Most Translation Memory systems are limited to providing fuzzy matches but are unable to exploit sub-segment repetition

Beyond Segment-Level Reuse

• We need to translate: EN: The black cat usually sleeps in the hallway.

• Our TM contains: EN: The grey cat usually sleeps in the living room.

DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.

• What can we do to reduce the time spent editing fuzzy matches?

– Ignore the fuzzy matches and use MT

– Automatically repair the fuzzy matches

Machine Translation


• Results returned by various MT systems: DE: Die schwarze Katze in der Regel schläft im Flur.

DE: Die schwarze Katze schläft normalerweise im Flur.

• Achieving consistency and using specific terminology (e.g. Gang instead of Flur) will require some degree of training or post-editing

Machine Translation

• General-purpose MT engines such as Google Translate or Microsoft Translator usually require extensive post-editing, but can be used for inspiration

• Rule-based and statistical MT engines customized for specific domains offer much higher quality but require expensive tuning or retraining

• It is usually more expensive to use MT than to manually edit a fuzzy match

Fuzzy Match Repair

• Inspired by the translation by analogy concept from Example-Based Machine Translation (EBMT)

• Attempts to maintain the quality and consistency of existing translations in the TM while increasing productivity

Fuzzy Match Repair




• We can replace graue with schwarze and Wohnzimmer with Gang to produce an exact match.

Fuzzy Match Repair

• Requires knowing the following translations: grey → graue

black → schwarze

living room → Wohnzimmer

hallway → Gang

• What do we do if those translations are not explicitly in our TMs or termbases?
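
The repair step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not DeepMiner's actual algorithm: the function name repair_fuzzy_match and the translations dictionary are hypothetical, segments are split on whitespace, and only one-for-one replacements are handled.

```python
import difflib


def repair_fuzzy_match(new_src, tm_src, tm_tgt, translations):
    """Repair a fuzzy match by swapping the words that differ.

    `translations` (hypothetical glossary) maps source words or phrases
    to target ones. Returns the repaired target, or None when a
    translation is missing or the old translation cannot be located
    in the TM target.
    """
    old_tokens = tm_src.rstrip(".").split()
    new_tokens = new_src.rstrip(".").split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    repaired = tm_tgt
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag != "replace":
            return None  # insertions and deletions are out of scope here
        old_tgt = translations.get(" ".join(old_tokens[i1:i2]))
        new_tgt = translations.get(" ".join(new_tokens[j1:j2]))
        if old_tgt is None or new_tgt is None or old_tgt not in repaired:
            return None
        # Plain substring replacement; a real system would respect word boundaries.
        repaired = repaired.replace(old_tgt, new_tgt, 1)
    return repaired


translations = {
    "grey": "graue",
    "black": "schwarze",
    "living room": "Wohnzimmer",
    "hallway": "Gang",
}
repaired = repair_fuzzy_match(
    "The black cat usually sleeps in the hallway.",
    "The grey cat usually sleeps in the living room.",
    "Die graue Katze schläft gewöhnlich im Wohnzimmer.",
    translations,
)
print(repaired)  # Die schwarze Katze schläft gewöhnlich im Gang.
```

The function returns None as soon as any of the four translations is unknown, which is exactly the situation the last bullet asks about.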

Advanced Leveraging

• Bilingual concordance search:

EN: The grey cat usually sleeps in the living room.


EN: Mary has bought a new pair of grey running shoes.

DE: Maria hat ein neues Paar graue Laufschuhe gekauft.

EN: This article is also available in grey.

DE: Dieser Artikel ist auch in grau erhältlich.

Advanced Leveraging

• Statistically infer translations from the TM

• Compare all of the German translations and suggest one or more probable translations (e.g. graue, grau)

• Requires:

– Large TMs with many examples

– Consistent translations in the TM
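
One simple way to infer translations statistically is to score each target word by how often it co-occurs with the source term. The sketch below uses the Dice coefficient for that purpose; this choice, the function names and the tiny TM (built from segments shown in these slides) are assumptions for illustration, not a description of how DeepMiner scores candidates.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased set of word tokens in a segment."""
    return set(re.findall(r"\w+", text.lower()))


def infer_translations(term, tm):
    """Rank target words by Dice co-occurrence with a source term.

    `tm` is a list of (source, target) segment pairs.
    """
    term = term.lower()
    src_hits = 0          # segments whose source contains the term
    cooc = Counter()      # target words seen in those segments
    tgt_freq = Counter()  # target words seen in any segment
    for src, tgt in tm:
        tgt_tokens = tokens(tgt)
        tgt_freq.update(tgt_tokens)
        if term in tokens(src):
            src_hits += 1
            cooc.update(tgt_tokens)
    scores = {w: 2 * c / (src_hits + tgt_freq[w]) for w, c in cooc.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


tm = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
    ("The black cat usually sleeps in the hallway.",
     "Die schwarze Katze schläft gewöhnlich im Gang."),
]
ranked = infer_translations("grey", tm)
print(ranked[0])  # ('graue', 0.8)
```

With only four segments, graue stands out, but grau ties with unrelated words at 0.5. This is why the approach needs large, consistent TMs before its suggestions become reliable.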

Combining TM and MT

• We can use MT as an additional resource for finding the translations needed to repair fuzzy matches

• MT systems often give better results for terms and short phrases than for long sentences

• We approach this combination based on the following premises:

– A client’s own data is considered to be of higher quality and will always have priority over the Machine Translation results

– A fuzzy match repaired with Machine Translation will usually be better than a normal fuzzy match, and better than an MT result for an entire segment

Combining TM and MT




• Our termbase contains: EN: grey

DE: graue

EN: black

DE: schwarze

EN: hallway

DE: Gang

Combining TM and MT

• We do not have the translation for living room in our TM or our termbase, so we can request it from the MT system:

EN: living room

DE: Wohnzimmer

• The combination of material in our TM, termbase and MT system allows us to perform the appropriate replacements and obtain:

EN: The black cat usually sleeps in the hallway.

DE: Die schwarze Katze schläft gewöhnlich im Gang.
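
The priority rule (client data first, MT only as a fallback) can be sketched as a lookup chain. Everything here is illustrative: lookup, tm_inferred and toy_mt are hypothetical names, and toy_mt merely stands in for a request to a real MT engine.

```python
def lookup(term, termbase, tm_inferred, mt_translate):
    """Return (translation, origin), preferring client data over MT."""
    if term in termbase:
        return termbase[term], "termbase"
    if term in tm_inferred:
        return tm_inferred[term], "tm"
    result = mt_translate(term)
    if result is not None:
        return result, "mt"
    return None, "none"


termbase = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}
tm_inferred = {}  # nothing inferred from the TM in this example


def toy_mt(term):
    # Hypothetical stand-in for a call to an MT engine.
    return {"living room": "Wohnzimmer", "hallway": "Flur"}.get(term)


target = "Die graue Katze schläft gewöhnlich im Wohnzimmer."
for old, new in [("grey", "black"), ("living room", "hallway")]:
    old_tgt, _ = lookup(old, termbase, tm_inferred, toy_mt)
    new_tgt, _ = lookup(new, termbase, tm_inferred, toy_mt)
    target = target.replace(old_tgt, new_tgt, 1)
print(target)  # Die schwarze Katze schläft gewöhnlich im Gang.
```

Note that hallway resolves to Gang from the termbase even though the toy MT engine would return Flur, while living room is only available from MT.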

Current Limitations

• We need to translate: EN: The white dog usually sleeps in the living room.



• Our termbase contains: EN: grey cat

DE: graue Katze

Current Limitations

• Asking the MT system for the missing translation, we get:

EN: white dog

DE: weißer Hund

• The result of fixing the fuzzy match is: EN: The white dog usually sleeps in the living room.

DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.

• Some post-editing is still required

Current Limitations

• We need to translate: EN: The grey cat often sleeps in the living room.



• The translations we get from the MT system are: EN: usually

DE: normalerweise

EN: often

DE: oft

• We cannot repair the fuzzy match: the MT system translates usually as normalerweise, but the TM segment uses gewöhnlich, so we cannot locate the word to replace in the German text

Future Developments

• Greater integration with the MT engines

– Access to internal translation candidates:

EN: usually

DE: normalerweise, gewöhnlich, sonst, ...

– Access to internal language models:

DE: Die weißer Hund – never

DE: Der weiße Hund – often

– Automatic upload of new TM material to the MT engine so it can be used for retraining in the future
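
As a rough idea of how language-model access could filter repaired segments, the sketch below counts how often each candidate phrase occurs in a monolingual corpus and keeps the most frequent one. The function name pick_most_fluent and the three-sentence corpus are invented for illustration; a real MT engine would expose a far richer model than raw phrase counts.

```python
def pick_most_fluent(candidates, corpus):
    """Return the candidate seen most often in the corpus, or None if none occur."""
    if not candidates:
        return None
    counts = {c: sum(sentence.count(c) for sentence in corpus) for c in candidates}
    best = max(candidates, key=lambda c: counts[c])
    return best if counts[best] > 0 else None


# Toy monolingual corpus, made up for this example.
corpus = [
    "Der weiße Hund bellt.",
    "Der weiße Hund schläft im Garten.",
    "Die graue Katze schläft gewöhnlich im Wohnzimmer.",
]
best = pick_most_fluent(["Die weißer Hund", "Der weiße Hund"], corpus)
print(best)  # Der weiße Hund
```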

Conclusion

• Traditional segment-level translation reuse has reached its full potential

• ATRIL’s Déjà Vu X2 already includes DeepMiner technology that improves productivity by cleverly combining all the approaches we described:

– (Statistical) Machine Translation

– Example-Based Machine Translation

– Advanced Leveraging (sub-segment matching)

Questions?

Additional Topics

Predictive Typing

• Find all sub-segment matches and offer them to the translator as he or she types

• Suggestions are context-sensitive, so there are never too many results to choose from

• Translations are constructed piece by piece from previous texts, guided by the translator
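
A minimal sketch of the suggestion step, assuming the sub-segment matches are already available as (source, target) pairs. The names suggest and phrase_table are hypothetical. Context sensitivity is approximated by keeping only entries whose source side occurs in the segment being translated.

```python
def suggest(source_segment, typed_prefix, phrase_table, limit=5):
    """Suggest target phrases that fit the current segment and typed prefix."""
    if not typed_prefix:
        return []
    src = source_segment.lower()
    prefix = typed_prefix.lower()
    hits = {
        tgt
        for s, tgt in phrase_table
        if s.lower() in src and tgt.lower().startswith(prefix)
    }
    return sorted(hits)[:limit]


phrase_table = [
    ("grey", "graue"),
    ("black", "schwarze"),
    ("living room", "Wohnzimmer"),
    ("hallway", "Gang"),
    ("usually", "gewöhnlich"),
]
segment = "The black cat usually sleeps in the hallway."
print(suggest(segment, "g", phrase_table))  # ['Gang', 'gewöhnlich']
print(suggest(segment, "s", phrase_table))  # ['schwarze']
```

Typing g does not offer graue, because grey does not appear in the segment being translated. This is what keeps the list of suggestions short.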

Advanced Predictive Typing

• Advanced Leveraging techniques for statistically inferring sub-segment translations from the TM can be adapted to provide additional predictive typing suggestions

• Translations from MT can be added to the predictive typing mechanism, to offer additional suggestions for translations of terms and phrases

MT integrations in Déjà Vu X2

• Systran Enterprise Server

• Google Translate

• Microsoft Translator

• PROMT Translation Server

• itranslate4eu

