Translation memories
Hermes Traducciones y Servicios Lingüísticos
A brief history…
Processes have changed…
…but not the ultimate goal.
Productivity
Found in Translation, Nataly Kelly & Jost Zetzsche (2012)
[Diagram: LAN workflow. Project managers, translators, revisers and engineering share a translation memory on a LAN server.]
[Diagram: Internet workflow. A remote translator, reviser, DTPer and project manager connect to the same LAN setup.]
[Diagram: WAN workflow. MT, CAT, TEnTs, SaaS, crowdsourcing and clouding extend the LAN setup.]
Internals of a translation memory
Translation Memory Exchange
•OSCAR (Open Standards for Container/Content Allowing Re-use)
•TMX Standard (Translation Memory eXchange).
•Leverage of translation memories regardless of the tool or platform.
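A TMX file stores each translation unit (`<tu>`) with one variant (`<tuv>`) per language, and the segment text sits in `<seg>`. The sketch below reads such a file with Python's standard library. It is a minimal sketch, assuming a TMX 1.4-style file whose `<tuv>` elements carry `xml:lang`; the sample file, its header values and the helper `read_tmx` are hypothetical, and inline markup inside `<seg>` is ignored.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace, so ElementTree
# exposes it under this fully qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX file; all header values are placeholders.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Click Save.</seg></tuv>
      <tuv xml:lang="es"><seg>Haga clic en Guardar.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="es"><seg>Cierre la ventana.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # Plain text only: inline tags inside <seg> are not handled here.
            unit[tuv.get(XML_LANG)] = (seg.text or "") if seg is not None else ""
        units.append(unit)
    return units


for unit in read_tmx(SAMPLE_TMX):
    print(unit["en"], "->", unit["es"])
```

Because the format is tool-neutral, any CAT tool that exports TMX can, in principle, feed a reader like this one.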
The ancestors of CAT Tools…
XL8 DOS tool in a workflow known as XLN
IBM TranslationManager
[Screenshot labels: exact match, proposed terms in dictionary, source text, translation proposal.]
Trados Workbench
Déjà Vu
Star Transit (no memory!)
Wordfast
SDLx
memoQ
OmegaT (free!)
Workflow tools: Across
Across
SDL Idiom World Server
Specialised tools: Catalyst
Specialised tools: Passolo
Basic TM features in CAT tools
Leverage of previous translations.
Analysis for quoting, planning and keeping track of progress.
Concordance for sub-segment searches.
Maintenance to perform global changes, import/export content, etc.
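A concordance search looks for a fragment shorter than a segment inside the TM and shows the matching units with their translations. The sketch below does this with a linear, case-insensitive substring scan. The in-memory `TM` list and the `concordance` helper are hypothetical; real tools typically index the TM rather than scanning it.

```python
# Hypothetical in-memory TM: (source, target) pairs.
TM = [
    ("Click Save to store the file.", "Haga clic en Guardar para almacenar el archivo."),
    ("Save the file before closing.", "Guarde el archivo antes de cerrar."),
    ("Close the window.", "Cierre la ventana."),
]


def concordance(tm, query):
    """Return every TM unit whose source contains query (case-insensitive)."""
    q = query.casefold()
    return [(src, tgt) for src, tgt in tm if q in src.casefold()]


for src, tgt in concordance(TM, "the file"):
    print(src, "|", tgt)
```

The translator then reads how "the file" was rendered in context, even though neither full segment is a usable match.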
Leveraging TMs
CAT tools provide answers to these questions:
What is the fuzzy match of the segment?
What parts of the text are different?
Where is the match coming from?
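The three questions can be answered by one lookup that returns a score, the differing words and the stored provenance of the best TM entry. The sketch below uses Python's `difflib.SequenceMatcher` on whitespace-separated words purely as a stand-in: it is not the algorithm of any CAT tool. The `TM` entries, their `origin` field and the `best_match` helper are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical TM entries; "origin" stands in for whatever provenance a tool stores.
TM = [
    {"source": "Click Save to store the document.",
     "target": "Haga clic en Guardar para almacenar el documento.",
     "origin": "manual_v1.docx"},
    {"source": "Close the window.",
     "target": "Cierre la ventana.",
     "origin": "help_v2.html"},
]


def best_match(segment, tm):
    """Return (score %, differing word spans, entry) for the closest TM source."""
    new_words = segment.split()  # naive whitespace tokenisation
    best = None
    for entry in tm:
        old_words = entry["source"].split()
        matcher = SequenceMatcher(None, new_words, old_words, autojunk=False)
        score = round(matcher.ratio() * 100)
        if best is None or score > best[0]:
            # Keep only the non-equal opcodes: (kind, new text, TM text).
            diffs = [(tag, " ".join(new_words[i1:i2]), " ".join(old_words[j1:j2]))
                     for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]
            best = (score, diffs, entry)
    return best


score, diffs, entry = best_match("Click Save to store the file.", TM)
print(score, diffs, entry["origin"])
```

Here five of six words match on each side, so the ratio is 2 × 5 / 12 and the only difference reported is "file." versus "document.".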
Fuzzy match display
Fuzzy match display (II)
Fuzzy match display (III)
Fuzzy match display (IV)
Analysis feature
The words of each segment are counted in one match band, determined by the segment's best match:
101%
100%
99-95%
94-85%
84-75%
New words
Repetitions
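Assigning segments to bands and adding up their words can be sketched as follows. The thresholds follow the bands listed above, with anything below 75% counted as new words. The precedence (101% and 100% matches win over repetitions) and the use of a whitespace word count are assumptions; as the next slides show, tools differ on exactly these points. `match_band` and `analyse` are hypothetical helpers.

```python
from collections import Counter


def match_band(score, in_context=False, repeated=False):
    """Map a segment's best TM score to a band (assumed precedence)."""
    if score >= 100:
        return "101%" if in_context else "100%"
    if repeated:
        return "Repetitions"
    if score >= 95:
        return "99-95%"
    if score >= 85:
        return "94-85%"
    if score >= 75:
        return "84-75%"
    return "New words"


def analyse(segments):
    """segments: iterable of (text, best score, in-context flag). Returns words per band."""
    counts, seen = Counter(), set()
    for text, score, in_context in segments:
        band = match_band(score, in_context, repeated=text in seen)
        seen.add(text)
        counts[band] += len(text.split())
    return dict(counts)


print(analyse([
    ("Click Save.", 100, True),
    ("Close the window.", 90, False),
    ("Close the window.", 90, False),  # second occurrence counts as a repetition
    ("Restart the computer now.", 40, False),
]))
```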
Analysis results
Different tools, different word counts
Band          CAT Tool 1   CAT Tool 2
101%              41,352       29,782
100%               4,194       16,002
99-95%             3,698        6,038
94-85%             2,077        2,633
84-75%             5,270        1,369
New words          5,241        6,150
Repetitions        2,068        5,451
Total             63,900       58,425
Different word counts
There is no standard fuzzy matching algorithm.
CAT tools may auto-substitute different elements: numbers, dates, acronyms, variables, etc.
Different approaches to 101% matches.
Cross-file repetitions and internal fuzzy leverage.
Different file format filters.
Different segmentation rules.
SRX (Segmentation Rules eXchange) is the standard for exchanging segmentation rules.
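Segmentation rules in the SRX style are ordered pairs of regular expressions: one must match the text before a candidate break and one the text after it, and the first matching rule decides whether to break. The sketch below mimics that logic; it is not an SRX parser, and the rule sets `RULES_A`/`RULES_B` and the `segment` helper are hypothetical. It shows how one abbreviation rule changes the number of segments, and therefore every later match and word count.

```python
import re

# Hypothetical rule sets: (break?, before-regex, after-regex), tried in order.
RULES_A = [
    (False, r"\b(?:e\.g|i\.e|Fig)\.", r"\s"),  # no break after these abbreviations
    (True, r"[.!?]", r"\s+[A-Z]"),             # break after end punctuation + capital
]
RULES_B = [
    (True, r"[.!?]", r"\s+[A-Z]"),             # same, but without the exception
]


def segment(text, rules):
    """Split text at positions where the first matching rule says to break."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            if re.search(f"(?:{before})$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins; no match means no break
    segments.append(text[start:].strip())
    return [s for s in segments if s]


TEXT = "Use a TM, e.g. Trados. Then save."
print(segment(TEXT, RULES_A))
print(segment(TEXT, RULES_B))
```

With the exception rule the text yields two segments; without it, "e.g." ends a sentence and the same text yields three.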
Weighted word count
Each band is assigned a percentage of the full word rate according to a weighting scheme (negotiable per client). For example:
Band          Rate
101%            0%
100%           20%
99-95%         30%
94-85%         40%
84-75%         50%
New words     100%
Repetitions    20%
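The weighted count is the sum, over all bands, of the band's words multiplied by its rate. The sketch below applies the example scheme to the CAT Tool 1 figures from the analysis results. Rounding each band to a whole word before summing is an assumption; tools and vendors may round differently. `WEIGHTS`, `TOOL_1` and `weighted_word_count` are illustrative names.

```python
# Example weighting scheme: percent of the full word rate per band.
WEIGHTS = {"101%": 0, "100%": 20, "99-95%": 30, "94-85%": 40,
           "84-75%": 50, "New words": 100, "Repetitions": 20}

# CAT Tool 1 analysis results (words per band).
TOOL_1 = {"101%": 41352, "100%": 4194, "99-95%": 3698, "94-85%": 2077,
          "84-75%": 5270, "New words": 5241, "Repetitions": 2068}


def weighted_word_count(band_counts, weights=WEIGHTS):
    """Sum words x rate per band, rounding each band (assumed convention)."""
    return sum(round(words * weights[band] / 100)
               for band, words in band_counts.items())


print(weighted_word_count(TOOL_1))  # 11069
```

Multiplying the weighted count by the full word rate gives the price, so the weighting scheme, not the raw total, is what the client effectively pays for.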
Different tools, different word counts (II)
CAT Tool 1
Band          Words     Rate    Weighted words
101%          41,352  ×   0%                 0
100%           4,194  ×  20%               839
99-95%         3,698  ×  30%             1,109
94-85%         2,077  ×  40%               831
84-75%         5,270  ×  50%             2,635
New words      5,241  × 100%             5,241
Repetitions    2,068  ×  20%               414
Total         63,900                    11,069

CAT Tool 2
Band          Words     Rate    Weighted words
101%          29,782  ×   0%                 0
100%          16,002  ×  20%             3,200
99-95%         6,038  ×  30%             1,811
94-85%         2,633  ×  40%             1,053
84-75%         1,369  ×  50%               684
New words      6,150  × 100%             6,150
Repetitions    5,451  ×  20%             1,090
Total         58,425                    14,989
Weighted word count tools
TMs and statistical analysis
If large enough, TMs provide the bilingual corpus needed to train statistical machine translation (SMT) engines.
Some CAT tools can scan the TM for correlations between source and target words.
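One simple way to measure such correlations is the Dice coefficient: for a source word s and a target word t, 2 × (units containing both) / (units containing s + units containing t). The sketch below computes it over a tiny hypothetical TM. Dice is used here only as an illustrative association measure; it is not a claim about how any particular CAT tool or SMT system works, and `dice_scores`/`best_translation` are hypothetical helpers.

```python
from collections import Counter
from itertools import product

# Hypothetical TM units (source, target).
TM_PAIRS = [
    ("the red car", "el coche rojo"),
    ("the blue car", "el coche azul"),
    ("the red house", "la casa roja"),
    ("the car", "un coche"),
]


def dice_scores(tm_pairs):
    """Dice coefficient for every source/target word pair sharing a TM unit."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for source, target in tm_pairs:
        s_words = set(source.lower().split())  # count each word once per unit
        t_words = set(target.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))
    return {(s, t): 2 * c / (src_freq[s] + tgt_freq[t]) for (s, t), c in joint.items()}


def best_translation(scores, word):
    """Target word with the highest Dice score for a given source word."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)


scores = dice_scores(TM_PAIRS)
print(best_translation(scores, "car"))
```

"car" and "coche" occur in exactly the same three units, so they score 1.0, ahead of "el" (0.8), which co-occurs with "car" only twice.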