How translators work in real life: SCATE observations
Frieda Steurs
Iulianna van der Lek-Ciudin
Tom Vanallemeersch
What & Why
Improve translation efficiency and consistency
Underexploited translation resources
Poor integration of speech recognition
Overloaded interfaces
March 2014 - February 2018
Consortium
Centre for Computational Linguistics, University of Leuven
Industrial Advisory Committee
Today’s focus
Methods Survey
Contextual inquiries
Methods
Survey: Dec 2014 – Feb 2015
46 questions
187 complete responses (75% from EU)
73% freelance translators
25% in-house translators
Few terminologists, interpreters, project managers, post-editors
Contextual Inquiries: Nov 2014 - June 2015
16 professionals at their workplaces (BE, NL, LU)
Semi-structured interviews, observations, think-aloud, post-interviews
Whom did we observe?
Organization type: small translation agency, medium-size translation/interpreting agency, public institution, freelance
Language pairs: EN-NL/NL-EN, FR-NL, EN-FR, EN-RO
Translation experience: 2-5 years vs. 5+ years
Domains of expertise: legal, ICT, medical, marketing
Main TEnT: Trados Studio 2014, Trados Studio 2011, Trados Workbench, Déjà Vu X3, memoQ 2014, Wordbee
Experience with TEnT: <1 year (2) vs. 5+ years
Main findings and implications
Needs and shortcomings of tools
Observation of terminological strategies
Translators’ Linguistic Resources
Translation Memories
State of the art:
• Heavily used
• Concordance, term look-up features, term extraction
• Term extraction rarely used
• Alignment
• No support for comparable corpora (possible to upload monolingual documents for reference)
Opportunities:
• Syntactic concordance
• Bilingual/multilingual term extraction
• More focus on monolingual corpora
• Features to compile and query comparable corpora
Online Translation Memories
State of the art:
• Perform look-up during translation
• Automatic insertion
• Concordance searches
• Moderate-low quality control
Opportunities:
• More advanced filtering techniques
• QA tools
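The concordance searches listed above can be illustrated with a minimal keyword-in-context (KWIC) sketch. The translation-memory segments, the `concordance` function name and the context width are assumptions made up for this illustration; they do not reproduce how any particular CAT tool implements the feature.

```python
import re

# Hypothetical translation memory: (source, target) segment pairs, invented for illustration.
TM = [
    ("The annual report was approved.", "Het jaarverslag werd goedgekeurd."),
    ("Please send the report on safety.", "Stuur het verslag over veiligheid."),
    ("The meeting was cancelled.", "De vergadering werd geannuleerd."),
]

def concordance(tm, query, width=20):
    """Return (left context, match, right context, target) for each hit in the source side."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in tm:
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            hits.append((left, m.group(0), right, target))
    return hits

# Print every hit with its aligned target segment.
for left, match, right, target in concordance(TM, "report"):
    print(f"{left:>20} [{match}] {right:<20} | {target}")
```

A syntactic concordance, listed under the opportunities, would match on parse structure rather than on surface strings as this sketch does.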
Translators’ Linguistic Resources
Local term bases
State of the art:
• Usage is still low (SCATE survey: 52%)
• Automatic term recognition
• Basic categories
• TBX not adopted by all tool developers
• Users prefer to exchange data in CSV, Excel
Opportunities:
• Improve usability
• More flexibility and customization to suit users with different needs
• A unified interface for online/local term bases
• Support for ontologies
Online term banks
State of the art:
• Perform look-up (exact/fuzzy) during translation
Opportunities:
• Advanced pre-filtering techniques, better look-up interfaces
Online dictionaries, search engines
State of the art:
• Consulted either online or via a WebSearch feature in CAT
Opportunities:
• Concordance-like searches directly from the translation editor
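The exact/fuzzy look-up mentioned for term banks can be sketched as follows. This is only an illustration under stated assumptions: the glossary entries, the `look_up` function and the 0.8 similarity cutoff are invented here, and the fuzzy matching relies on Python's standard `difflib` module rather than on the matching algorithm of any actual term bank or CAT tool.

```python
import difflib

# Hypothetical EN-NL glossary; entries are illustrative only.
TERM_BASE = {
    "heart failure": "hartfalen",
    "wind energy": "windenergie",
    "translation memory": "vertaalgeheugen",
}

def look_up(term, term_base, cutoff=0.8):
    """Try an exact match first, then fall back to fuzzy matches above the cutoff."""
    key = term.lower().strip()
    if key in term_base:
        return [(key, term_base[key], 1.0)]
    results = []
    for candidate in difflib.get_close_matches(key, list(term_base), n=3, cutoff=cutoff):
        score = difflib.SequenceMatcher(None, key, candidate).ratio()
        results.append((candidate, term_base[candidate], round(score, 2)))
    return results

print(look_up("Heart failure", TERM_BASE))   # exact match after normalisation
print(look_up("heart failures", TERM_BASE))  # fuzzy match on the plural form
```

Returning the similarity score alongside each hit lets an interface rank or pre-filter suggestions, which is the kind of look-up refinement listed under the opportunities.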
Translators’ Linguistic Resources
Machine Translation
State of the art:
• Usage is still low (SCATE survey: 27%)
• Consulted online
• Via API in CAT
• Segment assembly (Déjà Vu, memoQ)
• Autocompletion suggestions (SDL Trados)
• Adaptive MT (MateCat, Lilt)
Opportunities:
• Improve confidence estimation
• Interfaces for post-editing
• Train own MT engine with own TMs, TBs
Term collection
• Manually (88%)
• Semi-automatically via term extraction programs (22%)
Term storage
• CAT TB (52%) Most frequent form/canonical form
• MS Excel (43%) The language equivalents (56%)
• MS Word (27%)
Term research
• Online resources (94%)
• Personal resources (85%)
• Client’s resources (64%)
SCATE Users’ survey 2014-2015
187 survey participants
139 perform terminology activities
Most used online terminology resources
Search engines: Bing
Online dictionaries: Oxford, Proz.com, Van Dale, TermWiki Search, TermCoord glossary links
Term banks: IATE, TERMIUM Plus, EuroTermBank, FAOTERM, WTOTERM
Monolingual corpora: EUR-Lex, Corpus of Global Web-Based English, British National Corpus, Corpus of Contemporary American English
Parallel corpora: Linguee, Europarl, Glosbe, TAUS Search
Reasons for NOT managing terminology
No knowledge about terminology management
theory and principles
It is the responsibility of somebody else
It has no added value
It is a time-consuming task
Term bases are complex
Reliance on the translation memories
Systematic terminology management
• Collect terms and concepts from global field
• Construct a concept system
• Create well-structured definitions
• Create term entries
Ad-hoc terminology management
• Identify terms in isolated contexts
• Create initial term entries
• Add definition, context, …
Adapted from Handbook of Terminology Management, Vol. 1.
Medium & small LSPs, freelancers
In-house translation departments of large organizations
Terminology strategies
Institution In-house translation
departments
Translators / terminologists
In-house terminology coordination
Systematic and ad-hoc terminology management
Term extraction – not a standard practice!
Terminology tools:
• IATE database
• Eur-Lex
• Quest Metasearch
• Euramis Concordance
• DGT Vista
• Electronic dictionaries, glossaries
• Term extraction tools: SynchroTerm, SDL MultiTerm Extract, TermTreffer
• External corpus query tools, e.g. TextStat
Translation tools:
• SDL Trados Studio
• In-house MT
• (Bilingual) voice recognition
Adapted from TermCoord documentation
Terminology strategies
Proactive terminology management
Preparation of “TermFolders” for important legislative
procedures:
Desktop research
Manual collection of web links and relevant
documents
Manual identification and extraction of term
candidates
….
Terminology strategies
Time-consuming
No GlobalSearch
DIY corpora tools?
SCATE?
Terminology strategies
Small and medium-size LSPs, freelancers
Mainly ad-hoc, basic terminology management due to:
o Time pressure
o Lack of financial compensation
o Over-reliance on translation memories
o A general lack of knowledge and awareness of the
benefits of terminology management
o Not familiar with corpus compilation and query tools
Ad-hoc terminology strategies during translation
Identify problem
• LGP, terminology, phraseology, names of entities, typography/punctuation…
• Highlight or copy/paste SL term
Search for a solution
• Local resources: Concordance, Term Look-up, Find & Replace, Global search
• Online resources via WebSearch or other integrated widgets
• MT via plugins, if available & allowed
• Online resources: Google -> Top hits (Bookmark link?)
• Contact client via e-mail or an online query spreadsheet
• Contact subject matter experts
Insert translation
• One click
• Copy/paste
Save terms
• Term base / Excel
Implications
For translators, project managers, terminologists,
interpreters, translators’ educators:
Basic knowledge of terminology theory and practice
Terminology management tools
Preparation of glossaries before the start of the project with the help of:
Corpus compilation and query tools (BootCaT, AntConc, Sketch Engine)
Term extraction tools (SynchroTerm, Similis)
More focus on comparable corpora
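To make the idea of preparing a glossary from a compiled corpus concrete, here is a minimal sketch of term-candidate extraction. It ranks words by how much more frequent they are in a domain text than in a general reference text. The scoring formula, the function names and both toy texts are assumptions chosen for illustration; this is not the method used by SynchroTerm, Similis or the SCATE project.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def term_candidates(domain_text, reference_text, min_freq=2, top_n=5):
    """Rank words by the ratio of their relative frequency in domain vs. reference text."""
    domain = Counter(tokenize(domain_text))
    reference = Counter(tokenize(reference_text))
    d_total = sum(domain.values())
    r_total = sum(reference.values())
    scored = []
    for word, freq in domain.items():
        if freq < min_freq:
            continue
        # Add-one smoothing so words absent from the reference text do not divide by zero.
        ratio = (freq / d_total) / ((reference[word] + 1) / (r_total + 1))
        scored.append((word, round(ratio, 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy texts, invented for illustration.
DOMAIN = ("The turbine converts wind into power. Each turbine has a rotor. "
          "The rotor drives the generator of the turbine.")
REFERENCE = ("The cat sat on the mat. The dog saw the cat. "
             "The weather was fine and the day was long.")

print(term_candidates(DOMAIN, REFERENCE))
```

Domain-specific words such as "turbine" score well above function words such as "the", which is why even a simple frequency comparison can yield a first list of candidates for manual validation.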
Implications
For software developers:
Focus more on usability and personalization
Unified interfaces between local and online resources
More sophisticated search functionalities
Integrate online resources that are actually used by the users
More focus on comparable corpora
SCATE approach
SCATE research
Improvement of bilingual and multilingual term extraction techniques from comparable corpora
Integration of a syntactic concordancer in parallel corpora: e.g. Poly-Gretel
Multilingual term extraction from comparable corpora
A gold standard for Automatic Terminology Extraction
Compilation – Annotation – Evaluation
# words   Heart failure   Wind energy   Corruption   Corruption (parallel)
English   48,843          324,842       454,904      179,229
French    55,383          358,853       547,072      230,874
Dutch     50,850          315,605       476,179      223,495
Annotation: 4 labels (Term, Common Term, Out of Domain Term and Named Entity) with elaborate and practical guidelines
Evaluation: inter-annotator agreement between 3 annotators after 2 iterations (av. F-score = 0.895; av. Cohen's kappa = 0.927)
Future work: linking the annotations in the comparable
medical corpus across all 3 languages
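The agreement measures reported in the evaluation can be illustrated with the standard definitions of Cohen's kappa and the F-score. This is a sketch under assumptions: the label sequences and term sets below are invented, and the slides do not state exactly how the pairwise scores were computed or averaged across the 3 annotators.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

def f_score(annotated_a, annotated_b):
    """F1 between two sets of annotated items, treating annotator A as the reference."""
    overlap = len(annotated_a & annotated_b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(annotated_b)
    recall = overlap / len(annotated_a)
    return 2 * precision * recall / (precision + recall)

# Invented labels for six candidate terms, using the four labels named above.
A = ["Term", "Term", "Common Term", "Named Entity", "Term", "Out of Domain Term"]
B = ["Term", "Term", "Term", "Named Entity", "Term", "Out of Domain Term"]
print(round(cohen_kappa(A, B), 3))

# Invented sets of items each annotator marked as terms.
print(round(f_score({"heart failure", "ejection fraction", "diuretic"},
                    {"heart failure", "diuretic", "patient"}), 3))
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is usually reported next to the F-score rather than instead of it.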
Bilingual lexicon induction from comparable
corpora
Techniques for extracting word representations:
o multilingual topic models
o multilingual word embedding models
o character-level representations
Comparable corpora → Cross-lingual semantic word representations → Bilingual lexicon
Best results
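The step from cross-lingual word representations to a bilingual lexicon can be sketched as a nearest-neighbour search: for each source word, pick the target word whose vector is most similar. The vectors below are invented three-dimensional toy values assumed to live in one shared space, and the function names are hypothetical. How SCATE trains its representations (topic models, embeddings or character-level models) is not reproduced here.

```python
import math

# Toy cross-lingual embeddings in one shared space; all values are invented.
EN = {"report": [0.9, 0.1, 0.0], "energy": [0.1, 0.9, 0.1], "heart": [0.0, 0.2, 0.9]}
NL = {"verslag": [0.8, 0.2, 0.1], "energie": [0.2, 0.8, 0.0], "hart": [0.1, 0.1, 0.9]}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def induce_lexicon(source_vectors, target_vectors):
    """Map each source word to its most similar target word."""
    lexicon = {}
    for word, vec in source_vectors.items():
        lexicon[word] = max(target_vectors, key=lambda t: cosine(vec, target_vectors[t]))
    return lexicon

print(induce_lexicon(EN, NL))
```

In practice the quality of the induced lexicon depends almost entirely on how well the two languages have been mapped into the shared space, which is the part the techniques listed above address.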
Poly-Gretel
Bilingual syntactic concordancer
Query parallel corpora
Available online at: http://gretel.ccl.kuleuven.be/poly-gretel/ebs/input.php?1477144000
Target audience:
Computer-assisted language learning (CALL)
Translators
Translation studies and comparative linguistics
Example query: EN noun + report ↔ NL verslag + prep + noun
EN-NL constituents are automatically aligned
Example query: EN noun + report ↔ NL noun
Many compounds are possible
More about SCATE https://www.arts.kuleuven.be/ling/ccl/projects/scate