Post on 09-May-2015
Translation and Crowd Sourcing: Opportunity or Heresy?
Professional translators’ attitudes towards massive collaboration
Alain Désilets
Conseil national de recherches du Canada
“The most reliable way to forecast the future is to try to understand the present.”
“Trends, like horses, are easier to ride in the direction they are going.”
-- John Naisbitt
"You have to talk to [customers], watch them; this is the only way to understand their interests, their motives, their needs".
-- Donald Norman
Observing Translators
Multi-disciplinary project that includes technology researchers from NRC and translation studies researchers from Université du Québec en Outaouais.
Contextual Inquiry: a well-known, proven technique in Human-Computer Interaction for learning about end users.
• Mix between observation and interviewing.
• Observe potential end users while they work.
• Ask them to think aloud.
• Interrupt with lots of questions.
• Use qualitative and quantitative data analysis to make sense of what you witnessed.
25 subjects
Organization type: large (250+ employees) LSPs (13), medium (<30 employees) LSPs (6), freelance (2), academic (2), amateur (2)
Type of work: “conventional” translation (15), MT post-editing (8), revision (2)
Language pairs: English-French (13), English-Spanish (6), English-Japanese (2), Portuguese-Spanish (1), Chinese-English (1), English+Italian-Estonian (1), English-Inuktitut (1)
Years of experience: ranged from < 2 years up to 20+ years
Source text domain: Aboriginal affairs, Municipal affairs, Public administration, Education, Legal, Health, Software manual, Politics, Job offers
Source text length: min ~20, max 7000
Country: Canada (15), Europe (5), US (3), Japan (2)
Professional translators: all but 2
Preaching the Wiki Word to Translators
Involved in the world of wikis since 2002
• Chaired WikiSym conference in 2007 (Montreal)
Have been telling professional translators about wikis since 2006
• Keynote at Translating and the Computer 2007: “Translation Wikified”.
• Organized a workshop and panel on those topics.
Co-implemented wiki-based tools to support translation work
• Cross-lingual wiki engine: translate in a wiki context where preconditions of traditional translation workflows do not apply (ex: master language).
• Tiki-CMT: TikiWiki module to support Collaborative Multilingual Terminology work.
Talk Outline
Are professional translators technology averse?
Professional translators’ attitudes and work practices with respect to:
• collaboratively built linguistic resources
• collaborative translation and crowdsourcing
Please interrupt with questions at any point!
Are professional translators technology averse?
Translation Problems
Translators use a lot of technology when trying to resolve translation problems.
Translation problem = any source language word or expression which presents a difficulty for a human translator (not machine) during the process of translation.
Term, idiomatic expression, named entity, etc...
Tools, Tools, more Tools!
private (to the individual) lexicons built using simple office suites (ex: Excel spreadsheets, MS-Word documents)
1 large, public general purpose bi-text (TransSearch)
private (to the individual) or institutional Translation Memories built with 3 different products (Trados, Multitrans, LogiTerm)
2 private (to the individual) or institutional, unaligned archives of previous translations, either stored in a database or the file system
9 unilingual general purpose dictionaries (Multidictionnaire, Petit Robert, Merriam-Webster, Dictionnaire des cooccurrences, dictionary.references.com, Canadian Oxford, Trésor de la langue française, www.dictionary.com, urban dictionary)
2 unilingual thesauri (Dictionnaire analogique, Dictionnaire des synonymes de l'Université de Caen)
2 unilingual specialized dictionaries and lexicons (Dictionnaire de droit québécois, Lexique des noms géographiques)
3 bilingual dictionaries (LexibasePro, René Meertens, Robert & Collins)
the source text being translated, as well as its partial translation
bilingual documents related to the source text (ex: minutes of meetings being discussed in the source text)
2 instances of the client's Web sites
2 large, bilingual Web sites not directly related to the domain of the source text (gc.ca domain, Canadian Broadcasting Corporation)
3 large, bilingual Web sites directly related to the domain of the source text (CanLII, Canadian Federal Court, University of Ottawa)
the whole Web in the source or target language (mined using Google search engine)
2 manuals of style (Guide du rédacteur, Le Ramat de la typographie)
2 spell and grammar checkers (MS-Word, Antidote)
1 database of newspaper articles in the target language
Average of 10 resources in our subjects’ toolboxes!!!
Translators use a wide range of tools when resolving translation problems.
Vast majority of those are electronic.
Tools, Tools, more Tools! (2)
p < 0.001
Adoption of Corpus-Based Tools
• Both types are used equally.
• Corpus-based tools have made it into the mainstream.
• But they have not displaced termino-lexicographic tools.
Termino-lexicographic =
• Dictionary
• Terminology Database
• Lexicon, etc.
Corpus-based =
• Translation Memory
• Bilingual web site, etc.
p > 0.05
Advanced Google Use
Translators are among the world’s most advanced Google users.
• They know the advanced syntax, and expect it in most search tools they use.
• They use Google in various ways to mine the web-as-a-corpus
– Ex: searching bilingual sites for solutions, assessing usage in the target language of particular solutions.
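The bilingual-site search mentioned above can be sketched as a small query builder. This is only an illustration, not a tool the subjects used: the function names and the example term are hypothetical, while exact-phrase quotes and the `site:` operator are standard Google search syntax, and gc.ca is one of the bilingual domains named in this talk.

```python
from urllib.parse import urlencode


def site_query(term: str, site: str) -> str:
    """Build an exact-phrase query restricted to one web domain."""
    return f'"{term}" site:{site}'


def google_search_url(query: str) -> str:
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical source-language term, searched on a bilingual government domain.
query = site_query("terms of reference", "gc.ca")
print(query)
print(google_search_url(query))
```

The translator would then open the matching pages, switch to the other-language version of each page, and read off candidate equivalents in context.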
Searching Bilingual Sites
Searching Bilingual Sites (2)
Hot Buttons
That said, translators strongly resist technology that either:
• disrupts the fair compensation equation, or
• exerts downward pressure on quality of end product
Translation crowdsourcing is likely to press on both these buttons.
Fair Compensation Equation
Translators are paid by the word.
Technology that increases productivity exerts strong downward pressure on the per-word rate.
Pressure is not always commensurate with actual productivity gain.
Example:
• A 10-word sentence with an 80% fuzzy match level.
• Should the translator only get paid for two words?
• Even though she still has to read the whole sentence...
• ... and may have to change the rest of the sentence to make it work with the translation of those 2 words?
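The arithmetic behind this example can be sketched as follows. This assumes a naive scheme in which only the unmatched share of a segment is billed; it is an illustration of the question raised above, not any vendor's actual pricing rule, and the function name is hypothetical.

```python
def payable_words(word_count: int, fuzzy_match_percent: int) -> float:
    """Words billed if only the unmatched share of a segment is paid (assumed scheme)."""
    if not 0 <= fuzzy_match_percent <= 100:
        raise ValueError("fuzzy_match_percent must be between 0 and 100")
    return word_count * (100 - fuzzy_match_percent) / 100


sentence_length = 10  # words in the source sentence
match_level = 80      # fuzzy match level, in percent

billed = payable_words(sentence_length, match_level)
print(f"Words billed: {billed:g}")           # the unmatched 20% of the sentence
print(f"Words to read: {sentence_length}")   # the translator still reads all of it
```

Under this scheme the translator is paid for 2 words while still reading, and possibly reworking, all 10.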
Once a new fair equilibrium has been reached, this initial resistance may go away.
Lowering Quality?
Translators
• Craftspeople who take pride in the quality of their end product.
• Quality = original sense is rendered, AND translation reads as though it was an original text written by a native speaker.
Customers
• Translation = cost center, not part of their core business.
• Can’t always tell quality when they see it, nor measure a clear link between translation quality and bottom line.
• Liable to introduce cost-reducing technologies without realizing impact on quality.
Attitudes and work practices with respect to collaboratively built linguistic resources
Is Wikipedia Useful for Translators?
We witnessed very little use of Wikipedia in our translator observation.
On a few occasions, subjects consulted Wikipedia to get background information on a particular concept, but never to get a solution to a terminology difficulty.
An analysis conducted in June 2007 indicates that coverage of typical terminology difficulties may be insufficient for the latter task (finding equivalents).
Is Wikipedia Useful for Translators (2)?
                                         Wikipedia   Wiktionary   TERMIUM
Has English entry                        71.4%       47.6%        80.1%
Has English entry in right sense         57.1%       45.2%        76.2%
Has French equivalent                    33.3%       35.7%        76.2%
Has French equivalent in correct sense   26.1%       33.3%        76.2%
Wikipedia’s coverage of 42 observed terminology problems (June 2007)
Note: TERMIUM = Terminology DB of the Gov. of Canada.
Is Wikipedia Useful for Translators (3)?
Evolution of translators’ attitudes towards Wikipedia and wikis:
4 yrs ago: “Wikipedia, what’s that?”
3 yrs ago: “I know about Wikipedia and I think it’s crap because any clown can write to it.”
2 yrs ago: “You know, Wikipedia is surprisingly good and I use it all the time in my work now.”
1 yr ago: “This collaborative, wiki stuff is bound to be important for translation, but I am not sure how best to leverage it.”
What Makes a Good Source?
Professional translators are trained to:
• only use trusted sources.
• focus on sources that are specialised for their domain or client.
• never use content that may have been translated, or written by non-native speakers.
In theory, that would rule out most collaborative sources.
In practice, translators are pragmatic and will consult sources that do not meet those criteria when necessary.
Use of Public Sources
• Our subjects used significantly more public resources than private ones.
• Many of them are available for free (ex: customer’s web site).
• Caveat: the situation is different for highly repetitive, technical translation.
p < 0.001
Public =
Anyone can access, possibly at a fee.
Private =
Only accessible to certain translators (ex: those working for particular employer)
Use of General Sources
• Our subjects used significantly more multidomain resources than single-domain ones.
• Seemed to prefer casting a wide net, and then sifting the results.
• Caveat: here again, the situation is different for highly repetitive, technical translation.
p < 0.05
Multidomain =
Covers multiple domains, and subject searched it without restricting domain
Single domain =
Covers single domain, or, covers many, but subject restricted search by domain
Use of Translated Material
Our subjects frequently searched in bilingual Canadian sites for French equivalents.
Estimated 75% of French content on those sites was translated.
Thus, in 75% of the cases, this strategy ended up yielding solutions taken from translated material.
Frowned upon in Terminology and, to a lesser extent, in Translation.
But our subjects did it anyway. Here’s why…
Translator Judgment
Subjects exercised a lot of critical judgment with respect to resources.
• Did not blindly trust any source, even highly reputed ones like TERMIUM (Terminology DB of the Gov. of Canada).
• In 35% of the cases, searched in a second resource, after finding some relevant information.
• Subjects adept at rapidly scanning a list of suggestions and sifting grain from chaff.
• Problem Coverage (i.e. probability that at least one relevant solution is found in the top 10) seemed more important than Precision (i.e. probability that a proposed solution is relevant).
• Recall (i.e. percentage of all relevant solutions that is actually proposed by the resource) also seemed important, but to a much lesser degree.
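The three measures defined above can be made concrete with a short sketch. Everything here is illustrative: the `Consultation` record, the pooled (micro-averaged) way of computing Precision and Recall, and the sample data are assumptions made for the example, not the procedure or data of the study. The code also assumes a non-empty list of consultations with at least one proposal and one existing relevant solution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Consultation:
    """One lookup of a translation problem in a resource (hypothetical record)."""
    proposed_relevant: List[bool]  # relevance of each proposed solution, in rank order
    total_relevant: int            # number of relevant solutions that exist for the problem


def problem_coverage(consultations: List[Consultation], top_n: int = 10) -> float:
    """Share of problems with at least one relevant solution in the top_n proposals."""
    hits = sum(1 for c in consultations if any(c.proposed_relevant[:top_n]))
    return hits / len(consultations)


def precision(consultations: List[Consultation]) -> float:
    """Share of all proposed solutions that are relevant (pooled over consultations)."""
    proposed = sum(len(c.proposed_relevant) for c in consultations)
    relevant = sum(sum(c.proposed_relevant) for c in consultations)
    return relevant / proposed


def recall(consultations: List[Consultation]) -> float:
    """Share of all existing relevant solutions that the resource proposed."""
    found = sum(sum(c.proposed_relevant) for c in consultations)
    existing = sum(c.total_relevant for c in consultations)
    return found / existing


# Made-up sample: three problems looked up in one resource.
consultations = [
    Consultation([True, False, False, False], total_relevant=4),
    Consultation([False, False, True, False], total_relevant=2),
    Consultation([False, False], total_relevant=2),
]

print(f"Problem coverage: {problem_coverage(consultations):.2f}")
print(f"Precision: {precision(consultations):.2f}")
print(f"Recall: {recall(consultations):.2f}")
```

In this made-up sample the resource solves two problems out of three even though most of its individual suggestions are irrelevant, which is the situation a translator who sifts quickly can tolerate.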
Resources Quality Control
• Our subjects preferred more tightly controlled resources.
• But still made non-negligible use of moderately controlled ones (38% of all consultations).
• Almost no use of completely open resources.
Tight =
Carefully crafted and revised (linguists, terminologists, revisers). Ex: TERMIUM.
Moderate =
Comes from reputed organizations, but may not be as carefully crafted and revised. Ex: Gov of Canada web sites.
Open =
Could have been produced by anyone. Ex: the whole web.
p < 0.05
p < 0.001
Write Access
• Our subjects were predominantly consumers of resources, as opposed to contributors.
• Many comments about lack of time to contribute.
• But in most collaborative resources, only need a small percentage of contributors.
Read-only =
Subject cannot write, or can only do so through an intermediary. Ex: TERMIUM
Read-Write =
Subject can write directly without an intermediary. Ex: subject’s own lexicon.
p < 0.001
Use of collaborative sources is not that common yet, but growing.
Collaborative resources go against the grain of some translator attitudes, but nothing that can’t be surmounted.
Need to address perception of quality and trustworthiness.
Cannot expect majority of translators to contribute.
Attitudes and work practices with respect to collaborative translation
Flavours of Collaborative Translation
In increasing order of controversy:
Translation teamware
Allow multidisciplinary teams of translators, terminologists, customers, domain experts to collaborate efficiently on a translation project.
Online market place for translators
eBay-like platforms for connecting customers and translators with minimal intervention by a middle man.
Translation crowdsourcing
Mechanical Turk-style platform for distributing translation projects across large crowds of mostly amateur translators.
Translation Teamware
Allow multidisciplinary teams of translators, terminologists, customers, domain experts to collaborate efficiently on a translation project.
• Relatively uncontroversial.
• Many commercial translation workflow products are along those lines, but follow a somewhat assembly-line model.
• More resistance to wiki-like platforms that break down barriers and open up horizontal communication channels
– Ex: Customer seeing early drafts of translations, and commenting on them.
– Translators like to (need to?) stay in their own bubble.
– Fear of undue interference by non-qualified staff.
– But starting to see more and more case studies of this (ex: using BaseCamp or wikis to coordinate translation teams)
Online Market Place for Translators
eBay-like platforms for connecting customers and translators with minimal intervention by a middle man.
– Ex: ProZ, Translated.net
Usually includes
– automatic reputation management.
– free, open resources for translators (ex: Kudoz, MyMemory).
Somewhat controversial:
– Some freelancers perceive it as empowering (cut out the middle man).
– Others perceive it as an impersonal “Walmart of translation”, i.e. something that encourages low-quality translation.
Translation Crowdsourcing
Mechanical Turk-style platform for distributing translation projects across large crowds of mostly amateur translators.
This is REALLY controversial.
• So far, only heard one professional translator say that this is a good thing.
Disrupts the fair compensation equation AND exerts downward pressures on quality.
• One crowdsourcing vendor quotes an average of $0.0008/word (vs $0.25-0.30/word for your average professional translator).
• Translating out of context is known to be error-prone.
• Amateur translators tend to produce texts that read like translations.
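To put the quoted rates side by side, here is a small worked calculation. The per-word figures are the ones quoted above; the 1000-word text is an arbitrary length chosen only for illustration, and the variable names are hypothetical.

```python
from decimal import Decimal

# Per-word rates quoted above, in dollars.
crowd_rate = Decimal("0.0008")
pro_rate_low = Decimal("0.25")
pro_rate_high = Decimal("0.30")

# How many times higher the professional rate is than the crowdsourcing rate.
ratio_low = pro_rate_low / crowd_rate
ratio_high = pro_rate_high / crowd_rate
print(f"Professional rate is {ratio_low} to {ratio_high} times the crowd rate")

# Cost of an illustrative 1000-word text under each rate.
words = 1000
print(f"Crowd: ${crowd_rate * words:.2f}")
print(f"Professional: ${pro_rate_low * words:.2f} to ${pro_rate_high * words:.2f}")
```

A gap of this size, more than two orders of magnitude, is why the compensation concern weighs so heavily in translators' reactions.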
Translation Crowdsourcing (2)
One hopes that crowdsourcing technology will be used wisely and in a way that continues to leverage professional translator skills, for example:
- Crowdsourcing used mostly for low-stake or user-generated content that is currently not being translated at all.
- Professionals continue to play a pivotal role, by revising translations produced by the crowd and paying special attention to amateurs’ main weakness: native-sounding translation.
But we, researchers and developers, cannot guarantee that this is how things will unfold.
We need to be sensitive to those issues while we build the future of translation crowdsourcing.
Conclusions
• Professional translators are NOT technology averse, but they will resist technologies that disrupt the fair compensation equation, or exert downward pressures on quality.
• Crowdsourcing of large linguistic resources is compatible with the views of professional translators, although it is not yet part of their mainstream work practices.
• Also uncontroversial is the use of online collaboration to facilitate team coordination, or to create “fair” marketplaces for freelance translators.
• Crowdsourcing of translation, on the other hand, is very controversial in translator circles, and we need to be sensitive to that issue in building and designing translation crowdsourcing environments.
Questions?
Thank you for your attention.
For more details…
Alain Désilets
National Research Council of Canada
alain.desilets@nrc-cnrc.gc.ca