Translation quality assessment redefined


TRANSLATION QUALITY ASSESSMENT REDEFINED: from TQI to competences and suitability

Demid Tishin

All Correct Language Solutions

www.allcorrect.ru

What are they thinking about when they look at the target text?

2

Client:

Will it blend?* Let’s find a flaw…

3

*Just a joke. “Will it do”, I mean

Quality manager:

Will it blend? I wish the client would just say “OK”…

4

HR / Vendor Manager:

What kind of work can I entrust to this provider? What can I not?

How quickly can we train them?

5

Project Manager:

Return for improvement, or correct with other resources?

6

To answer these questions, the target text needs assessment.

7

TRANSLATION ASSESSMENT: THE ARMORY

What assessment techniques do you know?

8

TRANSLATION ASSESSMENT: THE ARMORY

Subjective assessment (“good / bad”)

Comparing with the source according to a parameter checklist

Automated comparison with a reference translation (BLEU etc.)

Weighing errors and calculating TQI

9

SUBJECTIVE ASSESSMENT (“GOOD / BAD”)

Pros:
- Speed

Cons:
- Results not repeatable
- Results not reproducible
- Difficult for client and service provider to arrive at the same opinion
- Impossible to give detailed reasons
- Tells nothing of provider’s abilities

10

COMPARING WITH THE SOURCE ACCORDING TO A PARAMETER CHECKLIST

Pros:
- Some reasoning for assessment results

Cons:
- Results not reproducible
- Difficult for client and service provider to arrive at the same opinion
- Results not repeatable
- Tells nothing of provider’s abilities

11

AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION

The more word sequences correlate between the target and the reference, the better the translation

BLEU (BiLingual Evaluation Understudy), ROUGE, NIST, METEOR etc.

An overview of BLEU: Tomedes Blog http://blog.tomedes.com/measuring-machine-translation-quality/

12
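To make the automated-comparison idea concrete, here is a minimal sketch that scores a candidate sentence against a reference translation with BLEU. The use of NLTK and the smoothing choice are my own assumptions for illustration; the slides do not prescribe a particular tool.

```python
# Minimal BLEU sketch using NLTK (library choice is an assumption, not prescribed
# by the slides). Install with: pip install nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "turbine", "must", "be", "shut", "down", "before", "maintenance"]
candidate = ["the", "turbine", "has", "to", "be", "shut", "down", "before", "maintenance"]

# Higher score = more n-gram overlap between the candidate and the reference.
score = sentence_bleu(
    [reference],                                      # one or more reference translations
    candidate,
    smoothing_function=SmoothingFunction().method1,   # avoid zero scores on short samples
)
print(f"BLEU: {score:.3f}")
```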

AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION

13

AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION

Pros:
- Speed

Cons:
- Does not account for individual style
- Limited scope (today limited to MT output)
- Does not correlate with human assessment
- A number of reference translations must be prepared before assessment (justified for batch assessment of different translations of the same source sample)
- Tells nothing of provider’s abilities
- How should the acceptability threshold be defined?

14

WEIGHING ERRORS AND CALCULATING TQI

Who? Lionbridge, Aliquantum, Logrus, All Correct Language Solutions … and many others

Publicly available techniques:
- SAE J2450
- ATA Framework for Standard Error Marking
- LISA QA Model 3.1

An overview of translation quality index techniques and guidelines to create your own: http://www.aliquantum.biz/downloads.htm

15

WEIGHING ERRORS AND CALCULATING TQI

What components you will need:
- Error classifier
- Error weighing guidelines
- Translation assessment guidelines, which yield repeatable and reproducible results
- Expert (competent and unambiguous)
- Assessment results form

16

WEIGHING ERRORS AND CALCULATING TQI

17

WEIGHING ERRORS AND CALCULATING TQI

18

WEIGHING ERRORS AND CALCULATING TQI

19

WEIGHING ERRORS AND CALCULATING TQI

TQI (Translation Quality Index) is the usual practical result of translation quality measurement

ATA Framework: TQI = EP * (250 / W) - BP

SAE J2450: TQI = EP / W

LISA QA Model: TQI = (1 - EP / W) * 100

where EP = total Error Points

W = number of words in sample

BP = Bonus Points for outstanding translation passages (ATA)

max. 3 points

20
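As a worked illustration, the three formulas above can be computed as follows. The function names and the sample numbers are mine; the formulas themselves are taken directly from the slide.

```python
def tqi_ata(error_points: float, words: int, bonus_points: float = 0.0) -> float:
    """ATA Framework: error points normalised to a 250-word passage, minus bonus points."""
    return error_points * (250 / words) - bonus_points

def tqi_sae_j2450(error_points: float, words: int) -> float:
    """SAE J2450: weighted error points per word (lower is better)."""
    return error_points / words

def tqi_lisa(error_points: float, words: int) -> float:
    """LISA QA Model: percentage-style score (higher is better)."""
    return (1 - error_points / words) * 100

# Example: 6 error points and 3 bonus points (the ATA maximum) in a 300-word sample.
print(tqi_ata(6, 300, bonus_points=3))   # 2.0
print(tqi_sae_j2450(6, 300))             # 0.02
print(tqi_lisa(6, 300))                  # 98.0
```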

WEIGHING ERRORS AND CALCULATING TQI

Pros:
- Results highly reproducible (SAE J2450)
- Results highly repeatable (SAE J2450)
- Detailed error classifier with explanations and examples (LISA QA Model)
- Easy to use for quality feedback to providers
- Convenient to grade providers according to their TQI for a specific project
- TQI is a simple numeric index, which you can record in a database and use in your balanced scorecard, KPIs etc.

21

WEIGHING ERRORS AND CALCULATING TQI

Cons:
- Limited scope (SAE J2450)
- Low reproducibility of results (ATA Framework)
- A threshold of acceptable TQI is required (e.g. 94.5), while clients do not tolerate any explicitly stated imperfection
- Assessment is time-consuming (5-20 minutes per sample, provided that the expert has carefully studied the source)
- Subjective or underdeveloped error weight assignment – an attempt at forecasting error consequences (LISA QA Model)
- Tells very little of provider’s abilities

22

WEIGHING ERRORS AND CALCULATING TQI

Cons (continued):
- Underdeveloped translation assessment guidelines, including but not limited to:
  - requirements for the translation sample (size, presence of terminology etc.)
  - how to evaluate repeated typical (pattern) errors
  - how to assess flaws in the target that are rooted in obvious flaws in the source
  - how to prevent several minor errors from adding up to the same score as one major error
  - how to handle obviously accidental errors that change the factual meaning

23

WEIGHING ERRORS AND CALCULATING TQI

Cons (continued):
- TQI is valid only for:
  - a specific subject field (e.g. gas turbines, food production etc.)
  - a specific text type (Legal, Technical and Research, or Advertising and Journalism)
- A slight change in any of the above (subject, text type) means that one cannot forecast the provider’s TQI based on former evaluations → a new (tailored) assessment is required → unjustified expenses

24

None of the translation assessment methods answers the questions:

Will it blend?
What kind of work can I entrust to this provider? What can I not?
How quickly can we train them?
Return for improvement, or correct with other resources?

TRANSLATION ASSESSMENT: THE ARMORY

25

Translation assessment techniques need improvement!

TRANSLATION ASSESSMENT: THE ARMORY

26

Split all errors into 2 major groups:
- Factual = errors in the designation of objects and phenomena, their logical relations, and the degree of event probability / necessity
- Connotative = errors in conveying emotional and stylistic information, non-compliance with rules, standards, checklists and guidelines etc.

IMPROVEMENT 1: TWO ERROR DIMENSIONS

27

IMPROVEMENT 1: TWO ERROR DIMENSIONS

28

“That’s a factory” (source)

“That’s a restaurant” (factual error)

“That’s a damn fctory” (2 connotative errors, though no factual errors)

Each text element (word, phrase, sentence etc.) can contain:
- 1 connotative error, or
- 1 factual error, or
- 1 connotative and 1 factual error simultaneously

IMPROVEMENT 1: TWO ERROR DIMENSIONS

29
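A minimal sketch of how the “at most one factual and one connotative error per text element” rule from slide 29 could be recorded; the class and field names are illustrative, not part of the original model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ElementAssessment:
    """One text element (word, phrase, sentence etc.): at most one factual
    and one connotative error can be recorded against it."""
    text: str
    factual_error: Optional[str] = None      # designation of objects, relations, modality
    connotative_error: Optional[str] = None  # emotional/stylistic info, rule/guideline violations

# Source: "That's a factory"
# "That's a restaurant"  -> one factual error on the element "restaurant"
# "That's a damn fctory" -> two connotative errors (one per element), no factual error
restaurant = ElementAssessment("restaurant", factual_error="wrong object designated")
damn       = ElementAssessment("damn", connotative_error="added expletive (register)")
fctory     = ElementAssessment("fctory", connotative_error="typo, meaning still clear")

def totals(elements):
    """Count factual and connotative errors across the assessed elements."""
    factual = sum(e.factual_error is not None for e in elements)
    connotative = sum(e.connotative_error is not None for e in elements)
    return factual, connotative

print(totals([restaurant]))     # (1, 0)
print(totals([damn, fctory]))   # (0, 2)
```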

An accidental error (e.g. an obvious typo) that obscures factual information counts as two errors (e.g. one language error and one factual error).

This way you can both give specific instructions to the provider (e.g. be more careful) and respect the client’s interest (e.g. absence of factual distortions, whatever their cause).

“To kill Edward fear not, good it is” / “To kill Edward fear, not good it is” (Isabella of France): an error in the comma → a critical factual distortion

IMPROVEMENT 1: TWO ERROR DIMENSIONS

30

Map each error in the classifier to the competences that are required to avoid it

IMPROVEMENT 2: COMPETENCES

31

Competence types:
- Competences of acquisition
- Competences of production
- Auxiliary (general) competences

IMPROVEMENT 2: COMPETENCES

32

Competence levels:
- Unsatisfactory = the provider cannot do the corresponding work
- Basic = can do the work
- Advanced = can revise and correct the work of others, or train others

IMPROVEMENT 2: COMPETENCES

33

Competences of acquisition:
- Source language rules
- Source literary
- Source cultural
- Subject matter

IMPROVEMENT 2: COMPETENCES

34

Competences of production:
- Target language rules
- Target literary
- Target cultural
- Target mode of expression (= register, functional style)

IMPROVEMENT 2: COMPETENCES

35

Auxiliary (general) competences:
- Research
- Technical
- General carefulness, responsibility and self-organisation
- Communication (relevant for translation as a service, not the product)
- Information security (relevant for translation as a service, not the product)

IMPROVEMENT 2: COMPETENCES

36
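One possible way to store the error-to-competence mapping described in Improvement 2. The competence names follow the slides; the listed error types and their pairings are illustrative examples only.

```python
# Illustrative mapping of error types to the competences required to avoid them.
# The pairings are an example, not the presenter's actual classifier.
ERROR_TO_COMPETENCES = {
    "misunderstood source sentence": ["source language rules"],
    "confused special notions":      ["subject matter"],
    "typo / agreement error":        ["target language rules", "self-organisation"],
    "wrong register":                ["target mode of expression"],
    "missed cultural allusion":      ["source cultural"],
    "glossary violation":            ["research", "self-organisation"],
}

def competences_implicated(found_errors):
    """Return the set of competences that the found errors call into question."""
    out = set()
    for err in found_errors:
        out.update(ERROR_TO_COMPETENCES.get(err, []))
    return out

print(competences_implicated(["typo / agreement error", "glossary violation"]))
# {'target language rules', 'self-organisation', 'research'}
```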

IMPROVEMENT 2: COMPETENCES

37

IMPROVEMENT 2: COMPETENCES

38

IMPROVEMENT 2: COMPETENCES

The client can formulate precise and objective requirements for the provider

Assessment immediately shows which competences meet the required level and which don’t

39

IMPROVEMENT 3: WORKFLOW ROLES

Map each workflow role (e.g. translate, compile project glossary, revise language etc.) to a set of required competences

40

IMPROVEMENT 3: WORKFLOW ROLES

Example of a competence set:
- Self-organisation = basic
- Subject matter = basic
- Source language rules = basic
- Target language rules = basic

Role: Can translate

41
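A sketch of how a workflow role could be checked against a provider’s competence profile. The “Can translate” requirements are taken from the slide; the provider profile and the numeric encoding of levels are assumptions for illustration.

```python
from enum import IntEnum

class Level(IntEnum):
    UNSATISFACTORY = 0   # cannot do the corresponding work
    BASIC = 1            # can do the work
    ADVANCED = 2         # can revise/correct others' work or train others

# The "Can translate" role from the slide: four competences, each at least basic.
CAN_TRANSLATE = {
    "self-organisation":     Level.BASIC,
    "subject matter":        Level.BASIC,
    "source language rules": Level.BASIC,
    "target language rules": Level.BASIC,
}

def qualifies(provider: dict, role: dict) -> bool:
    """True if every competence required by the role is met at the required level or above."""
    return all(provider.get(c, Level.UNSATISFACTORY) >= lvl for c, lvl in role.items())

# Hypothetical provider profile (illustrative data only).
provider = {
    "self-organisation":     Level.ADVANCED,
    "subject matter":        Level.BASIC,
    "source language rules": Level.BASIC,
    "target language rules": Level.UNSATISFACTORY,
}
print(qualifies(provider, CAN_TRANSLATE))  # False: target language rules below basic
```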

IMPROVEMENT 3: WORKFLOW ROLES

42

IMPROVEMENT 3: WORKFLOW ROLES

The Vendor Manager / Project Manager quickly assigns workflow roles and schedules the project → saves time

43

IMPROVEMENT 4: ERROR ALLOWABILITY

In each case the client indicates which error types are allowed and which are not. The expert records the client’s requirements in a list.

One “not allowed” error in the sample → the text fails (client perspective)

44
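A minimal sketch of the allowability check: the client’s list of tolerated error types drives a pass/fail verdict, and a single disallowed error fails the sample. The specific error-type labels are made up for illustration.

```python
# Client requirements for this job: error types that are tolerated (illustrative list).
ALLOWED = {"minor style preference", "typo not affecting meaning"}

def verdict(found_error_types):
    """Pass/fail from the client perspective: one disallowed error fails the text."""
    disallowed = [e for e in found_error_types if e not in ALLOWED]
    return ("fail", disallowed) if disallowed else ("pass", [])

print(verdict(["typo not affecting meaning"]))                        # ('pass', [])
print(verdict(["typo not affecting meaning", "factual distortion"]))  # ('fail', ['factual distortion'])
```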

IMPROVEMENT 4: ERROR ALLOWABILITY

45

IMPROVEMENT 4: ERROR ALLOWABILITY

Assessment is aligned with the real client needs (“pass / fail”)

46

IMPROVEMENT 5: PROVIDER TRAINABILITY

Single out 2 major error groups:
- Correcting the error requires minimum training / instructions; the provider can find and correct all errors of this type in their own work
- Correcting the error requires prolonged training; the provider cannot find all of their errors in the text

47

IMPROVEMENT 5: PROVIDER TRAINABILITY

Errors that require minimum training:
- the original order of text sections is broken
- broken cross-references
- text omissions
- numbers / dates do not correspond to the source
- glossary / style guide violated
- non-compliance with reference sources
- inconsistent terminology
- non-compliance with regional formatting standards
- broken tags, line length
- obvious language errors
- etc.

48

IMPROVEMENT 5: PROVIDER TRAINABILITY

Errors that require prolonged training:
- understanding the source
- confusion of special notions (subject competence)
- stylistic devices and expressive means (literary competence)
- cultural phenomena
- etc.

49

IMPROVEMENT 5: PROVIDER TRAINABILITY

50

IMPROVEMENT 5: PROVIDER TRAINABILITY

What is the percentage of errors requiring minimum training?

The PM can instantly make a decision – return the product for further improvement, or correct with other resources → saves time

51
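A sketch of how the PM decision could be driven by the share of errors requiring minimum training; the 0.8 threshold and the error schema are assumptions for illustration, not values from the presentation.

```python
def minimum_training_share(errors):
    """errors: list of dicts with a boolean 'minimum_training' flag (illustrative schema)."""
    if not errors:
        return 1.0
    return sum(e["minimum_training"] for e in errors) / len(errors)

def pm_decision(errors, threshold=0.8):
    """Return the text to the provider if most errors need only brief instructions;
    otherwise correct with other resources. The 0.8 threshold is an assumed example."""
    share = minimum_training_share(errors)
    return "return for improvement" if share >= threshold else "correct with other resources"

errors = [
    {"type": "inconsistent terminology", "minimum_training": True},
    {"type": "broken cross-reference",   "minimum_training": True},
    {"type": "misunderstood source",     "minimum_training": False},
]
print(f"{minimum_training_share(errors):.0%}")  # 67%
print(pm_decision(errors))                      # correct with other resources
```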

IMPROVEMENT 5: PROVIDER TRAINABILITY

If all errors affecting a competence are easy to correct, the competence is assessed in two ways (at once):
- Current state (“as is”)
- Potential state (after a short training)

52

NOTES

Provider has to work in normal conditions (enough time, work instructions)

The sample should be restricted to one main subject field according to the subject classifier

The source text should be meaningful and coherent

It is important to differentiate between errors and preferential choices

53

NOTES

To assess a sample, the expert has to possess all competences at the “advanced” level.

As it is difficult to find such experts in reality, several people can be assigned to assess one sample (e.g. one assesses terminology, another assesses all the other aspects)

Quality predictions for rush jobs cannot be based on normal competence assessment (as rush output quality is normally lower)

54

EXAMPLE

55

EXAMPLE

56

EXAMPLE

57

EXAMPLE

58

CONCLUSION

The new assessment model answers all the questions:

Will it blend? – pass / fail
What kind of work can I entrust to this provider? What can I not? – competences and workflow roles
How quickly can I train the provider? – potential competences
Return for improvement, or correct with other resources? – percentage of errors requiring minimum training

59

BENEFITS
- Provider and client speak the same “language” (error types and competences) → fewer debates
- Saves time when testing providers
- Simplifies planning of a minimum and sufficient workflow, optimizes resources
- Allows avoiding extra text processing stages when not necessary → better turnaround, more flexible budgets → higher rates → provider loyalty and a good image for the company
- Detailed feedback and training → provider loyalty

60

THE FUTURE OF THE TECHNIQUE

- Adjustment and testing
- Dedicated software tool
- Integration with QA tools

61

QUALITY MANAGEMENT PROCESS

62

Thank you!

Questions?

dtishin@allcorrect.ru

63