Target 21:2 (2009), 235–264. doi 10.1075/target.21.2.02col. issn 0924–1884 / e-issn 1569–9986. © John Benjamins Publishing Company
Further evidence for a functionalist approach to translation quality evaluation

Sonia Colina
The University of Arizona

Colina (2008) proposes a componential-functionalist approach to translation quality evaluation and reports on the results of a pilot test of a tool designed according to that approach. The results show good inter-rater reliability and justify further testing. The current article presents an experiment designed to test the approach and tool. Data was collected during two rounds of testing. A total of 30 raters, consisting of Spanish, Chinese and Russian translators and teachers, were asked to rate 4–5 translated texts (depending on the language). Results show that the tool exhibits good inter-rater reliability for all language groups and texts except Russian, and suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. The findings are in line with those of Colina (2008).

Keywords: quality, assessment, evaluation, rating, componential, functionalism, errors

0. Introduction

Recent US federal mandates (e.g. White House Executive Order #13166),1 requiring health care providers who are recipients of federal funds to provide language translation and interpretation for patients with limited English proficiency (LEP), have brought the long-standing issue of translation quality to a wider audience of health care professionals (e.g. managers, decision makers, industry stakeholders, private foundations), who generally feel unprepared to address the topic. A striking example of how challenging quality evaluation can be for health care organizations is illustrated by the experience of Hablamos Juntos, an initiative funded by the Robert Wood Johnson Foundation to develop practical solutions to language barriers to health care.

Several healthcare providers (including hospitals) working with the program identified what they believed were "the best" translations available. Eighty-seven documents rated as highly satisfactory and recommended for replication were collected from the providers. Examination of these health education texts by doctorate-level Spanish language specialists resulted in quality being identified as a problem. Many of these texts were cumbersome to read, to the point that readers required the English originals to decipher the intended meanings of some translations. It became clear that these texts were potentially hampering health care quality and outcomes by not providing needed access to intended health care information for patients with limited English proficiency. Furthermore, health care administrators overseeing the translation processes that produced these texts had not identified quality as a problem and needed assistance assessing the quality of non-English written materials. It was this context that prompted the launch of the Translation Quality Assessment (TQA) project, funded as one of various HJ initiatives to improve communication between health providers and patients with limited English proficiency. The TQA project aims to design and test a research-based prototype tool that could be used by health care organizations to assess the quality of translated materials, being able to identify a wide range of quality. Colina (2008) describes the initial version of the tool and the first phase of testing. The results of a pilot experiment, also reported in Colina (2008), reveal good inter-rater reliability and provide justification for further testing. The current article presents a second experiment designed to test the approach and tool.

1. Translation quality revisited

Translation quality evaluation is probably one of the most controversial, intensely debated topics in translation scholarship and practice. Yet progress in this area does not seem to correlate with the intensity of the debate. One may wonder whether the situation is perhaps partly related to the diverse nature of the definitions of translation. In a field such as translation studies, filled with unstated, often culturally-dependent assumptions about the role of translation and translators, equivalence and literalness, translation norms and translation standards, it is not surprising that quality and evaluation have remained elusive to definition or standards. Current reviews of the literature offer support for this hypothesis (Colina 2008; House 2001; Lauscher 2000), as they reveal a multiplicity of views and priorities in the area of translation quality. In one recent overview, Colina (2008) classifies the various approaches into two major groups according to whether their orientation is experiential or theoretical; parts of that overview are reproduced here for ease of reference (see further Colina 2008).


1.1 Experiential approaches

Many methods of translation quality assessment fall within this category. They tend to be ad hoc, anecdotal marking scales developed for the use of a particular professional organization or industry, e.g. the ATA certification exam, the SAE J2450 Translation Quality Metric for the automotive industry, or the LISA QA tool for localization.2 While the scales are often adequate for the particular purposes of the organization that created them, they suffer from limited transferability, precisely due to the absence of theoretical and/or research foundations that would permit their transfer to other environments. For the same reason, it is difficult to assess the replicability and inter-rater reliability of these approaches.

1.2 Theoretical approaches

Recent theoretical, research-based approaches tend to focus on the user of a translation and/or the text. They have also been classified as equivalence-based or functionalist (Lauscher 2000). These approaches arise out of a theoretical framework or stated assumptions about the nature of translation; however, they tend to cover only partial aspects of quality, and they are often difficult to apply in professional or teaching contexts.

1.2.1 Reader-response approaches
Reader-response approaches evaluate the quality of a translation by assessing whether readers of the translation respond to it as readers of the source would respond to the original (Nida 1964; Carroll 1966; Nida and Taber 1969). The reader-response approach must be credited with recognizing the role of the audience in translation, more specifically of translation effects on the reader as a measure of translation quality. This is particularly noteworthy in an era when the dominant notion of 'text' was that of a static object on a page.

Yet the reader-response method is also problematic because, in addition to the difficulties inherent to the process of measuring reader response, the response of a reader may not be equally important for all texts, especially for those that are not reader-oriented (e.g. legal texts). The implication is that reader response will not be equally informative for all types of translation. In addition, this method addresses only one aspect of a translated text (i.e. equivalence of effect on the reader), ignoring others, such as the purpose of the translation, which may justify or even require a slightly different response from the readers of the translation. One also wonders if it is in fact possible to determine whether two responses are equivalent, as even monolingual texts can trigger non-equivalent reactions from slightly different groups of readers. Since in most cases the readership of a translated text is different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.

1.2.2 Textual and pragmatic approaches
Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.

Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text is the most important factor in translation and that quality should be assessed with respect to it. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.

House's (1997, 2001) functional pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations sometimes are commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985) and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.

An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established as the guiding criterion for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain – at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).

Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).

1.3 The functional-componential approach (Colina 2008)

Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e. they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g. reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.


As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.

As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
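A minimal sketch of how such a requester-weighted, descriptor-based score could be computed is given below. The component abbreviations follow the four categories reported in Table 2, but the weights, descriptor levels and point values are hypothetical illustrations, not those of the actual TQA tool.

```python
# Hypothetical sketch of a componential, descriptor-based scoring scheme.
# Component names (TL, FTA, MEAN, TERM) follow the article's Table 2;
# the weights and descriptor point values are invented for illustration.

# Requester-defined maximum weight per component (sums to 100,
# matching the tool's 100-point maximum).
WEIGHTS = {
    "TL": 30,    # Target Language
    "FTA": 25,   # Functional and Textual Adequacy
    "MEAN": 25,  # Non-Specialized Content
    "TERM": 20,  # Specialized Content and Terminology
}

# Four descriptors per component, expressed here as fractions of the weight;
# the rater matches the text to one descriptor (no point deduction).
DESCRIPTOR_LEVELS = [0.25, 0.5, 0.75, 1.0]

def score(selected_levels: dict[str, int]) -> float:
    """Map each component's chosen descriptor (0-3) to weighted points."""
    total = 0.0
    for component, level in selected_levels.items():
        total += WEIGHTS[component] * DESCRIPTOR_LEVELS[level]
    return total

# A rater matches the text to the top descriptor for TL and TERM,
# and the second-best descriptor for FTA and MEAN.
print(score({"TL": 3, "FTA": 2, "MEAN": 2, "TERM": 3}))  # 87.5
```

Because raters match descriptors rather than deduct points, changing the requester's weights reprioritizes components without altering the rater's task.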

The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and Results

2.1 Methods

One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e. to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD) and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA) and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be considered very good and the other half very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts and four Chinese texts (for each of these two languages, two of good quality and two of bad quality), making up a total of thirteen additional texts.


2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e. the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e. "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to deter-mine to what degree trained raters use the TQA tool consistently

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores around that overall score.


200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
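Per-text aggregates of this kind can be reproduced from raw rater scores in a few lines. The eleven scores below are invented for illustration (the article reports only the aggregates), and the sample standard deviation is assumed, since the article does not specify which variant was used.

```python
import statistics

# Hypothetical raw scores from 11 raters for one text (0-100 scale);
# the article reports only per-text averages and standard deviations.
ratings = [95, 90, 92, 88, 96, 85, 94, 91, 93, 89, 97]

average = statistics.mean(ratings)   # overall score for the text
spread = statistics.stdev(ratings)   # sample standard deviation across raters

print(round(average, 1), round(spread, 1))
```

A larger spread for the same average would indicate lower rater agreement on that text, which is exactly the reading applied to Table 1 below.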

Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text   # of raters   Average Score   Standard Deviation
Spanish
210    11            91.8            8.1
214    11            89.5            11.3
215    11            86.8            15.0
228    11            48.6            19.2
235    11            56.4            18.5
Avg                                  14.42
Chinese
410    10            88.0            10.3
413    10            63.0            21.0
415    10            96.0            5.7
418    10            76.0            21.2
Avg                                  14.55
Russian
312    9             59.4            16.1
314    9             82.8            15.6
315    9             75.6            22.1
316    9             67.8            29.0
Avg                                  20.7


Figure 1a. Average score and standard deviation per text. [Bar chart: texts 210, 214, 215, 228, 235, 410, 413, 415, 418, 312, 314, 315, 316 on the x-axis; average score and standard deviation on a 0-100 y-axis.]

Figure 1b. Average standard deviations per language. [Bar chart: Spanish, Chinese, Russian; average standard deviation on a 0-25 y-axis.]


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

                   TL            FTA           MEAN          TERM
Text     Raters   Mean    SD    Mean    SD    Mean    SD    Mean    SD
Spanish
210      11       27.7    2.6   23.6    2.3   22.7    2.6   17.7    3.4
214      11       27.3    4.7   20.9    7.0   23.2    2.5   18.2    3.4
215      11       28.6    2.3   22.3    4.7   18.2    6.8   17.7    3.4
228      11       15.0    7.7   11.4    6.0   10.9    6.3   11.4    4.5
235      11       15.9    8.3   12.3    6.5   13.6    6.4   14.5    4.7
Avg SD                    5.12          5.3           4.92          3.88
Chinese
410      10       27.0    4.8   22.0    4.8   21.0    4.6   18.0    2.6
413      10       18.0    9.5   16.5    5.8   14.0    5.2   14.5    3.7
415      10       28.5    2.4   25.0    0.0   23.5    2.4   19.0    2.1
418      10       22.5    6.8   21.0    4.6   16.0    7.7   16.5    4.1
Avg SD                    5.875         3.8           4.975         3.125
Russian
312       9       18.3    7.1   15.0    6.1   13.3    6.6   12.8    4.4
314       9       25.6    6.3   21.7    5.0   19.4    3.9   16.1    4.2
315       9       23.3    9.4   18.3    7.9   17.8    4.4   16.1    4.2
316       9       20.0   10.3   16.7    7.9   17.2    7.1   13.9    6.5
Avg SD                    8.275         6.725         5.5           4.825
Avg SD (all languages)    6.3           5.3           5.1           3.9


This, in conjunction with the Reliability Testing results, leads us to believe that other, unknown factors unrelated to the tool are responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient, and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
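The article reports inter-rater reliability coefficients without naming the statistic used. A common choice for a fixed panel of raters scoring the same set of texts is Cronbach's alpha, treating each rater as an "item"; the sketch below is illustrative only, with an invented texts-by-raters matrix, not the study's data:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a texts-by-raters score matrix (list of rows)."""
    k = len(scores[0])                                   # number of raters
    raters = list(zip(*scores))                          # one score tuple per rater
    item_vars = sum(variance(r) for r in raters)         # sum of per-rater variances
    total_var = variance([sum(row) for row in scores])   # variance of per-text totals
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical data: five texts rated by three raters on a 0-100 scale
ratings = [
    [92, 90, 89],
    [88, 85, 90],
    [87, 84, 86],
    [49, 55, 46],
    [56, 60, 58],
]
print(f"alpha = {cronbach_alpha(ratings):.3f}")  # prints 0.992: raters rank texts very consistently
```

Raters who order the texts the same way produce an alpha near 1 (as for the Spanish and Chinese groups); raters who disagree about which texts are good drive alpha toward 0.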

Figure 2. Average standard deviations per tool component and per language. [Bar chart: TL, FTA, MEAN, TERM for Spanish, Chinese, Russian, and all languages; average SD on a 0-9 y-axis.]

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1-2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause of the Russian coefficients, it was not related to the tool itself.

Question 5: Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for translators

         Score             Time
Text     Mean     SD       Mean     SD
210      93.3      7.5     75.8     59.4
214      93.3     12.1     94.2    101.4
215      85.0     17.9     36.3     18.3
228      46.7     20.7     37.5     22.3
235      46.7     18.6     49.5     38.9
410      91.4      7.5     46.0     22.1
413      62.9     21.0     40.7     13.7
415      96.4      4.8     26.1     15.4
418      69.3     22.1     52.4     22.2
312      52.5     15.1     26.7      2.6
314      88.3     10.3     22.5      4.2
315      74.2     26.3     28.7      7.8
316      63.3     32.7     25.8      6.6

Table 7b. Average scores and standard deviations for teachers

         Score             Time
Text     Mean     SD       Mean     SD
210      90.0      9.4     63.6     39.7
214      85.0      9.4     67.0     41.8
215      89.0     12.4     36.0     30.5
228      51.0     19.5     38.0     31.7
235      68.0     10.4     57.6     40.2
410      80.0     13.2     61.0     27.7
413      63.3     25.7     71.0     24.6
415      95.0      8.7     41.0     11.5
418      91.7      5.8     44.0      6.6
312      73.3      5.8     55.0     56.7
314      71.7     20.8     47.7     62.7
315      78.3     14.4     37.7     45.5
316      76.7     22.5     46.7     63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish emerges when comparing Russian translators and teachers: Russian teachers rate similarly to, or slightly higher than, translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, all of whom correlate very highly

Figure 3. Mean scores for Spanish raters. [Bar chart: texts 210, 214, 215, 228, 235; translators vs. teachers; y-axis 0-100.]


Figure 4. Time for Spanish raters. [Bar chart: texts 210, 214, 215, 228, 235; translators vs. teachers; y-axis 0-80.]

Figure 5. Mean scores for Chinese raters. [Bar chart: texts 410, 413, 415, 418; translators vs. teachers; y-axis 0-120.]


Figure 6. Time for Chinese raters. [Bar chart: texts 410, 413, 415, 418; translators vs. teachers; y-axis 0-80.]

Figure 7. Mean scores for Russian raters. [Bar chart: texts 312, 314, 315, 316; translators vs. teachers; y-axis 0-100.]


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as the scores and overall recommendations do not correlate as highly as expected.
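The score-recommendation check above is a plain Pearson correlation between each rater's total scores and the ordinal recommendations issued (1 = publish as is through 4 = redo translation, per the cover sheet). A minimal sketch; the helper function and the five score/recommendation pairs below are invented for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical rater data: total scores (0-100) and the recommendation
# issued for each text (1 = publish as is ... 4 = redo translation)
scores = [92, 88, 75, 55, 40]
recommendations = [1, 1, 2, 3, 4]

print(f"r = {pearson(scores, recommendations):.3f}")  # prints -0.997: strongly negative, as expected
```

A consistent rater produces r close to −1; a rater whose recommendations are detached from his or her own scores (like RS01EM at −0.115) produces a value near 0.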

Figure 8. Time for Russian raters. [Bar chart: texts 312, 314, 315, 316; translators vs. teachers; y-axis 0-60.]

Table 8 (three sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345-364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325-340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3-4. 55-66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw-Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97-134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227-242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6-34.

House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243-257.

Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149-168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christianne. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104-126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54-62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326-344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed for that approach. The results show a high degree of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, consisting of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts: ________        Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department:                    Delivery Date:

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target-language materials).

Notes/Recommended Edits



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.

1. TARGET LANGUAGE (check one box)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:

260 Sonia Colina

3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ________


Further evidence for a functionalist approach to translation quality evaluation 261


SCORING WORKSHEET

Component: Target Language              Component: Functional and Textual Adequacy
Category   Value   Score                Category   Value   Score
1a          5                           2a          5
1b         15                           2b         10
1c         25                           2c         20
1d         30                           2d         25

Component: Non-Specialized Content      Component: Specialized Content and Terminology
Category   Value   Score                Category   Value   Score
3a          5                           4a          5
3b         10                           4b         10
3c         20                           4c         15
3d         25                           4d         20

Tally Sheet

Component                               Category rating   Score value
Target Language                         ________          ________
Functional and Textual Adequacy         ________          ________
Non-Specialized Content                 ________          ________
Specialized Content and Terminology     ________          ________
Total Score                                               ________
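The worksheet and tally arithmetic reduce to a lookup-and-sum: each component contributes the value of the one category checked, and the four maxima (30 + 25 + 25 + 20) give the 100-point ceiling. A minimal sketch using the category values from the scoring worksheet (the function and dictionary names are illustrative, not part of the tool):

```python
# Category values from the TQA scoring worksheet (maximum total = 100)
VALUES = {
    "TL":   {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "FTA":  {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "MEAN": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "TERM": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the value of the one category checked per component."""
    return sum(VALUES[comp][cat] for comp, cat in ratings.items())

# Example: a rater checks 1d, 2c, 3d and 4c
print(total_score({"TL": "1d", "FTA": "2c", "MEAN": "3d", "TERM": "4c"}))  # prints 90 (30+20+25+15)
```

Note that the components are weighted unequally (Target Language tops out at 30, Terminology at 20), so the total score already encodes the tool's default priority ordering; the cover sheet's "priority of quality criteria" ranking lets the requester signal a different ordering to the rater.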


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



documents rated as highly satisfactory and recommended for replication were collected from the providers. Examination of these health education texts by doctorate-level Spanish language specialists resulted in quality being identified as a problem. Many of these texts were cumbersome to read, to the point that readers required the English originals to decipher the intended meanings of some translations. It became clear that these texts were potentially hampering health care quality and outcomes by not providing needed access to intended health care information for patients with limited English proficiency. Furthermore, health care administrators overseeing the translation processes that produced these texts had not identified quality as a problem and needed assistance assessing the quality of non-English written materials. It was this context that prompted the launch of the Translation Quality Assessment (TQA) project, funded as one of various HJ initiatives to improve communication between health providers and patients with limited English proficiency. The TQA project aims to design and test a research-based prototype tool that could be used by health care organizations to assess the quality of translated materials and is capable of identifying a wide range of quality. Colina (2008) describes the initial version of the tool and the first phase of testing. The results of a pilot experiment, also reported in Colina (2008), reveal good inter-rater reliability and provide justification for further testing. The current article presents a second experiment designed to test the approach and tool.

1. Translation quality revisited

Translation quality evaluation is probably one of the most controversial, intensely debated topics in translation scholarship and practice. Yet progress in this area does not seem to correlate with the intensity of the debate. One may wonder whether the situation is perhaps partly related to the diverse nature of the definitions of translation. In a field such as translation studies, filled with unstated, often culturally-dependent assumptions about the role of translation and translators, equivalence and literalness, translation norms and translation standards, it is not surprising that quality and evaluation have remained elusive to definition or standards. Current reviews of the literature offer support for this hypothesis (Colina 2008; House 2001; Lauscher 2000), as they reveal a multiplicity of views and priorities in the area of translation quality. In one recent overview, Colina (2008) classifies the various approaches into two major groups according to whether their orientation is experiential or theoretical; parts of that overview are reproduced here for ease of reference (see further Colina 2008).


1.1 Experiential approaches

Many methods of translation quality assessment fall within this category. They tend to be ad hoc, anecdotal marking scales developed for the use of a particular professional organization or industry, e.g. the ATA certification exam, the SAE J2450 Translation Quality Metric for the automotive industry, or the LISA QA tool for localization.2 While the scales are often adequate for the particular purposes of the organization that created them, they suffer from limited transferability, precisely due to the absence of theoretical and/or research foundations that would permit their transfer to other environments. For the same reason, it is difficult to assess the replicability and inter-rater reliability of these approaches.

1.2 Theoretical approaches

Recent theoretical, research-based approaches tend to focus on the user of a translation and/or the text. They have also been classified as equivalence-based or functionalist (Lauscher 2000). These approaches arise out of a theoretical framework or stated assumptions about the nature of translation; however, they tend to cover only partial aspects of quality, and they are often difficult to apply in professional or teaching contexts.

1.2.1 Reader-response approaches

Reader-response approaches evaluate the quality of a translation by assessing whether readers of the translation respond to it as readers of the source would respond to the original (Nida 1964; Carroll 1966; Nida and Taber 1969). The reader-response approach must be credited with recognizing the role of the audience in translation, more specifically of translation effects on the reader as a measure of translation quality. This is particularly noteworthy in an era when the dominant notion of 'text' was that of a static object on a page.

Yet the reader-response method is also problematic because, in addition to the difficulties inherent to the process of measuring reader response, the response of a reader may not be equally important for all texts, especially for those that are not reader-oriented (e.g. legal texts). The implication is that reader response will not be equally informative for all types of translation. In addition, this method addresses only one aspect of a translated text (i.e. equivalence of effect on the reader), ignoring others, such as the purpose of the translation, which may justify or even require a slightly different response from the readers of the translation. One also wonders if it is in fact possible to determine whether two responses are equivalent, as even monolingual texts can trigger non-equivalent reactions from slightly different groups of readers. Since in most cases the readership of a translated text is different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.

1.2.2 Textual and pragmatic approaches

Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.

Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text is the most important factor in translation and that quality should be assessed with respect to it. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.

House's (1997, 2001) functional pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations sometimes are commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985) and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.

An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established as the guiding criterion for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain, at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).

Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).

1.3 The functional-componential approach (Colina 2008)

Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e. they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g. reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.


As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.

As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
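The requester-weighted, descriptor-based scoring just described can be sketched in a few lines of code. The component names follow the tool (TL, FTA, MEAN, TERM), but the weights and per-component ratings below are invented for illustration; they are not the published Scoring Worksheet values.

```python
# Illustrative sketch of componential, requester-weighted scoring.
# Weights and ratings are hypothetical, not the TQA tool's actual values.

def total_score(ratings, weights):
    """Combine per-component ratings (0.0-1.0, from the matched descriptor)
    with requester-assigned weights (points per component, summing to 100)."""
    return sum(ratings[c] * weights[c] for c in weights)

weights = {"TL": 30, "FTA": 25, "MEAN": 25, "TERM": 20}   # sum = 100 points
ratings = {"TL": 0.9, "FTA": 1.0, "MEAN": 0.8, "TERM": 0.75}

# 30*0.9 + 25*1.0 + 25*0.8 + 20*0.75 = 87 points out of 100
print(round(total_score(ratings, weights), 1))
```

A requester who cares most about terminology could simply shift points from one component to another; because each component is scored against its own descriptors, reweighting does not require re-rating the text.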

The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals; bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and Results

2.1 Methods

One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e. to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1 Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD) and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. Reliability texts comprised five Spanish texts (three good and two bad), and four Russian and four Chinese texts (two of good quality and two of bad quality for each language), making up a total of thirteen texts.


2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e. the Requester) and read by the rater before he/she started his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e. "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores. The 200-series texts are Spanish, the 400s are Chinese, and the 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language.

Table 1. Average score of each text and standard deviation

Text   # of raters   Average score   Standard deviation
Spanish
210    11            91.8             8.1
214    11            89.5            11.3
215    11            86.8            15.0
228    11            48.6            19.2
235    11            56.4            18.5
Avg                                  14.42
Chinese
410    10            88.0            10.3
413    10            63.0            21.0
415    10            96.0             5.7
418    10            76.0            21.2
Avg                                  14.55
Russian
312     9            59.4            16.1
314     9            82.8            15.6
315     9            75.6            22.1
316     9            67.8            29.0
Avg                                  20.7
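The per-text averages and standard deviations in Table 1 are plain descriptive statistics over the individual rater scores. A minimal sketch (the rater scores below are invented, not the study's raw data):

```python
import statistics

# Invented scores from eleven raters for one text (cf. the Spanish texts
# in Table 1, each rated by 11 raters).
scores = [95, 90, 88, 92, 94, 85, 96, 91, 89, 93, 97]

avg = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD; the paper does not state
                                # whether sample or population SD was used
print(round(avg, 1), round(sd, 1))
```

A tight cluster of scores like this one yields a small standard deviation; the large deviations reported for some texts (e.g. 29.0 for Russian text 316) indicate raters spread across a much wider range.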


[Figure 1a. Average score and standard deviation per text (bar chart not reproduced)]

[Figure 1b. Average standard deviations per language (bar chart not reproduced)]


There does not appear to be an obvious connection between standard deviations and components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e. ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than those of Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for four components, per text and per language

                  TL           FTA          MEAN         TERM
Text    Raters  Mean   SD    Mean   SD    Mean   SD    Mean   SD
Spanish
210     11      27.7   2.6   23.6   2.3   22.7   2.6   17.7   3.4
214     11      27.3   4.7   20.9   7.0   23.2   2.5   18.2   3.4
215     11      28.6   2.3   22.3   4.7   18.2   6.8   17.7   3.4
228     11      15.0   7.7   11.4   6.0   10.9   6.3   11.4   4.5
235     11      15.9   8.3   12.3   6.5   13.6   6.4   14.5   4.7
Avg SD                 5.12         5.3          4.92         3.88
Chinese
410     10      27.0   4.8   22.0   4.8   21.0   4.6   18.0   2.6
413     10      18.0   9.5   16.5   5.8   14.0   5.2   14.5   3.7
415     10      28.5   2.4   25.0   0.0   23.5   2.4   19.0   2.1
418     10      22.5   6.8   21.0   4.6   16.0   7.7   16.5   4.1
Avg SD                 5.875        3.8          4.975        3.125
Russian
312      9      18.3   7.1   15.0   6.1   13.3   6.6   12.8   4.4
314      9      25.6   6.3   21.7   5.0   19.4   3.9   16.1   4.2
315      9      23.3   9.4   18.3   7.9   17.8   4.4   16.1   4.2
316      9      20.0  10.3   16.7   7.9   17.2   7.1   13.9   6.5
Avg SD                 8.275        6.725        5.5          4.825
Avg SD (all lgs)       6.3          5.3          5.1          3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

[Figure 2. Average standard deviations per tool component and per language (bar chart not reproduced)]

Table 3. Reliability coefficients for benchmark ratings

          Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

          TL     FTA    MEAN   TERM
Spanish   .952   .929   .926   .848
Chinese   .844   .844   .864   .783
Russian   .367   .479   .492   .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that whatever the cause for the Russian coefficients, it was not related to the tool itself.
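The excerpt does not name the statistic behind the reliability coefficients in Tables 3–6. Purely as an illustration of how such a coefficient behaves, the sketch below computes Cronbach's alpha, one common consistency measure, treating raters as "items" and texts as cases; the rater scores are invented:

```python
import statistics

def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters as 'items' and texts as cases.

    ratings_by_rater: one list of scores per rater, aligned by text.
    Values near 1 indicate that raters order and space the texts alike.
    """
    k = len(ratings_by_rater)
    case_totals = [sum(case) for case in zip(*ratings_by_rater)]
    item_var_sum = sum(statistics.pvariance(r) for r in ratings_by_rater)
    total_var = statistics.pvariance(case_totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Invented scores: three raters, four texts, in broad agreement,
# loosely echoing the three-rater benchmark sessions.
raters = [
    [92, 63, 96, 76],
    [90, 60, 95, 78],
    [88, 66, 97, 74],
]
print(round(cronbach_alpha(raters), 3))
```

With raters who disagree on which texts are good, the bracketed term shrinks and the coefficient collapses toward zero, which is the pattern the Russian group shows in Tables 3–5.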

Question 5: Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

          Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

          Benchmark reliability   Reliability coefficient
          coefficient             (for Reliability Testing)
Spanish   .953                    .934
Chinese   .973                    .780
Russian   .128                    .118


Table 7a. Average scores and standard deviations for consultants and translators

        Score            Time
Text    Mean    SD       Mean    SD
210     93.3     7.5     75.8    59.4
214     93.3    12.1     94.2   101.4
215     85.0    17.9     36.3    18.3
228     46.7    20.7     37.5    22.3
235     46.7    18.6     49.5    38.9
410     91.4     7.5     46.0    22.1
413     62.9    21.0     40.7    13.7
415     96.4     4.8     26.1    15.4
418     69.3    22.1     52.4    22.2
312     52.5    15.1     26.7     2.6
314     88.3    10.3     22.5     4.2
315     74.2    26.3     28.7     7.8
316     63.3    32.7     25.8     6.6

Table 7b. Average scores and standard deviations for teachers

        Score            Time
Text    Mean    SD       Mean    SD
210     90.0     9.4     63.6    39.7
214     85.0     9.4     67.0    41.8
215     89.0    12.4     36.0    30.5
228     51.0    19.5     38.0    31.7
235     68.0    10.4     57.6    40.2
410     80.0    13.2     61.0    27.7
413     63.3    25.7     71.0    24.6
415     95.0     8.7     41.0    11.5
418     91.7     5.8     44.0     6.6
312     73.3     5.8     55.0    56.7
314     71.7    20.8     47.7    62.7
315     78.3    14.4     37.7    45.5
316     76.7    22.5     46.7    63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5); only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish appears when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly

[Figure 3. Mean scores for Spanish raters (bar chart not reproduced)]


[Figure 4. Time for Spanish raters (bar chart not reproduced)]

[Figure 5. Mean scores for Chinese raters (bar chart not reproduced)]


[Figure 6. Time for Chinese raters (bar chart not reproduced)]

[Figure 7. Mean scores for Russian raters (bar chart not reproduced)]


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
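The score–recommendation check is a standard Pearson correlation. The sketch below uses invented scores and recommendations (not the study's data) and also shows why a rater with a constant recommendation, like RS02LB, had to be excluded: the correlation is undefined when one variable has no variance.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation; returns None when either variable is constant,
    since the correlation is undefined without variability."""
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    if sx == 0 or sy == 0:
        return None
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

# Invented data: high scores should pair with low (i.e. favorable) action
# recommendations, giving a strong negative correlation.
scores = [92, 63, 96, 76]
recs = [1, 3, 1, 2]
print(pearson(scores, recs))          # strongly negative
print(pearson(scores, [2, 2, 2, 2]))  # constant recommendation -> None
```

A rater whose recommendations track their own scores produces a value near −1, as most Spanish and Chinese raters did; values near zero, as for RS01EM, mean the two judgments are internally inconsistent.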

[Figure 8. Time for Russian raters (bar chart not reproduced)]

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters
CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters
RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revisions in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christianne. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

[Translated from the French:] Colina (2008) proposes a componential and functionalist approach to translation quality evaluation and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two testing periods. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts:             Time Rating Ends:

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester:

Title/Department:               Delivery Date:

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish and/or use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits

Further evidence for a functionalist approach to translation quality evaluation 259


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1. TARGET LANGUAGE

Category Number    Description    Check one box

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number    Description    Check one box

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3. NON-SPECIALIZED CONTENT (MEANING)

Category Number    Description    Check one box

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, or/and additions that cannot be justified by the translation instructions. Translation shows some misunderstanding of the original and/or translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY

Category Number    Description    Check one box

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE


SCORING WORKSHEET

Component: Target Language
Category   Value   Score
1a         5
1b         15
1c         25
1d         30

Component: Functional and Textual Adequacy
Category   Value   Score
2a         5
2b         10
2c         20
2d         25

Component: Non-Specialized Content
Category   Value   Score
3a         5
3b         10
3c         20
3d         25

Component: Specialized Content and Terminology
Category   Value   Score
4a         5
4b         10
4c         15
4d         20

Tally Sheet

Component                               Category Rating    Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
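Read as a whole, the worksheet gives each component a maximum weight (30, 25, 25, 20), so the four checked descriptors sum to a total score out of 100. A minimal sketch of the tally (the dictionary keys are my own labels; the point values are those printed in the worksheet):

```python
# Point values from the scoring worksheet. The best descriptor in each
# component carries that component's full weight: 30 + 25 + 25 + 20 = 100.
VALUES = {
    "target_language":         {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "functional_textual":      {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "non_specialized":         {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "specialized_terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the value of the single descriptor checked for each component."""
    return sum(VALUES[component][category]
               for component, category in ratings.items())

# A text rated 1d / 2c / 3d / 4c tallies 30 + 20 + 25 + 15:
print(total_score({"target_language": "1d", "functional_textual": "2c",
                   "non_specialized": "3d",
                   "specialized_terminology": "4c"}))  # 90
```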


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



1.1. Experiential approaches

Many methods of translation quality assessment fall within this category. They tend to be ad hoc, anecdotal marking scales developed for the use of a particular professional organization or industry, e.g. the ATA certification exam, the SAE J2450 Translation Quality Metric for the automotive industry, and the LISA QA tool for localization.2 While the scales are often adequate for the particular purposes of the organization that created them, they suffer from limited transferability, precisely due to the absence of theoretical and/or research foundations that would permit their transfer to other environments. For the same reason, it is difficult to assess the replicability and inter-rater reliability of these approaches.

1.2. Theoretical approaches

Recent theoretical, research-based approaches tend to focus on the user of a translation and/or the text. They have also been classified as equivalence-based or functionalist (Lauscher 2000). These approaches arise out of a theoretical framework or stated assumptions about the nature of translation; however, they tend to cover only partial aspects of quality, and they are often difficult to apply in professional or teaching contexts.

1.2.1. Reader-response approaches

Reader-response approaches evaluate the quality of a translation by assessing whether readers of the translation respond to it as readers of the source would respond to the original (Nida 1964; Carroll 1966; Nida and Taber 1969). The reader-response approach must be credited with recognizing the role of the audience in translation, more specifically of translation effects on the reader as a measure of translation quality. This is particularly noteworthy in an era when the dominant notion of 'text' was that of a static object on a page.

Yet the reader-response method is also problematic because, in addition to the difficulties inherent to the process of measuring reader response, the response of a reader may not be equally important for all texts, especially for those that are not reader-oriented (e.g. legal texts). The implication is that reader response will not be equally informative for all types of translation. In addition, this method addresses only one aspect of a translated text (i.e. equivalence of effect on the reader), ignoring others, such as the purpose of the translation, which may justify or even require a slightly different response from the readers of the translation. One also wonders if it is in fact possible to determine whether two responses are equivalent, as even monolingual texts can trigger non-equivalent reactions from slightly different groups of readers. Since in most cases the readership of a translated text is different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.

1.2.2. Textual and pragmatic approaches

Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.

Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text is the most important factor in translation, and quality should be assessed with respect to it. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.

House's (1997, 2001) functional pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations sometimes are commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985), and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.

An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established as the guiding criterion for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain – at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).

Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).

1.3. The functional-componential approach (Colina 2008)

Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e. they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g. reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.


As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.

As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators, yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
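The cover sheet only asks the requester to rank the four components from 1 to 4; how such a ranking might translate into adjusted component weights is left open. One way it could be operationalized, sketched here purely as a hypothetical illustration (the multipliers and helper names are my own, not the tool's published formula), is to scale the default weights by rank and renormalize:

```python
# Hypothetical sketch of requester-defined weighting. The published cover
# sheet only asks the requester to RANK the four components from 1 to 4;
# this mapping from ranks to multipliers is invented for illustration.
DEFAULT_WEIGHTS = {"target_language": 30, "functional_textual": 25,
                   "non_specialized": 25, "specialized": 20}
RANK_MULTIPLIER = {1: 1.3, 2: 1.1, 3: 0.9, 4: 0.7}  # assumed values

def reweight(priorities):
    """Scale each component weight by its priority rank, then renormalize
    so that a perfect translation still scores 100."""
    scaled = {c: DEFAULT_WEIGHTS[c] * RANK_MULTIPLIER[r]
              for c, r in priorities.items()}
    norm = 100 / sum(scaled.values())
    return {c: round(w * norm, 1) for c, w in scaled.items()}

# A requester who ranks functional adequacy first and terminology last:
print(reweight({"functional_textual": 1, "target_language": 2,
                "non_specialized": 3, "specialized": 4}))
```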

The experiment in Colina (2008) set out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and Results

2.1. Methods

One of the most important limitations of the experiment in Colina (2008) concerns the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1. Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.

Further evidence for a functionalist approach to translation quality evaluation 243

As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be very good translations and the other half very poor ones, as judged on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (for each of these two languages, two of good quality and two of bad quality), a total of thirteen texts.


2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
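The scoring arithmetic just described can be sketched as follows. This is an illustrative reconstruction, not the official worksheet: the component weights are assumptions (chosen only so that the four maxima sum to 100, as the revised tool requires), and the sample scores are invented.

```python
# Sketch of the scoring worksheet arithmetic: four weighted components
# whose maxima sum to 100. The weights below are illustrative assumptions,
# not the values specified by Hablamos Juntos.

MAX_POINTS = {
    "TL": 30,    # Target Language (assumed weight)
    "FTA": 25,   # Functional and Textual Adequacy (assumed weight)
    "MEAN": 25,  # Non-Specialized Content / Meaning (assumed weight)
    "TERM": 20,  # Specialized Content and Terminology (assumed weight)
}
assert sum(MAX_POINTS.values()) == 100  # maximum total score is 100


def total_score(component_scores):
    """Sum the component scores after checking each stays within its maximum."""
    for name, score in component_scores.items():
        if not 0 <= score <= MAX_POINTS[name]:
            raise ValueError(f"{name} score out of range")
    return sum(component_scores.values())


# Invented component scores for one text:
print(round(total_score({"TL": 27.7, "FTA": 23.6, "MEAN": 22.7, "TERM": 17.7}), 1))
```

A rater selects one descriptor per component; the requester's weights then turn the four component scores into a single total on a 0–100 scale.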

2.1.4 Rater Training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores from that overall score.


200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
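The per-text figures of this kind can be reproduced with elementary statistics. The rater scores below are invented for illustration (the raw per-rater data are not reported in the article); only the mean and standard-deviation arithmetic reflects what Table 1 reports.

```python
from statistics import mean, stdev

# Invented scores from 11 raters for one text; Table 1 reports, per text,
# the mean of the individual rater scores and their standard deviation.
scores = [95, 88, 90, 97, 85, 92, 94, 89, 91, 96, 93]

avg = mean(scores)
sd = stdev(scores)  # sample standard deviation across raters
print(round(avg, 1), round(sd, 1))
```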

Question 1. For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text      # of raters   Average Score   Standard Deviation
Spanish
210       11            91.8             8.1
214       11            89.5            11.3
215       11            86.8            15.0
228       11            48.6            19.2
235       11            56.4            18.5
Avg                                     14.42
Chinese
410       10            88.0            10.3
413       10            63.0            21.0
415       10            96.0             5.7
418       10            76.0            21.2
Avg                                     14.55
Russian
312        9            59.4            16.1
314        9            82.8            15.6
315        9            75.6            22.1
316        9            67.8            29.0
Avg                                     20.7



Figure 1a. Average score and standard deviation per text


Figure 1b. Average standard deviations per language


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2. How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for four components per text and per language

                     TL            FTA           MEAN          TERM
Text     Raters   Mean    SD    Mean    SD    Mean    SD    Mean    SD
Spanish
210      11       27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214      11       27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215      11       28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228      11       15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235      11       15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg SD                   5.12          5.3           4.92          3.88
Chinese
410      10       27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413      10       18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415      10       28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418      10       22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg SD                   5.875         3.8           4.975         3.125
Russian
312       9       18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314       9       25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315       9       23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316       9       20.0  10.3    16.7   7.9    17.2   7.1    13.9   6.5
Avg SD                   8.275         6.725         5.5           4.825

Avg SD (all lgs)         6.3           5.3           5.1           3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3. How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.


Figure 2. Average standard deviations per tool component and per language

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4. How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
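The article does not name the reliability statistic computed, so the sketch below should be read as one plausible reconstruction: Cronbach's alpha over raters (treating raters as "items" and texts as "cases"), with invented scores. An intraclass correlation coefficient would be an equally plausible choice.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """ratings[r][t] = score rater r gave to text t; raters treated as items."""
    k = len(ratings)                                # number of raters
    totals = [sum(per_text) for per_text in zip(*ratings)]
    item_vars = sum(variance(r) for r in ratings)   # sum of per-rater variances
    return k / (k - 1) * (1 - item_vars / variance(totals))

# Three invented raters scoring the same four texts fairly consistently:
raters = [
    [92, 63, 96, 76],
    [88, 60, 94, 72],
    [90, 66, 98, 80],
]
print(round(cronbach_alpha(raters), 3))  # close to 1: high agreement
```

With raters who rank the texts the same way, the coefficient approaches 1; raters who disagree about which texts are good drive it toward 0, the pattern seen in the Russian data.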

Question 5. Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (for Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

              Score                Time
Text      Mean     SD          Mean     SD
210       93.3      7.5        75.8     59.4
214       93.3     12.1        94.2    101.4
215       85.0     17.9        36.3     18.3
228       46.7     20.7        37.5     22.3
235       46.7     18.6        49.5     38.9
410       91.4      7.5        46.0     22.1
413       62.9     21.0        40.7     13.7
415       96.4      4.8        26.1     15.4
418       69.3     22.1        52.4     22.2
312       52.5     15.1        26.7      2.6
314       88.3     10.3        22.5      4.2
315       74.2     26.3        28.7      7.8
316       63.3     32.7        25.8      6.6

Table 7b. Average scores and standard deviations for teachers

              Score                Time
Text      Mean     SD          Mean     SD
210       90.0      9.4        63.6     39.7
214       85.0      9.4        67.0     41.8
215       89.0     12.4        36.0     30.5
228       51.0     19.5        38.0     31.7
235       68.0     10.4        57.6     40.2
410       80.0     13.2        61.0     27.7
413       63.3     25.7        71.0     24.6
415       95.0      8.7        41.0     11.5
418       91.7      5.8        44.0      6.6
312       73.3      5.8        55.0     56.7
314       71.7     20.8        47.7     62.7
315       78.3     14.4        37.7     45.5
316       76.7     22.5        46.7     63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish emerged when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly


Figure 3. Mean scores for Spanish raters



Figure 4. Time for Spanish raters


Figure 5. Mean scores for Chinese raters



Figure 6. Time for Chinese raters


Figure 7. Mean scores for Russian raters


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate highly, as expected.
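The consistency check used here (a strong negative Pearson correlation between total scores and recommendations) can be sketched as follows. The scores and recommendations below are invented, and the 1-to-5 recommendation scale (1 = publish as is, 5 = translation not effective) is an assumption based on the options listed on the cover sheet.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented data for one rater: total scores for five texts and the
# recommendation issued for each (assumed scale: 1 = publish as is,
# 5 = translation not effective).
scores = [92, 89, 87, 49, 56]
recs = [1, 2, 2, 4, 4]

print(round(pearson(scores, recs), 3))  # strongly negative for a consistent rater
```

A rater whose recommendations track their scores produces r near −1; values near 0 (as for RS01EM) flag the kind of inconsistency discussed above.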


Figure 8. Time for Russian raters

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions,


and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential and functionalist approach to translation quality evaluation and reports on the results of a pilot test of a tool designed for that approach. The results show a high degree of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department: ________    Delivery Date: ________

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials).

Notes/Recommended Edits:



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.

1. TARGET LANGUAGE

(Check one box.)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY

(Check one box.)

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:



3. NON-SPECIALIZED CONTENT (MEANING)

(Check one box.)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY

(Check one box.)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ________


Further evidence for a functionalist approach to translation quality evaluation 261

SCORING WORKSHEET

Component: Target Language            Component: Functional and Textual Adequacy
Category  Value  Score                Category  Value  Score
1a        5                          2a        5
1b        15                         2b        10
1c        25                         2c        20
1d        30                         2d        25

Component: Non-Specialized Content    Component: Specialized Content and Terminology
Category  Value  Score                Category  Value  Score
3a        5                          4a        5
3b        10                         4b        10
3c        20                         4c        15
3d        25                         4d        20

Tally Sheet

Component                              Category Rating  Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
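The worksheet arithmetic can be sketched in code (illustrative only: the published tool is a paper form, and this script is not part of it; the component values are those listed above):

```python
# Sketch of the Scoring Worksheet arithmetic: each component's checked
# category (a-d) maps to a point value, and the values sum to at most 100.
VALUES = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """ratings maps component name -> checked category ('a'..'d')."""
    return sum(VALUES[comp][cat] for comp, cat in ratings.items())

# A translation rated 'd' on every component earns the maximum total of 100.
best = {comp: "d" for comp in VALUES}
print(total_score(best))  # 100
```

Note that the top category is worth more in Target Language (30) than in Specialized Content and Terminology (20), reflecting the requester-defined weighting of components described in Section 2.1.3.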


Appendix 2. Text sample

[The sample text is reproduced as an image in the published version and is not recoverable here.]


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.

1.2.2 Textual and pragmatic approaches
Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.

Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text are the most important factors in translation, and that quality should be assessed with respect to them. For Skopos Theory, it is the text type and function of the translation that are of paramount importance in determining the quality of the translation.

House's (1997, 2001) functional pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations are sometimes commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985) and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.

An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established as the guiding criterion for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain — at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).

Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).

1.3 The functional-componential approach (Colina 2008)

Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e., they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g., reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality, and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.


As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.

As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).

The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment, designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and Results

2.1 Methods

One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA-certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience, but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD) and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA) and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good, and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts and four Chinese texts (for each of these two languages, two of good quality and two of bad quality), making up a total of thirteen additional texts.


2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I, as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors, so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned, so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores around that overall score.


200-series texts are Spanish, 400-series texts are Chinese, and 300-series texts are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text   # of raters   Average Score   Standard Deviation
Spanish
210    11            91.8            8.1
214    11            89.5            11.3
215    11            86.8            15.0
228    11            48.6            19.2
235    11            56.4            18.5
Avg                                  14.42
Chinese
410    10            88.0            10.3
413    10            63.0            21.0
415    10            96.0            5.7
418    10            76.0            21.2
Avg                                  14.55
Russian
312    9             59.4            16.1
314    9             82.8            15.6
315    9             75.6            22.1
316    9             67.8            29.0
Avg                                  20.7
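The per-text figures in Table 1 are simply the mean and standard deviation of the individual rater scores for that text. A minimal sketch (the rater scores below are invented for illustration, and a population standard deviation is assumed, since the article does not say which variant was used):

```python
# Sketch: computing a per-text average score and standard deviation from
# individual rater scores, as in Table 1. Scores are hypothetical.
from statistics import mean, pstdev

rater_scores = {
    "210": [95, 90, 100, 85, 92, 88, 95, 90, 94, 91, 90],  # 11 raters
}

for text, scores in rater_scores.items():
    print(text, round(mean(scores), 1), round(pstdev(scores), 1))
```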


Figure 1a. Average score and standard deviation per text

Figure 1b. Average standard deviations per language


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

                  TL           FTA          MEAN         TERM
Text   Raters   Mean   SD    Mean   SD    Mean   SD    Mean   SD
Spanish
210    11       27.7   2.6   23.6   2.3   22.7   2.6   17.7   3.4
214    11       27.3   4.7   20.9   7.0   23.2   2.5   18.2   3.4
215    11       28.6   2.3   22.3   4.7   18.2   6.8   17.7   3.4
228    11       15.0   7.7   11.4   6.0   10.9   6.3   11.4   4.5
235    11       15.9   8.3   12.3   6.5   13.6   6.4   14.5   4.7
Avg SD                 5.12         5.3          4.92         3.88
Chinese
410    10       27.0   4.8   22.0   4.8   21.0   4.6   18.0   2.6
413    10       18.0   9.5   16.5   5.8   14.0   5.2   14.5   3.7
415    10       28.5   2.4   25.0   0.0   23.5   2.4   19.0   2.1
418    10       22.5   6.8   21.0   4.6   16.0   7.7   16.5   4.1
Avg SD                 5.875        3.8          4.975        3.125
Russian
312    9        18.3   7.1   15.0   6.1   13.3   6.6   12.8   4.4
314    9        25.6   6.3   21.7   5.0   19.4   3.9   16.1   4.2
315    9        23.3   9.4   18.3   7.9   17.8   4.4   16.1   4.2
316    9        20.0   10.3  16.7   7.9   17.2   7.1   13.9   6.5
Avg SD                 8.275        6.725        5.5          4.825
Avg SD (all lgs)       6.3          5.3          5.1          3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, that are responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, their inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component and per language

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
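The article does not state which reliability statistic underlies Tables 3–6. One common choice when several raters score the same set of texts on an interval scale is Cronbach's alpha (closely related to an intraclass correlation), sketched below with invented scores; this is only an illustration of the kind of coefficient reported, not the study's actual computation:

```python
# Sketch: Cronbach's alpha as an inter-rater reliability coefficient.
# Treats raters as "items" and texts as "cases". Data are hypothetical.
def cronbach_alpha(scores):
    """scores[r][t] = score given by rater r to text t."""
    k = len(scores)                 # number of raters
    n = len(scores[0])              # number of texts

    def var(xs):                    # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(r[t] for r in scores) for t in range(n)]
    return (k / (k - 1)) * (1 - sum(var(r) for r in scores) / var(totals))

# Three hypothetical raters in close agreement yield a coefficient near 1,
# comparable to the .9+ values reported for the Spanish group.
agree = [[92, 89, 49, 56],
         [90, 87, 52, 58],
         [94, 90, 47, 55]]
print(round(cronbach_alpha(agree), 3))
```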

Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

        Score            Time
Text    Mean    SD       Mean    SD
210     93.3    7.5      75.8    59.4
214     93.3    12.1     94.2    101.4
215     85.0    17.9     36.3    18.3
228     46.7    20.7     37.5    22.3
235     46.7    18.6     49.5    38.9
410     91.4    7.5      46.0    22.1
413     62.9    21.0     40.7    13.7
415     96.4    4.8      26.1    15.4
418     69.3    22.1     52.4    22.2
312     52.5    15.1     26.7    2.6
314     88.3    10.3     22.5    4.2
315     74.2    26.3     28.7    7.8
316     63.3    32.7     25.8    6.6

Table 7b. Average scores and standard deviations for teachers

        Score            Time
Text    Mean    SD       Mean    SD
210     90.0    9.4      63.6    39.7
214     85.0    9.4      67.0    41.8
215     89.0    12.4     36.0    30.5
228     51.0    19.5     38.0    31.7
235     68.0    10.4     57.6    40.2
410     80.0    13.2     61.0    27.7
413     63.3    25.7     71.0    24.6
415     95.0    8.7      41.0    11.5
418     91.7    5.8      44.0    6.6
312     73.3    5.8      55.0    56.7
314     71.7    20.8     47.7    62.7
315     78.3    14.4     37.7    45.5
316     76.7    22.5     46.7    63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish holds when comparing Russian translators and teachers: Russian teachers rate similarly to, or slightly higher than, translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)

252 Sonia Colina

Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)

Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)


Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)

Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.

Figure 8. Time for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters

CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters

RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process, and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be better suited to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed for that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool achieves good inter-rater reliability for all language and text groups with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts __________    Time Rating Ends __________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I. To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department    Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II. To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits

Further evidence for a functionalist approach to translation quality evaluation 259


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1 TARGET LANGUAGE

Category    Description    (Check one box)

1a  The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b  The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c  Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d  The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category    Description    (Check one box)

2a  Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b  The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c  The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d  The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments



3 NON-SPECIALIZED CONTENT-MEANING

Category    Description    (Check one box)

3a  The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b  There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c  Minor alterations in meaning, additions, or omissions.

3d  The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category    Description    (Check one box)

4a  Reveals unawareness/ignorance of specialized terminology and/or insufficient knowledge of specialized content.

4b  Serious/frequent mistakes involving terminology and/or specialized content.

4c  A few terminological errors, but the specialized content is not seriously affected.

4d  Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments

TOTAL SCORE




SCORING WORKSHEET

Component: Target Language. Category values: 1a = 5, 1b = 15, 1c = 25, 1d = 30. Score: ____
Component: Functional and Textual Adequacy. Category values: 2a = 5, 2b = 10, 2c = 20, 2d = 25. Score: ____
Component: Non-Specialized Content. Category values: 3a = 5, 3b = 10, 3c = 20, 3d = 25. Score: ____
Component: Specialized Content and Terminology. Category values: 4a = 5, 4b = 10, 4c = 15, 4d = 20. Score: ____

Tally Sheet

Component    Category Rating    Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
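The worksheet arithmetic above can be expressed in a few lines. The category values below are copied from the Scoring Worksheet (note the maximum total is 100); the dictionary layout and the function itself are an illustrative sketch, not part of the published tool:

```python
# Category values from the Scoring Worksheet (maximum total score = 100).
VALUES = {
    "Target Language": {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy": {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the values of the categories a rater checked, one per component,
    e.g. {"Target Language": "1d", ...}."""
    return sum(VALUES[component][category] for component, category in ratings.items())
```

For example, a text rated 1d, 2c, 3d, and 4c totals 30 + 20 + 25 + 15 = 90.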


Appendix 2. Text sample

(Sample texts reproduced as images in the original publication.)




Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



as the guiding criteria for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain – at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).

Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).

1.3 The functional-componential approach (Colina 2008)

Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e., they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g., reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality, and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.


As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.

As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators, yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
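Customer-defined weighting of this kind can be sketched as follows. The component names, the normalized 0–1 score scale, and the function are all illustrative assumptions; the published tool instead uses fixed category values on a 100-point scale:

```python
def weighted_quality(component_scores, weights):
    """Combine per-component quality scores (each normalized to 0-1)
    using requester-defined weights (fractions summing to 1)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(component_scores[c] * w for c, w in weights.items())

# A hypothetical requester who prioritizes functional adequacy over terminology:
weights = {"language": 0.20, "functional": 0.40, "meaning": 0.25, "terminology": 0.15}
scores = {"language": 0.8, "functional": 0.9, "meaning": 0.7, "terminology": 0.6}
```

The same component scores yield a different overall quality figure under a different requester's weights, which is the flexibility the paragraph above describes.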

The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and results

2.1 Methods

One of the most important limitations of the experiment in Colina (2008) concerns the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
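One way to operationalize these consistency questions is with a standard reliability index over a raters-by-texts score matrix. The sketch below uses Cronbach's alpha; the paper does not specify which statistic was computed, so this particular choice is an assumption:

```python
from statistics import variance

def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha for a raters x texts score matrix: each inner
    list holds one rater's scores for the same set of texts. Values
    near 1 indicate high inter-rater consistency."""
    k = len(ratings_by_rater)
    # Total score each text received across all raters.
    totals = [sum(text_scores) for text_scores in zip(*ratings_by_rater)]
    # Sum of each individual rater's score variance.
    rater_vars = sum(variance(r) for r in ratings_by_rater)
    return k / (k - 1) * (1 - rater_vars / variance(totals))
```

Three raters who rank the texts identically (even with constant offsets) give an alpha of 1.0, while raters who disagree about the ordering of the texts pull alpha toward or below zero.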

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1 Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment of these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience, but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country, at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of each language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two good and two bad for each of these languages), making up a total of thirteen texts.


2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.


200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text      # of raters   Average score   Standard deviation

Spanish
210       11            91.8             8.1
214       11            89.5            11.3
215       11            86.8            15.0
228       11            48.6            19.2
235       11            56.4            18.5
Avg                                     14.42

Chinese
410       10            88.0            10.3
413       10            63.0            21.0
415       10            96.0             5.7
418       10            76.0            21.2
Avg                                     14.55

Russian
312        9            59.4            16.1
314        9            82.8            15.6
315        9            75.6            22.1
316        9            67.8            29.0
Avg                                     20.7
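The per-text figures in Table 1 are plain descriptive statistics over the individual rater scores. A minimal sketch of the computation, using hypothetical rater scores (the article does not publish the raw ratings, nor whether the population or sample form of the standard deviation was used; the population form is shown here):

```python
from statistics import mean, pstdev

# Hypothetical scores from four raters for one text (illustration only;
# the study's raw rating data are not reproduced in the article)
scores = {"rater1": 95, "rater2": 90, "rater3": 88, "rater4": 94}

def text_summary(rater_scores):
    """Average score and standard deviation across raters, as in Table 1."""
    vals = list(rater_scores.values())
    return mean(vals), pstdev(vals)

avg, sd = text_summary(scores)
print(f"average = {avg:.1f}, SD = {sd:.2f}")
```

A small per-text SD (as for Spanish text 210 or Chinese text 415) then reads directly as high rater agreement on that text.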


Figure 1a. Average score and standard deviation per text

Figure 1b. Average standard deviations per language


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

                 TL           FTA          MEAN         TERM
Text   Raters   Mean   SD    Mean   SD    Mean   SD    Mean   SD

Spanish
210    11       27.7   2.6   23.6   2.3   22.7   2.6   17.7   3.4
214    11       27.3   4.7   20.9   7.0   23.2   2.5   18.2   3.4
215    11       28.6   2.3   22.3   4.7   18.2   6.8   17.7   3.4
228    11       15.0   7.7   11.4   6.0   10.9   6.3   11.4   4.5
235    11       15.9   8.3   12.3   6.5   13.6   6.4   14.5   4.7
Avg SD                 5.12         5.3          4.92         3.88

Chinese
410    10       27.0   4.8   22.0   4.8   21.0   4.6   18.0   2.6
413    10       18.0   9.5   16.5   5.8   14.0   5.2   14.5   3.7
415    10       28.5   2.4   25.0   0.0   23.5   2.4   19.0   2.1
418    10       22.5   6.8   21.0   4.6   16.0   7.7   16.5   4.1
Avg SD                 5.875        3.8          4.975        3.125

Russian
312     9       18.3   7.1   15.0   6.1   13.3   6.6   12.8   4.4
314     9       25.6   6.3   21.7   5.0   19.4   3.9   16.1   4.2
315     9       23.3   9.4   18.3   7.9   17.8   4.4   16.1   4.2
316     9       20.0  10.3   16.7   7.9   17.2   7.1   13.9   6.5
Avg SD                 8.275        6.725        5.5          4.825

Avg SD (all languages) 6.3          5.3          5.1          3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component and per language

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool; reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
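The article does not state which inter-rater reliability statistic was computed. One standard choice for a fixed panel of raters scoring the same set of texts is Cronbach's alpha, treating raters as items; a minimal sketch under that assumption, with a hypothetical rater-by-text score matrix:

```python
from statistics import pvariance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a rater-by-text score matrix.

    `ratings` is a list of per-rater score lists, one score per text.
    Values close to 1.0 indicate strong agreement among raters.
    """
    k = len(ratings)                              # number of raters
    rater_vars = sum(pvariance(r) for r in ratings)
    totals = [sum(col) for col in zip(*ratings)]  # per-text total over raters
    return k / (k - 1) * (1 - rater_vars / pvariance(totals))

# Three hypothetical raters scoring four texts on the 0-100 scale
raters = [
    [92, 63, 96, 76],
    [90, 60, 95, 78],
    [88, 65, 97, 74],
]
print(round(cronbach_alpha(raters), 3))
```

With closely agreeing raters like these, the coefficient lands near 1.0, in the range reported for the Spanish and Chinese groups; widely scattered scores would push it toward the low values seen for Russian.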

Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (for Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

         Score           Time
Text     Mean    SD      Mean    SD
210      93.3    7.5     75.8    59.4
214      93.3   12.1     94.2   101.4
215      85.0   17.9     36.3    18.3
228      46.7   20.7     37.5    22.3
235      46.7   18.6     49.5    38.9
410      91.4    7.5     46.0    22.1
413      62.9   21.0     40.7    13.7
415      96.4    4.8     26.1    15.4
418      69.3   22.1     52.4    22.2
312      52.5   15.1     26.7     2.6
314      88.3   10.3     22.5     4.2
315      74.2   26.3     28.7     7.8
316      63.3   32.7     25.8     6.6

Table 7b. Average scores and standard deviations for teachers

         Score           Time
Text     Mean    SD      Mean    SD
210      90.0    9.4     63.6    39.7
214      85.0    9.4     67.0    41.8
215      89.0   12.4     36.0    30.5
228      51.0   19.5     38.0    31.7
235      68.0   10.4     57.6    40.2
410      80.0   13.2     61.0    27.7
413      63.3   25.7     71.0    24.6
415      95.0    8.7     41.0    11.5
418      91.7    5.8     44.0     6.6
312      73.3    5.8     55.0    56.7
314      71.7   20.8     47.7    62.7
315      78.3   14.4     37.7    45.5
316      76.7   22.5     46.7    63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5); only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters


Figure 4. Time for Spanish raters

Figure 5. Mean scores for Chinese raters


Figure 6. Time for Chinese raters

Figure 7. Mean scores for Russian raters


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
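The statistic in Table 8 is an ordinary (Pearson) correlation between each rater's total scores and recommendation codes; it is undefined when one of the two series never varies, which is why RS02LB had to be excluded. A sketch with hypothetical data:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns None when either series has no variance
    (the situation that excluded rater RS02LB from the analysis)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

# Hypothetical rater: four total scores and the matching recommendations
# (1 = publish as is ... 4 = redo translation)
scores = [92, 63, 96, 76]
recs = [1, 4, 1, 3]
print(pearson(scores, recs))          # strongly negative, as expected
print(pearson(scores, [2, 2, 2, 2]))  # None: uniform recommendations
```

A rater whose recommendations track their scores produces a value near −1; values near 0, as for RS01EM, signal that the two judgments are disconnected.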

Figure 8. Time for Russian raters

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry settings, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions,


and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed according to that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields a good level of inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the scores obtained by the Russian raters is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts:            Time Rating Ends:

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text

Requester

Title/Department                Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.

1 TARGET LANGUAGE

Category Number    Description    Check one box

1a    The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b    The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c    Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d    The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number    Description    Check one box

2a    Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b    The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c    The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d    The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments



3 NON-SPECIALIZED CONTENT-MEANING

Category Number    Description    Check one box

3a    The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b    There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c    Minor alterations in meaning, additions, or omissions.

3d    The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category Number    Description    Check one box

4a    Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b    Serious/frequent mistakes involving terminology and/or specialized content.

4c    A few terminological errors, but the specialized content is not seriously affected.

4d    Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments

TOTAL SCORE




SCORING WORKSHEET

Component: Target Language               Component: Functional and Textual Adequacy
Category   Value   Score                 Category   Value   Score
1a         5                             2a         5
1b         15                            2b         10
1c         25                            2c         20
1d         30                            2d         25

Component: Non-Specialized Content       Component: Specialized Content and Terminology
Category   Value   Score                 Category   Value   Score
3a         5                             4a         5
3b         10                            4b         10
3c         20                            4c         15
3d         25                            4d         20

Tally Sheet

Component                               Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
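Given the worksheet values above, a text's total score is simply the sum of the value attached to the one descriptor checked for each component, with a maximum of 30 + 25 + 25 + 20 = 100. A minimal sketch of the tally (the dictionary layout is an illustration, not part of the tool):

```python
# Point values from the Scoring Worksheet (descriptor -> value)
VALUES = {
    "TL":   {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "FTA":  {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "MEAN": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "TERM": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(checked):
    """Sum the values of the descriptors the rater checked, one per component."""
    return sum(VALUES[comp][cat] for comp, cat in checked.items())

# A rater who checks the top descriptor in every category scores the maximum
best = {"TL": "1d", "FTA": "2d", "MEAN": "3d", "TERM": "4d"}
print(total_score(best))  # 100
```

The unequal maxima per component (30, 25, 25, 20) are what implement the requester-defined weighting described in Section 2.1.3.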


Appendix 2. Text sample

[The sample texts are reproduced as images in the original and are not rendered here.]


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu

Page 6: Further evidence for a functionalist approach to translation quality evaluation

240 Sonia Colina

As a response to the inadequacies identified above Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (func-tionalist and textual models of translation) that can be applied in professional and educational contexts In order to show the applicability of the model in practical settings as well as to develop testable hypotheses and research questions Co-lina and her collaborators designed a componential functionalist textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf Colina 2008 for more on the first version of this tool) The tool evaluates components of quality separately consequently reflecting a componential approach to quality it is also considered functionalist and textual given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text

As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet, without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
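The requester-weighted notion of quality can be sketched in a few lines of code. This is a hypothetical illustration: the component names mirror the TQA tool, but the weights and ratings below are invented for the example, not taken from the study.

```python
# Hypothetical sketch of requester-defined weighting over the four tool
# components; weights and ratings are invented for illustration.
weights = {"Target Language": 0.30,
           "Functional and Textual Adequacy": 0.25,
           "Non-Specialized Content": 0.25,
           "Specialized Content and Terminology": 0.20}

ratings = {"Target Language": 0.9,      # descriptor matched, rescaled to 0-1
           "Functional and Textual Adequacy": 0.8,
           "Non-Specialized Content": 1.0,
           "Specialized Content and Terminology": 0.7}

# Weighted total on a 0-100 scale: the requester raises or lowers a weight
# to make a component matter more or less for his/her purposes.
score = 100 * sum(weights[c] * ratings[c] for c in weights)
print(round(score, 1))  # 86.0
```

Raising the weight of, say, Specialized Content and Terminology at the expense of Target Language would re-rank the same set of translations, which is exactly the flexibility the cover sheet's Quality Criteria section is meant to capture.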

The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals; bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use


the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and Results

2.1 Methods

One of the most important limitations of the experiment in Colina (2008) concerns the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA-certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years. The Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good, and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two per language of good quality and two of bad quality), making up a total of thirteen additional texts.


2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors, so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned, so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores from the overall score.


200-series texts are Spanish texts, 400-series are Chinese, and 300-series are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text     # of raters   Average Score   Standard Deviation

Spanish
210      11            91.8             8.1
214      11            89.5            11.3
215      11            86.8            15.0
228      11            48.6            19.2
235      11            56.4            18.5
Avg                                    14.42

Chinese
410      10            88.0            10.3
413      10            63.0            21.0
415      10            96.0             5.7
418      10            76.0            21.2
Avg                                    14.55

Russian
312       9            59.4            16.1
314       9            82.8            15.6
315       9            75.6            22.1
316       9            67.8            29.0
Avg                                    20.7
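The per-language averages in Table 1 are simply the means of the per-text standard deviations; a quick sketch reproduces them from the values reported in the table:

```python
# Reproduce Table 1's average standard deviation per language from the
# per-text standard deviations reported in the table.
sd = {"Spanish": [8.1, 11.3, 15.0, 19.2, 18.5],
      "Chinese": [10.3, 21.0, 5.7, 21.2],
      "Russian": [16.1, 15.6, 22.1, 29.0]}
avg = {lang: round(sum(v) / len(v), 2) for lang, v in sd.items()}
print(avg)  # {'Spanish': 14.42, 'Chinese': 14.55, 'Russian': 20.7}
```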


Figure 1a. Average score and standard deviation per text. [Bar chart not reproduced: average score and standard deviation (0–100) for each text number, 210–316.]

Figure 1b. Average standard deviations per language. [Bar chart not reproduced: average standard deviation (0–25) for Spanish, Chinese, and Russian.]


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
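The dispersion figures above are plain means and standard deviations over rater scores. A minimal sketch with invented scores (the study's raw data are not reproduced in the article; note too that the article does not state whether the population or the sample formula was used — `statistics.stdev` below uses n − 1):

```python
import statistics

# Invented scores from five raters for one text (illustration only).
ratings = [92, 95, 88, 90, 94]
mean = statistics.mean(ratings)     # average score for the text
spread = statistics.stdev(ratings)  # sample standard deviation (n - 1)
print(round(mean, 1), round(spread, 1))  # 91.8 2.9
```

A small `spread` means the raters converged on a similar score; the large values for the Russian texts in Table 1 are what the text describes as low rater agreement.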

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for four components per text and per language

                    TL             FTA            MEAN           TERM
Text    Raters   Mean    SD     Mean    SD     Mean    SD     Mean    SD

Spanish
210     11       27.7    2.6    23.6    2.3    22.7    2.6    17.7    3.4
214     11       27.3    4.7    20.9    7.0    23.2    2.5    18.2    3.4
215     11       28.6    2.3    22.3    4.7    18.2    6.8    17.7    3.4
228     11       15.0    7.7    11.4    6.0    10.9    6.3    11.4    4.5
235     11       15.9    8.3    12.3    6.5    13.6    6.4    14.5    4.7
Avg SD                   5.12           5.3            4.92           3.88

Chinese
410     10       27.0    4.8    22.0    4.8    21.0    4.6    18.0    2.6
413     10       18.0    9.5    16.5    5.8    14.0    5.2    14.5    3.7
415     10       28.5    2.4    25.0    0.0    23.5    2.4    19.0    2.1
418     10       22.5    6.8    21.0    4.6    16.0    7.7    16.5    4.1
Avg SD                   5.875          3.8            4.975          3.125

Russian
312      9       18.3    7.1    15.0    6.1    13.3    6.6    12.8    4.4
314      9       25.6    6.3    21.7    5.0    19.4    3.9    16.1    4.2
315      9       23.3    9.4    18.3    7.9    17.8    4.4    16.1    4.2
316      9       20.0   10.3    16.7    7.9    17.2    7.1    13.9    6.5
Avg SD                   8.275          6.725          5.5            4.825

Avg SD (all lgs)         6.3            5.3            5.1            3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient, and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component and per language. [Bar chart not reproduced: average SD (0–9) for TL, FTA, MEAN, and TERM; series: Spanish, Chinese, Russian, all languages.]

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good; but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
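The article reports inter-rater reliability coefficients without naming the statistic used. One common choice for this design — several raters scoring the same set of texts — is Cronbach's alpha with raters treated as "items"; the sketch below is purely illustrative of how such a coefficient behaves, not a reconstruction of the study's computation:

```python
# Cronbach's alpha with raters as "items" and texts as cases -- one common
# inter-rater reliability coefficient. Illustration only; the article does
# not state which statistic was actually used.
def cronbach_alpha(scores):
    """scores: one row per text, one column (score) per rater."""
    k = len(scores[0])                          # number of raters
    def var(xs):                                # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    rater_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)

# Three hypothetical raters who largely agree across four texts:
scores = [[90, 92, 88], [60, 55, 62], [75, 70, 78], [95, 97, 93]]
print(round(cronbach_alpha(scores), 3))  # 0.986
```

Raters who move up and down together across texts, as in this toy example, push the coefficient toward 1; raters who disagree about which texts are good push it toward 0, which is the pattern the Russian coefficients suggest.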

Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

         Score             Time
Text     Mean     SD       Mean     SD
210      93.3      7.5     75.8     59.4
214      93.3     12.1     94.2    101.4
215      85.0     17.9     36.3     18.3
228      46.7     20.7     37.5     22.3
235      46.7     18.6     49.5     38.9
410      91.4      7.5     46.0     22.1
413      62.9     21.0     40.7     13.7
415      96.4      4.8     26.1     15.4
418      69.3     22.1     52.4     22.2
312      52.5     15.1     26.7      2.6
314      88.3     10.3     22.5      4.2
315      74.2     26.3     28.7      7.8
316      63.3     32.7     25.8      6.6

Table 7b. Average scores and standard deviations for teachers

         Score             Time
Text     Mean     SD       Mean     SD
210      90.0      9.4     63.6     39.7
214      85.0      9.4     67.0     41.8
215      89.0     12.4     36.0     30.5
228      51.0     19.5     38.0     31.7
235      68.0     10.4     57.6     40.2
410      80.0     13.2     61.0     27.7
413      63.3     25.7     71.0     24.6
415      95.0      8.7     41.0     11.5
418      91.7      5.8     44.0      6.6
312      73.3      5.8     55.0     56.7
314      71.7     20.8     47.7     62.7
315      78.3     14.4     37.7     45.5
316      76.7     22.5     46.7     63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among the Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters. [Bar chart not reproduced: texts 210–235; series: Translators, Teachers.]


Figure 4. Time for Spanish raters. [Bar chart not reproduced: texts 210–235; series: Translators, Teachers.]

Figure 5. Mean scores for Chinese raters. [Bar chart not reproduced: texts 410–418; series: Translators, Teachers.]


Figure 6. Time for Chinese raters. [Bar chart not reproduced: texts 410–418; series: Translators, Teachers.]

Figure 7. Mean scores for Russian raters. [Bar chart not reproduced: texts 312–316; series: Translators, Teachers.]


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as the scores and overall recommendations do not correlate as highly as expected.
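The score–recommendation check is a plain correlation; a sketch with invented numbers (Pearson's r is assumed here — the article does not name the coefficient) shows both the expected strong negative correlation and why a rater with no variability in recommendations, like RS02LB, must be excluded:

```python
from math import sqrt

# Pearson correlation between total scores and recommendations (coded
# 1 = publish as is ... 4 = redo). Numbers are invented for illustration.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # undefined when one variable never varies (cf. RS02LB)
    return cov / sqrt(vx * vy)

print(round(pearson([95, 88, 60, 45], [1, 2, 3, 4]), 3))  # -0.977
print(pearson([80, 60, 90], [2, 2, 2]))                   # None
```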

Figure 8. Time for Russian raters. [Bar chart not reproduced: texts 312–316; series: Translators, Teachers.]

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

[Translated from the French] Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high rate of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool achieves a good rate of inter-rater reliability for all language groups and texts except Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors

258 Sonia Colina

Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts:        Time Rating Ends:

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department        Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish and/or use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target language materials).

Notes/Recommended Edits



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.

1 TARGET LANGUAGE

Category Number        Description        Check one box

1a    The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b    The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c    Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d    The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number        Description        Check one box

2a    Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b    The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c    The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d    The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments



3 NON-SPECIALIZED CONTENT-MEANING

Category Number        Description        Check one box

3a    The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b    There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. Translation shows some misunderstanding of the original and/or translation instructions.

3c    Minor alterations in meaning, additions, or omissions.

3d    The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category Number        Description        Check one box

4a    Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b    Serious/frequent mistakes involving terminology and/or specialized content.

4c    A few terminological errors, but the specialized content is not seriously affected.

4d    Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments

TOTAL SCORE




SCORING WORKSHEET

Component: Target Language
Category    Value    Score
1a          5
1b          15
1c          25
1d          30

Component: Functional and Textual Adequacy
Category    Value    Score
2a          5
2b          10
2c          20
2d          25

Component: Non-Specialized Content
Category    Value    Score
3a          5
3b          10
3c          20
3d          25

Component: Specialized Content and Terminology
Category    Value    Score
4a          5
4b          10
4c          15
4d          20

Tally Sheet

Component        Category Rating        Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
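The weighting scheme in the Scoring Worksheet above can be expressed as a short script. The following sketch (the point values are taken from the worksheet; the function name and the sample rating are illustrative only) computes a total score from the descriptor checked for each component; the four maximum values (30 + 25 + 25 + 20) sum to the 100-point ceiling mentioned in Section 2.1.3:

```python
# Point values from the Scoring Worksheet: each component's four
# descriptors (a-d) map onto increasing point values.
VALUES = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """ratings: dict mapping component name -> chosen descriptor letter."""
    return sum(VALUES[comp][letter] for comp, letter in ratings.items())

# A hypothetical rating: strong language and function, weaker terminology.
print(total_score({"Target Language": "d",
                   "Functional and Textual Adequacy": "d",
                   "Non-Specialized Content": "c",
                   "Specialized Content and Terminology": "b"}))  # 30+25+20+10 = 85
```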


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu


the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).

2. Second phase of TQA testing: Methods and Results

2.1 Methods

One of the most important limitations of the experiment in Colina (2008) concerns the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e. to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)

Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.


2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All available raters participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience, but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two good and two bad for each of these languages), making up a total of thirteen additional texts.


2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e. the Requester) and read by the rater before he/she started his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I, as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e. "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same as that offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.


200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text    # of raters    Average Score    Standard Deviation

Spanish
210     11             91.8             8.1
214     11             89.5             11.3
215     11             86.8             15.0
228     11             48.6             19.2
235     11             56.4             18.5
Avg SD                                  14.42

Chinese
410     10             88.0             10.3
413     10             63.0             21.0
415     10             96.0             5.7
418     10             76.0             21.2
Avg SD                                  14.55

Russian
312     9              59.4             16.1
314     9              82.8             15.6
315     9              75.6             22.1
316     9              67.8             29.0
Avg SD                                  20.7


Figure 1a. Average score and standard deviation per text

Figure 1b. Average standard deviations per language


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e. ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
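As an illustration of the computation behind the per-text figures in Table 1, the following sketch computes the average score and sample standard deviation for a set of rater scores. The scores here are invented for the example (the study's raw rater data are not reproduced in the article):

```python
import statistics

# Hypothetical per-rater total scores (0-100 scale) for two texts;
# the real study pooled 9-11 raters per text.
scores_by_text = {
    "210": [95, 90, 88, 92, 94, 91, 96, 89, 93, 90, 92],  # high agreement
    "228": [30, 55, 48, 60, 42, 70, 38, 52, 45, 58, 36],  # low agreement
}

for text, scores in scores_by_text.items():
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    print(f"Text {text}: mean={mean:.1f}, SD={sd:.1f}")
```

A tight cluster of scores (the first list) yields a small standard deviation, i.e. high rater agreement; the scattered scores of the second list yield a large one, the pattern the article reads off Figures 1a and 1b.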

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

                   TL             FTA            MEAN           TERM
Text    Raters     Mean   SD      Mean   SD      Mean   SD      Mean   SD

Spanish
210     11         27.7   2.6     23.6   2.3     22.7   2.6     17.7   3.4
214     11         27.3   4.7     20.9   7.0     23.2   2.5     18.2   3.4
215     11         28.6   2.3     22.3   4.7     18.2   6.8     17.7   3.4
228     11         15.0   7.7     11.4   6.0     10.9   6.3     11.4   4.5
235     11         15.9   8.3     12.3   6.5     13.6   6.4     14.5   4.7
Avg SD                    5.12           5.3            4.92           3.88

Chinese
410     10         27.0   4.8     22.0   4.8     21.0   4.6     18.0   2.6
413     10         18.0   9.5     16.5   5.8     14.0   5.2     14.5   3.7
415     10         28.5   2.4     25.0   0.0     23.5   2.4     19.0   2.1
418     10         22.5   6.8     21.0   4.6     16.0   7.7     16.5   4.1
Avg SD                    5.875          3.8            4.975          3.125

Russian
312     9          18.3   7.1     15.0   6.1     13.3   6.6     12.8   4.4
314     9          25.6   6.3     21.7   5.0     19.4   3.9     16.1   4.2
315     9          23.3   9.4     18.3   7.9     17.8   4.4     16.1   4.2
316     9          20.0   10.3    16.7   7.9     17.2   7.1     13.9   6.5
Avg SD                    8.275          6.725          5.5            4.825

Avg SD (all lgs)          6.3            5.3            5.1            3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
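The article does not name the exact inter-rater reliability statistic behind these coefficients; Cronbach's alpha is one common choice when several raters score the same set of texts, treating raters as "items" and texts as "cases". The sketch below illustrates that computation on invented scores for three hypothetical raters:

```python
import statistics

def cronbach_alpha(ratings):
    """ratings: one score list per rater, aligned by text
    (raters treated as items, texts as cases)."""
    k = len(ratings)                                   # number of raters
    totals = [sum(col) for col in zip(*ratings)]       # per-text sums
    item_vars = sum(statistics.variance(r) for r in ratings)
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Three hypothetical raters scoring the same five texts with high agreement:
raters = [
    [92, 60, 88, 45, 75],
    [90, 55, 85, 50, 78],
    [94, 62, 90, 48, 72],
]
print(round(cronbach_alpha(raters), 3))
```

With closely agreeing raters, as above, alpha approaches 1; divergent raters (like the Russian group here) drive it toward 0.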

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component and per language

Table 3. Reliability coefficients for benchmark ratings

            Reliability coefficient
Spanish     .953
Chinese     .973
Russian     .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

            TL      FTA     MEAN    TERM
Spanish     .952    .929    .926    .848
Chinese     .844    .844    .864    .783
Russian     .367    .479    .492    .292

In sum, very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.

Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

            Reliability coefficient
Spanish     .934
Chinese     .780
Russian     .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

            Benchmark reliability coefficient    Reliability coefficient (for Reliability Testing)
Spanish     .953                                 .934
Chinese     .973                                 .780
Russian     .128                                 .118


Table 7a. Average scores and standard deviations for consultants and translators

        Score              Time
Text    Mean    SD         Mean    SD
210     93.3    7.5        75.8    59.4
214     93.3    12.1       94.2    101.4
215     85.0    17.9       36.3    18.3
228     46.7    20.7       37.5    22.3
235     46.7    18.6       49.5    38.9
410     91.4    7.5        46.0    22.1
413     62.9    21.0       40.7    13.7
415     96.4    4.8        26.1    15.4
418     69.3    22.1       52.4    22.2
312     52.5    15.1       26.7    2.6
314     88.3    10.3       22.5    4.2
315     74.2    26.3       28.7    7.8
316     63.3    32.7       25.8    6.6

Table 7b. Average scores and standard deviations for teachers

        Score              Time
Text    Mean    SD         Mean    SD
210     90.0    9.4        63.6    39.7
214     85.0    9.4        67.0    41.8
215     89.0    12.4       36.0    30.5
228     51.0    19.5       38.0    31.7
235     68.0    10.4       57.6    40.2
410     80.0    13.2       61.0    27.7
413     63.3    25.7       71.0    24.6
415     95.0    8.7        41.0    11.5
418     91.7    5.8        44.0    6.6
312     73.3    5.8        55.0    56.7
314     71.7    20.8       47.7    62.7
315     78.3    14.4       37.7    45.5
316     76.7    22.5       46.7    63.5


The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters


Figure 4. Time for Spanish raters

Figure 5. Mean scores for Chinese raters


Figure 6. Time for Chinese raters

Figure 7. Mean scores for Russian raters


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as the scores and overall recommendation do not correlate highly, as expected.

Figure 8. Time for Russian raters

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
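The score–recommendation check behind Table 8 is a Pearson correlation. The sketch below (with hypothetical scores and recommendation codes) illustrates the expected strong negative correlation, and also why a rater with no variance in recommendations, like RS02LB, yields no defined correlation:

```python
def pearson_r(xs, ys):
    """Pearson correlation; returns None when either variable has no
    variance (as with a rater who recommends '2' for every text)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # undefined: zero variance
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

# Hypothetical rater: high total scores pair with low (better)
# recommendation codes, giving a strong negative correlation.
scores = [92, 85, 60, 45]
recs   = [1, 2, 3, 4]   # 1 = publish as is ... 4 = redo translation
print(round(pearson_r(scores, recs), 3))
print(pearson_r([70, 80, 55, 90], [2, 2, 2, 2]))  # None: no variance
```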


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process, and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester: ________

Title/Department: ________    Delivery Date: ________

TRANSLATION BRIEF

Source Language: ________    Target Language: Spanish / Russian / Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name): ________    Date Completed: ________

Contact Information: ________    Date Received: ________

Total Score: ________    Total Rating Time: ________

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish and/or use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1. TARGET LANGUAGE (check one box)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:



3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ________




SCORING WORKSHEET

Component: Target Language
  Category   Value   Score
  1a         5
  1b         15
  1c         25
  1d         30

Component: Functional and Textual Adequacy
  Category   Value   Score
  2a         5
  2b         10
  2c         20
  2d         25

Component: Non-Specialized Content
  Category   Value   Score
  3a         5
  3b         10
  3c         20
  3d         25

Component: Specialized Content and Terminology
  Category   Value   Score
  4a         5
  4b         10
  4c         15
  4d         20

Tally Sheet

Component                                Category Rating   Score Value
Target Language                          ________          ________
Functional and Textual Adequacy          ________          ________
Non-Specialized Content                  ________          ________
Specialized Content and Terminology      ________          ________
Total Score                                                ________
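The arithmetic behind the Scoring Worksheet can be sketched in a few lines of Python. The point values are taken directly from the worksheet above (the component maxima of 30, 25, 25, and 20 sum to 100); the dictionary keys and the function name are illustrative, not part of the tool.

```python
# Point values per descriptor, from the Scoring Worksheet (maximum total = 100).
CATEGORY_VALUES = {
    "target_language":         {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "functional_textual":      {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "non_specialized_content": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "specialized_terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(checked):
    """Sum the values of the one descriptor checked per component."""
    return sum(CATEGORY_VALUES[component][descriptor]
               for component, descriptor in checked.items())

# A text judged at the top descriptor in every component scores 100.
print(total_score({"target_language": "1d", "functional_textual": "2d",
                   "non_specialized_content": "3d",
                   "specialized_terminology": "4d"}))  # 100
```

Note that the weighting makes Target Language the heaviest component, matching the priority ranking requested on the cover sheet.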


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.

As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source language and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment of these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.

The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.


As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8); Chinese: 3 teachers, 4 translators (7); Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of each language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian, with formal education in the country of origin. The Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.

2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (for each of these two languages, two of good quality and two of bad quality), making up a total of thirteen additional texts.


2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores. 200-series texts are Spanish, 400-series texts are Chinese, and 300-series texts are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text     # of raters   Average Score   Standard Deviation
Spanish
210      11            91.8             8.1
214      11            89.5            11.3
215      11            86.8            15.0
228      11            48.6            19.2
235      11            56.4            18.5
Avg SD                                 14.42
Chinese
410      10            88.0            10.3
413      10            63.0            21.0
415      10            96.0             5.7
418      10            76.0            21.2
Avg SD                                 14.55
Russian
312       9            59.4            16.1
314       9            82.8            15.6
315       9            75.6            22.1
316       9            67.8            29.0
Avg SD                                 20.7
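A per-text summary of the kind shown in Table 1 can be computed from raw rater scores as follows. The eleven scores below are invented for illustration (they are not the study's data), and the article does not state whether a population or sample standard deviation was used; the sketch assumes the population formula.

```python
import statistics

def summarize(scores):
    """Return the mean and population standard deviation of one text's scores."""
    return statistics.mean(scores), statistics.pstdev(scores)

# Hypothetical scores from 11 raters for a single text (0-100 scale).
scores = [95, 90, 88, 92, 97, 85, 93, 90, 96, 89, 94]
mean, sd = summarize(scores)
print(round(mean, 1), round(sd, 1))  # 91.7 3.5
```

A small standard deviation, as in this invented example, is what Table 1 reports for the most consistently rated texts (e.g., text 415).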


Figure 1a. Average score and standard deviation per text

Figure 1b. Average standard deviations per language


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than those for Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

Text   Raters   TL Mean   TL SD   FTA Mean   FTA SD   MEAN Mean   MEAN SD   TERM Mean   TERM SD
Spanish
210    11       27.7      2.6     23.6       2.3      22.7        2.6       17.7        3.4
214    11       27.3      4.7     20.9       7.0      23.2        2.5       18.2        3.4
215    11       28.6      2.3     22.3       4.7      18.2        6.8       17.7        3.4
228    11       15.0      7.7     11.4       6.0      10.9        6.3       11.4        4.5
235    11       15.9      8.3     12.3       6.5      13.6        6.4       14.5        4.7
Avg SD                    5.12               5.3                  4.92                  3.88
Chinese
410    10       27.0      4.8     22.0       4.8      21.0        4.6       18.0        2.6
413    10       18.0      9.5     16.5       5.8      14.0        5.2       14.5        3.7
415    10       28.5      2.4     25.0       0.0      23.5        2.4       19.0        2.1
418    10       22.5      6.8     21.0       4.6      16.0        7.7       16.5        4.1
Avg SD                    5.875              3.8                  4.975                 3.125
Russian
312     9       18.3      7.1     15.0       6.1      13.3        6.6       12.8        4.4
314     9       25.6      6.3     21.7       5.0      19.4        3.9       16.1        4.2
315     9       23.3      9.4     18.3       7.9      17.8        4.4       16.1        4.2
316     9       20.0     10.3     16.7       7.9      17.2        7.1       13.9        6.5
Avg SD                    8.275              6.725                5.5                   4.825
Avg SD (all languages)    6.3                5.3                  5.1                   3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
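The article reports inter-rater reliability coefficients without naming the statistic used. One common choice for this design (several raters scoring the same set of texts) is Cronbach's alpha with raters treated as "items"; the sketch below uses that statistic, on invented data, purely to illustrate how such a coefficient is obtained.

```python
import statistics

def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha: each inner list holds one rater's scores for all texts."""
    k = len(ratings_by_rater)
    rater_vars = sum(statistics.variance(r) for r in ratings_by_rater)
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]  # per-text sums
    return k / (k - 1) * (1 - rater_vars / statistics.variance(totals))

# Three hypothetical raters scoring the same four texts (0-100 scale).
raters = [[92, 60, 95, 75],
          [88, 64, 97, 70],
          [90, 58, 93, 78]]
print(round(cronbach_alpha(raters), 3))  # 0.985: raters rank the texts alike
```

Raters who order the texts the same way yield a coefficient near 1, as for Spanish and Chinese here; raters who disagree about which texts are good drive the coefficient toward 0, as with the Russian group.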

Figure 2. Average standard deviations per tool component and per language

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters, per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.

Question 5: Is there a difference in scoring between translators and teachers?
Tables 7a and 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

        Score               Time
Text    Mean     SD         Mean    SD
210     93.3      7.5       75.8     59.4
214     93.3     12.1       94.2    101.4
215     85.0     17.9       36.3     18.3
228     46.7     20.7       37.5     22.3
235     46.7     18.6       49.5     38.9
410     91.4      7.5       46.0     22.1
413     62.9     21.0       40.7     13.7
415     96.4      4.8       26.1     15.4
418     69.3     22.1       52.4     22.2
312     52.5     15.1       26.7      2.6
314     88.3     10.3       22.5      4.2
315     74.2     26.3       28.7      7.8
316     63.3     32.7       25.8      6.6

Table 7b. Average scores and standard deviations for teachers

        Score               Time
Text    Mean     SD         Mean    SD
210     90.0      9.4       63.6     39.7
214     85.0      9.4       67.0     41.8
215     89.0     12.4       36.0     30.5
228     51.0     19.5       38.0     31.7
235     68.0     10.4       57.6     40.2
410     80.0     13.2       61.0     27.7
413     63.3     25.7       71.0     24.6
415     95.0      8.7       41.0     11.5
418     91.7      5.8       44.0      6.6
312     73.3      5.8       55.0     56.7
314     71.7     20.8       47.7     62.7
315     78.3     14.4       37.7     45.5
316     76.7     22.5       46.7     63.5


The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among the Russian raters, the same trend found when comparing the Chinese and the Spanish translators and teachers holds for the Russians: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figures 7 and 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)


Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)

Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)


Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)

Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.

Figure 8. Time for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
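A correlation of this kind can be computed as a Pearson coefficient between each rater's total scores and recommendations. The sketch below uses invented data, and Pearson's r is an assumption: the paper does not name the coefficient used.

```python
# Sketch: correlation between one rater's total scores (0-100) and
# overall recommendations (1 = publish as is ... redo translation).
# All data here is illustrative, not the study's actual ratings.
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs)
    vary = sum((y - my) ** 2 for y in ys)
    return cov / (varx * vary) ** 0.5

# Hypothetical rater: five texts, total score vs. recommendation.
scores = [92, 88, 75, 51, 47]
recs = [1, 1, 2, 3, 4]  # lower score -> stronger revision recommendation
r = pearson(scores, recs)
print(round(r, 3))  # strongly negative, close to -1
```

A rater whose recommendations track their scores yields r near −1; a rater like RS02LB, who gives the same recommendation for every text, has zero variance in `recs` and no defined coefficient at all, which is why that rater was excluded.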


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results show good inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, composed of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool provides good inter-rater reliability for all language and text groups with the exception of Russian; they also suggest that the low reliability of the scores obtained by the Russian raters is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, scoring, componential, functionalism, errors


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department    Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.

1. TARGET LANGUAGE

Category Number    Description    Check one box

1a    The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b    The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c    Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d    The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number    Description    Check one box

2a    Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b    The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c    The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d    The translated text accurately accomplishes the goals and purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:



3. NON-SPECIALIZED CONTENT (MEANING)

Category Number    Description    Check one box

3a    The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b    There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c    Minor alterations in meaning, additions, or omissions.

3d    The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY

Category Number    Description    Check one box

4a    Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b    Serious/frequent mistakes involving terminology and/or specialized content.

4c    A few terminological errors, but the specialized content is not seriously affected.

4d    Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE




SCORING WORKSHEET

Component: Target Language          Component: Functional and Textual Adequacy
Category  Value  Score              Category  Value  Score
1a        5                         2a        5
1b        15                        2b        10
1c        25                        2c        20
1d        30                        2d        25

Component: Non-Specialized Content  Component: Specialized Content and Terminology
Category  Value  Score              Category  Value  Score
3a        5                         4a        5
3b        10                        4b        10
3c        20                        4c        15
3d        25                        4d        20

Tally Sheet

Component    Category Rating    Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
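The worksheet's arithmetic is a simple weighted lookup: each component contributes the point value of the category the rater checked, and the four values sum to the total score (maximum 100). A minimal sketch of this calculation — the function and variable names are my own, not part of the tool:

```python
# Point values from the tool's Scoring Worksheet: four components,
# four categories (a-d) each, summing to at most 100.
VALUES = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """ratings maps each component to the category the rater checked."""
    return sum(VALUES[comp][cat] for comp, cat in ratings.items())

# A translation rated 'd' on every component earns the maximum:
print(total_score({comp: "d" for comp in VALUES}))  # 100
```

Note that the components are weighted unequally: a top rating in Target Language is worth 30 points, against 20 for Specialized Content and Terminology, reflecting the priorities set in the cover sheet's Quality Criteria.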


Appendix 2 Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu

Page 9: Further evidence for a functionalist approach to translation quality evaluation

Further evidence for a functionalist approach to translation quality evaluation 243

As mentioned above data was collected during two rounds of testing the first one the Benchmark Testing included 9 Raters (3 Russian 3 Chinese 3 Spanish) these raters were asked to evaluate 4ndash5 texts (per language) that had been previ-ously selected as clearly of good or bad quality by expert consultants in each lan-guage The second session the Reliability Testing included 21 raters distributed as follows

Spanish 5 teachers 3 translators (8) Chinese 3 teachers 4 translators (7) Russian 3 teachers 3 translators (6)

Differences across groups reflect general features of that language group in the US Among the translators the Russians had degrees in Languages History and Trans-lating Engineering and Nursing from Russian and US universities and experience ranging from 12 to 22 years the Chinese translatorsrsquo experience ranged from 6 to 30 years and their education included Chinese language and literature Philosophy (MA) English (PhD) Neuroscience (PhD) and Medicine (MD) with degrees ob-tained in China and the US Their Spanish counterpartsrsquo experience varied from 5 to 20 years and their degrees included areas such as Education Spanish and English Literature Latin American Studies (MA) and Creative Writing (MA) The Spanish and Russian teachers were perhaps the most uniform groups includ-ing College instructors (PhD students) with MAs in Spanish or Slavic Linguistics Literature and Communication and one college professor of Russian With one exception they were all native speakers of Spanish or Russian with formal edu-cation in the country of origin Chinese teachers were college instructors (PhD students) with MAs in Chinese one college professor (PhD in Spanish) and an elementary school teacher and tutor (BA in Chinese) They were all native speak-ers of Chinese

212 TextsAs mentioned above experienced translators serving as language consultants se-lected the texts to be used in the rating sessions Three consultants were instruct-ed to identify health education texts translated from English into their language Texts were to be publicly available on the Internet Half were to be very good and the other half were to be considered very poor on reading the text Those texts were used for the Benchmark session of testing during which they were rated by the consultants and two additional expert translators The texts where there was the most agreement in rating were selected for the Reliability Testing Reliability texts were comprised of five Spanish texts (three good and two bad) four Russian texts and four Chinese texts two for each language being of good quality and two of bad quality making up a total of thirteen additional texts

244 Sonia Colina

213 ToolThe tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts Part I is to be completed by the person requesting the evaluation (ie the Requester) and read by the rater before heshe started hisher work It contains the Translation Brief relative to which the evaluation must always take place and the Quality Criteria clarifying requester priorities among components The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool Part II of the Cover Sheet is to be filled in by the raters after the rating is complete An Assessment Summary and Recommendation section was included to allow raters the oppor-tunity to offer an action recommendation on the basis of their ratings Ie ldquoWhat should the requester do now with this translation Edit it Minor or small edits Redo it entirelyrdquo An additional modification to the tool consisted of eliminat-ing or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100 Some minor stylistic changes were made in the language of the descriptors

214 Rater TrainingThe Benchmark and Reliability sessions included training and rating sessions The training provided was substantially the same offered in the pilot testing and de-scribed in Colina (2008) It focused on the features and use of the tool and it con-sisted of PDF materials (delivered via email) a Power-point presentation based on the contents of the PDF materials and a question-and-answer session delivered online via Internet and phone conferencing system

Some revisions to the training reflect changes to the tool (including instruc-tions on the new Cover Sheet) a few additional textual examples in Chinese and a scored completed sample worksheet for the Spanish group Samples were not in-cluded for the other languages due to time and personnel constraints The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008)5

22 Results

The results of the data collection were submitted to statistical analysis to deter-mine to what degree trained raters use the TQA tool consistently

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores

Further evidence for a functionalist approach to translation quality evaluation 245

200-series texts are Spanish texts 400s are Chinese and 300s are Russian The stan-dard deviations range from 81 to 192 for Spanish from 57 to 212 for Chinese and from 161 to 290 for Russian

Question 1 For each text how consistently do all raters rate the textThe standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assess-ment) Figure 1b shows the average standard deviations per language According to this the Russian raters were the ones with the highest average standard devia-tion and the less consistent in their ratings This is in agreement with the reliabillity coefficients shown below (Table 5) as the Russian raters have the lowest inter-rater reliability Table 2 shows average scores standard deviations and average standard deviations for each component of the tool per text and per language Figure 2 represents average standard deviations per component and per language There does not appear to be an obvious connection between standard deviations and

Table 1 Average score of each text and standard deviation

Text of raters Average Score Standard Deviation

Spanish

210 11 918 81

214 11 895 113

215 11 868 150

228 11 486 192

235 11 564 185

Avg 1442

Chinese

410 10 880 103

413 10 630 210

415 10 960 57

418 10 760 212

Avg 1455

Russian

312 9 594 161

314 9 828 156

315 9 756 221

316 9 678 290

Avg 207

246 Sonia Colina

0

20

40

60

80

100

210

214

215

228

235

410

413

415

418

312

314

315

316

Text number

Average ScoreStandard Deviation

Figure 1a Average score and standard deviation per text

0

5

10

15

20

25

Spanish Chinese Russian

Standard Deviation(Avg)

Figure 1b Average standard deviations per language

Further evidence for a functionalist approach to translation quality evaluation 247

components Although generally the components Target Language (TL) and Func-tional and Textual Adequacy (FTA) have higher standard deviations (ie ratings are less consistent) this is not always the case as seen in the Chinese data (FTA) One would in fact expect the FTA category to exhibit the highest standard devia-tions given its more holistic nature yet the data do not bear out this hypothesis as the TL component also shows standard deviations that are higher than Non-Spe-cialized Content (MEAN) and Specialized Content and Terminology (TERM)

Question 2 How consistently do raters in the first session (Benchmark) rate the textsThe inter-rater reliability for the Spanish and for the Chinese raters is remark-able however the inter-rater reliability for the Russian raters is too low (Table 3)

Table 2 Average scores and standard deviations for four components per text and per language

TL FTA MEAN TERM

Text Raters Mean SD Mean SD Mean SD Mean SD

Spanish

210 11 277 26 236 23 227 26 177 34

214 11 273 47 209 70 232 25 182 34

215 11 286 23 223 47 182 68 177 34

228 11 150 77 114 60 109 63 114 45

235 11 159 83 123 65 136 64 145 47

Avg 512 53 492 388

Chinese

410 10 270 48 220 48 210 46 180 26

413 10 180 95 165 58 140 52 145 37

415 10 285 24 250 00 235 24 190 21

418 10 225 68 210 46 160 77 165 41

Avg 5875 38 4975 3125

Russian

312 9 183 71 150 61 133 66 128 44

314 9 256 63 217 50 194 39 161 42

315 9 233 94 183 79 178 44 161 42

316 9 200 103 167 79 172 71 139 65

8275 6725 55 4825

AvgSD (all lgs) 63 53 51 39

248 Sonia Colina

This in conjunction with the Reliability Testing results leads us to believe in the presence of other unknown factors unrelated to the tool responsible for the low reliability of the Russian raters

Question 3 How consistently do raters in the second session (Reliability) rate the texts How do the reliability coefficients compare for the Benchmark and the Reli-ability TestingThe results of the reliability raters mirror those of the benchmark raters whereby the Spanish raters achieve a very good inter-rater reliability coefficient the Chi-nese raters have acceptable inter-rater reliability coefficient but the inter-rater reli-ability for the Russian raters is very low (Table 4)

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating) but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability The slight drop among the Russian raters from the first to the sec-ond session is negligible in any case the inter-rater reliability is too low

Average SD per tool component

0

1

2

3

4

5

6

7

8

9

TL FTA MEAN TERM

SpanishChineseRussianAll languages

Figure 2 Average standard deviations per tool component and per language

Table 3 Reliability coefficients for benchmark ratings

Reliability coefficient

Spanish 953

Chinese 973

Russian 128

Further evidence for a functionalist approach to translation quality evaluation 249

Question 4 How consistently do raters rate each component of the tool Are there some test components where there is higher rater reliability

The coefficients for the Spanish raters show very good reliability with excel-lent coefficients for the first three components the numbers for the Chinese raters are also very good but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6)

Table 6 Reliability coefficients for the four components of the tool (all raters per language group)

TL FTA MEAN TERM

Spanish 952 929 926 848

Chinese 844 844 864 783

Russian 367 479 492 292

In sum very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing) as well as for all compo-nents of the tool Reliability scores for the Russian raters are low These results are in agreement with the standard deviation data presented in Tables 1ndash2 and Fig-ure 1a and 1b and Figure 2 All of this leads us to believe that whatever the cause for the Russian coefficients it was not related to the tool itself

Question 5 Is there a difference in scoring between translators and teachersTable 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts Figures 3 and 4 show the mean scores and times for Spanish raters comparing teachers and translators

Table 4 Reliability coefficients for Reliability Testing

Reliability coefficient

Spanish 934

Chinese 780

Russian 118

Table 5 Inter-rater reliability Benchmark and Reliability Testing

Benchmark reliability coefficient

Reliability coefficient(for Reliability Testing)

Spanish 953 934

Chinese 973 780

Russian 128 118

250 Sonia Colina

Table 7a Average scores and standard deviations for consultants and translators

Score Time

text Mean SD Mean SD

210 933 75 758 594

214 933 121 942 1014

215 850 179 363 183

228 467 207 375 223

235 467 186 495 389

410 914 75 460 221

413 629 210 407 137

415 964 48 261 154

418 693 221 524 222

312 525 151 267 26

314 883 103 225 42

315 742 263 287 78

316 633 327 258 66

Table 7b Average scores and standard deviations for teachers

Score Time

text Mean SD Mean SD

210 900 94 636 397

214 850 94 670 418

215 890 124 360 305

228 510 195 380 317

235 680 104 576 402

410 800 132 610 277

413 633 257 710 246

415 950 87 410 115

418 917 58 440 66

312 733 58 550 567

314 717 208 477 627

315 783 144 377 455

316 767 225 467 635

Further evidence for a functionalist approach to translation quality evaluation 251

The corresponding data for Chinese appears in Figures 5 and 6 and in Figures 7 and 8 for Russian

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish appears when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, all of whom correlate very highly.

Figure 3. Mean scores for Spanish raters (translators vs. teachers; texts 210, 214, 215, 228, 235)


Figure 4. Time for Spanish raters (translators vs. teachers)

Figure 5. Mean scores for Chinese raters (translators vs. teachers)


Figure 6. Time for Chinese raters (translators vs. teachers)

Figure 7. Mean scores for Russian raters (translators vs. teachers)


The Chinese raters' correlations between the recommendation and the total score range from −0.867 (CH01BJ) to a perfect 1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters independently of the tool design and tool features, as the scores and overall recommendation do not correlate highly, as expected.

Figure 8. Time for Russian raters (translators vs. teachers)

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
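Readers wishing to run the same check on their own rating data can compute the score–recommendation relationship as a Pearson correlation. The sketch below is illustrative only: the function name and the sample data are invented, with recommendation codes 1–4 standing for the four cover-sheet options (1 = publish as is, 4 = redo translation), so a well-behaved rater should show a strongly negative coefficient.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    # Pearson product-moment correlation between two equal-length series.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Hypothetical rater data: total scores (0-100) paired with recommendation
# codes (1 = publish as is ... 4 = redo translation). Higher scores should
# pair with lower codes, so the coefficient should be strongly negative.
scores = [95, 88, 72, 55, 40]
recs = [1, 1, 2, 3, 4]
print(round(pearson(scores, recs), 3))
```

A rater like RS02LB, who issues the same recommendation for every text, has zero variance in `recs` and is undefined under this formula, which is why such raters must be excluded from the correlation analysis.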


3 Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study.

In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process, and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be better suited to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high degree of inter-rater reliability and justify continued testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I. To be completed by the Requester

The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department:    Delivery Date:

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology

PART II. To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target language materials).

Notes/Recommended Edits:



RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1 TARGET LANGUAGE

Category number | Description | Check one box

1a  The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b  The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c  Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d  The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category number | Description | Check one box

2a  Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b  The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c  The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d  The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:



3 NON-SPECIALIZED CONTENT (MEANING)

Category number | Description | Check one box

3a  The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b  There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c  Minor alterations in meaning, additions or omissions.

3d  The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category number | Description | Check one box

4a  Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b  Serious/frequent mistakes involving terminology and/or specialized content.

4c  A few terminological errors, but the specialized content is not seriously affected.

4d  Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ____




SCORING WORKSHEET

Component: Target Language
Category    Value    Score
1a          5
1b          15
1c          25
1d          30

Component: Functional and Textual Adequacy
Category    Value    Score
2a          5
2b          10
2c          20
2d          25

Component: Non-Specialized Content
Category    Value    Score
3a          5
3b          10
3c          20
3d          25

Component: Specialized Content and Terminology
Category    Value    Score
4a          5
4b          10
4c          15
4d          20

Tally Sheet

Component                              Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
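The arithmetic behind the Scoring Worksheet is a simple lookup-and-sum: each component's checked descriptor maps to a point value, and the four values add up to the Total Score (maximum 30 + 25 + 25 + 20 = 100). The sketch below encodes the worksheet's values; the function name and input format are illustrative, not part of the tool.

```python
# Category values taken from the Scoring Worksheet; the best descriptor in
# each component carries that component's maximum weight (totals 100).
VALUES = {
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,   # Target Language (max 30)
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,   # Functional and Textual Adequacy (max 25)
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,   # Non-Specialized Content (max 25)
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,   # Specialized Content and Terminology (max 20)
}

def total_score(categories):
    # categories: one checked descriptor per component, e.g. ["1d", "2c", "3d", "4c"].
    return sum(VALUES[c] for c in categories)

print(total_score(["1d", "2d", "3d", "4d"]))  # best descriptor everywhere -> 100
```

The unequal maxima (30/25/25/20) are what make the tool componential: a requester's Priority of Quality Criteria ranking is reflected in how many points each component can contribute.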


Appendix 2. Text sample

(The text sample is reproduced as page images in the original publication.)

Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e. the Requester) and read by the rater before he/she started his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I, as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e. "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.

2.1.4 Rater Training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same as that offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5

2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.


The 200-series texts are Spanish texts, the 400s are Chinese and the 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and components.

Table 1. Average score of each text and standard deviation

Text    No. of raters    Average score    Standard deviation
Spanish
210     11               91.8             8.1
214     11               89.5             11.3
215     11               86.8             15.0
228     11               48.6             19.2
235     11               56.4             18.5
                                          Avg: 14.42
Chinese
410     10               88.0             10.3
413     10               63.0             21.0
415     10               96.0             5.7
418     10               76.0             21.2
                                          Avg: 14.55
Russian
312     9                59.4             16.1
314     9                82.8             15.6
315     9                75.6             22.1
316     9                67.8             29.0
                                          Avg: 20.7
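The per-text figures in Table 1 are plain descriptive statistics over the individual rater scores. A minimal sketch follows, with hypothetical scores from five raters on the tool's 0–100 scale; the article does not state whether the sample or population standard deviation was used, so the sample version is assumed here.

```python
from statistics import mean, stdev

def text_summary(scores):
    # Mean and sample standard deviation of one text's scores across raters,
    # rounded to one decimal as in Table 1.
    return round(mean(scores), 1), round(stdev(scores), 1)

# Hypothetical scores from five raters for one text (0-100 scale).
scores = [92, 88, 95, 90, 94]
m, sd = text_summary(scores)
print(m, sd)  # a tight cluster of scores yields a small standard deviation
```

Under this reading, a text like 228 (average 48.6, SD 19.2) combines a low mean with high rater disagreement, while a text like 415 (96.0, SD 5.7) shows strong agreement on a high score.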


Figure 1a. Average score and standard deviation per text

Figure 1b. Average standard deviations per language


Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e. ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

                TL             FTA            MEAN           TERM
Text   Raters   Mean    SD     Mean    SD     Mean    SD     Mean    SD
Spanish
210    11       27.7    2.6    23.6    2.3    22.7    2.6    17.7    3.4
214    11       27.3    4.7    20.9    7.0    23.2    2.5    18.2    3.4
215    11       28.6    2.3    22.3    4.7    18.2    6.8    17.7    3.4
228    11       15.0    7.7    11.4    6.0    10.9    6.3    11.4    4.5
235    11       15.9    8.3    12.3    6.5    13.6    6.4    14.5    4.7
Avg SD                  5.12           5.3            4.92           3.88
Chinese
410    10       27.0    4.8    22.0    4.8    21.0    4.6    18.0    2.6
413    10       18.0    9.5    16.5    5.8    14.0    5.2    14.5    3.7
415    10       28.5    2.4    25.0    0.0    23.5    2.4    19.0    2.1
418    10       22.5    6.8    21.0    4.6    16.0    7.7    16.5    4.1
Avg SD                  5.875          3.8            4.975          3.125
Russian
312    9        18.3    7.1    15.0    6.1    13.3    6.6    12.8    4.4
314    9        25.6    6.3    21.7    5.0    19.4    3.9    16.1    4.2
315    9        23.3    9.4    18.3    7.9    17.8    4.4    16.1    4.2
316    9        20.0    10.3   16.7    7.9    17.2    7.1    13.9    6.5
Avg SD                  8.275          6.725          5.5            4.825
Avg SD (all languages)  6.3            5.3            5.1            3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, that are responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component and per language (Spanish, Chinese, Russian, all languages)

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292
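The article reports inter-rater reliability coefficients without naming the statistic. A common choice for this design (several raters scoring the same set of texts) is Cronbach's alpha with raters treated as items; the sketch below illustrates that computation under this assumption, using invented scores, so it should be read as one plausible reconstruction rather than the study's actual procedure.

```python
from statistics import pvariance

def cronbach_alpha(ratings_by_rater):
    # ratings_by_rater: one list of per-text scores per rater (raters as "items").
    # Population variances are used consistently throughout.
    k = len(ratings_by_rater)
    item_vars = sum(pvariance(r) for r in ratings_by_rater)
    totals = [sum(col) for col in zip(*ratings_by_rater)]  # per-text totals
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Hypothetical scores from three raters over four texts: raters who rank the
# texts the same way yield a coefficient near 1, like the Spanish group.
raters = [
    [92, 88, 60, 55],
    [90, 85, 58, 50],
    [95, 90, 65, 52],
]
print(round(cronbach_alpha(raters), 3))
```

Raters who disagree about the relative ordering of the texts drive the coefficient toward zero (or below), which is the pattern the Russian group's low coefficients reflect.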

In sum very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing) as well as for all compo-nents of the tool Reliability scores for the Russian raters are low These results are in agreement with the standard deviation data presented in Tables 1ndash2 and Fig-ure 1a and 1b and Figure 2 All of this leads us to believe that whatever the cause for the Russian coefficients it was not related to the tool itself

Question 5 Is there a difference in scoring between translators and teachersTable 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts Figures 3 and 4 show the mean scores and times for Spanish raters comparing teachers and translators

Table 4 Reliability coefficients for Reliability Testing

Reliability coefficient

Spanish 934

Chinese 780

Russian 118

Table 5 Inter-rater reliability Benchmark and Reliability Testing

Benchmark reliability coefficient

Reliability coefficient(for Reliability Testing)

Spanish 953 934

Chinese 973 780

Russian 128 118

250 Sonia Colina

Table 7a Average scores and standard deviations for consultants and translators

Score Time

text Mean SD Mean SD

210 933 75 758 594

214 933 121 942 1014

215 850 179 363 183

228 467 207 375 223

235 467 186 495 389

410 914 75 460 221

413 629 210 407 137

415 964 48 261 154

418 693 221 524 222

312 525 151 267 26

314 883 103 225 42

315 742 263 287 78

316 633 327 258 66

Table 7b Average scores and standard deviations for teachers

Score Time

text Mean SD Mean SD

210 900 94 636 397

214 850 94 670 418

215 890 124 360 305

228 510 195 380 317

235 680 104 576 402

410 800 132 610 277

413 633 257 710 246

415 950 87 410 115

418 917 58 440 66

312 733 58 550 567

314 717 208 477 627

315 783 144 377 455

316 767 225 467 635

Further evidence for a functionalist approach to translation quality evaluation 251

The corresponding data for Chinese appears in Figures 5 and 6 and in Figures 7 and 8 for Russian

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)

As with the Spanish raters it is interesting to note that Chinese teachers rate either higher or similarly to translators (Figure 5) Only one text obtained lower ratings from teachers than from translators Timing results also mirror those found for Spanish subjects Teachers take longer to rate than translators (Figure 6)

Despite the low inter-rater reliability among Russian raters the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish Russian teachers rate similarly or slightly higher than translators and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8) This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008)

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability the correlation between the total score and at the recommendation (the field lsquorecrsquo) issued by each rater was con-sidered This is explored in Table 8 One would expect there to be a relatively high (negative) correlation because of the inverse relationship between high score and a low recommendation As is illustrated in the three sub tables below all Spanish rat-ers with the exception of SP02PB show a strong correlation between the recommen-dation and the total score ranging from minus0854 (SP01VS) to minus0981 (SP02MC) The results are similar with the Chinese raters whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; Translators vs. Teachers).

252 Sonia Colina

Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; Translators vs. Teachers).

Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; Translators vs. Teachers).


Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; Translators vs. Teachers).

Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; Translators vs. Teachers).


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as their scores and overall recommendations do not correlate as highly as expected.
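The check described above pairs each rater's total scores with the recommendation codes and computes a correlation, which should be strongly negative because a high score should coincide with a low (favorable) recommendation code. A minimal sketch with invented data for a single hypothetical rater (Pearson's r is assumed here; the article does not name the coefficient used):

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical rater: total scores vs. recommendation codes
# (1 = publish as is ... 4 = redo translation); not the study's data.
scores = [92, 63, 96, 76, 51]
recs = [1, 3, 1, 2, 4]
print(round(pearson_r(scores, recs), 3))  # -> -0.993
```

A rater like the hypothetical one above, whose high scores consistently pair with favorable recommendations, lands near −1; a value near 0, as for RS01EM, signals inconsistent use of the two fields.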

Figure 8. Time for Russian raters (texts 312, 314, 315, 316; Translators vs. Teachers).

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993


3 Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be better suited to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed according to that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: __________    Time Rating Ends: __________

Translation Quality Assessment - Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department:                 Delivery Date:

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target language materials).

Notes / Recommended Edits


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1 TARGET LANGUAGE

Check one box:

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Check one box:

2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals and purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3 NON-SPECIALIZED CONTENT-MEANING

Check one box:

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY

Check one box:

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE



SCORING WORKSHEET

Component: Target Language                      1a = 5, 1b = 15, 1c = 25, 1d = 30
Component: Functional and Textual Adequacy      2a = 5, 2b = 10, 2c = 20, 2d = 25
Component: Non-Specialized Content              3a = 5, 3b = 10, 3c = 20, 3d = 25
Component: Specialized Content and Terminology  4a = 5, 4b = 10, 4c = 15, 4d = 20
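The worksheet arithmetic is a lookup and a sum: each checked descriptor maps to a point value, and the four component values add up to a total out of 100. A minimal sketch using the category values from the worksheet above (the function and example selection are illustrative, not part of the tool):

```python
# Point values from the scoring worksheet (maximum total = 100).
VALUES = {
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,   # Target Language
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,   # Functional and Textual Adequacy
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,   # Non-Specialized Content
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,   # Specialized Content and Terminology
}

def total_score(checked):
    """checked: one category per component, e.g. ["1c", "2d", "3c", "4d"]."""
    return sum(VALUES[c] for c in checked)

print(total_score(["1c", "2d", "3c", "4d"]))  # 25 + 25 + 20 + 20 = 90
```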

Tally Sheet

Component   Category Rating   Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score


Appendix 2 Text sample

[Sample texts reproduced as images in the original; not recoverable here.]


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



The 200-series texts are Spanish texts, the 400s are Chinese, and the 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.

Question 1: For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessments). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and

Table 1. Average score of each text and standard deviation

Text   # of raters   Average Score   Standard Deviation

Spanish
210    11            91.8             8.1
214    11            89.5            11.3
215    11            86.8            15.0
228    11            48.6            19.2
235    11            56.4            18.5
Avg SD:                              14.42

Chinese
410    10            88.0            10.3
413    10            63.0            21.0
415    10            96.0             5.7
418    10            76.0            21.2
Avg SD:                              14.55

Russian
312     9            59.4            16.1
314     9            82.8            15.6
315     9            75.6            22.1
316     9            67.8            29.0
Avg SD:                              20.7


Figure 1a. Average score and standard deviation per text (texts 210–235, 410–418, 312–316).

Figure 1b. Average standard deviations per language (Spanish, Chinese, Russian).


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e. ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than those for Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
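The consistency measure used throughout this question is the sample standard deviation of the raters' scores: the larger it is, the less the raters agree. A minimal sketch with invented scores (not the study's data; text IDs reused only as labels):

```python
from statistics import mean, stdev

# Hypothetical total scores (0-100), one entry per rater for each text.
scores_by_text = {
    "210": [95, 90, 88, 92, 97, 85, 93, 91, 96, 89, 94],
    "228": [30, 70, 45, 55, 25, 60, 48, 52, 40, 65, 44],
}

for text, scores in scores_by_text.items():
    # A small SD (text "210") signals agreement; a large one ("228") does not.
    print(f"text {text}: mean={mean(scores):.1f}, sd={stdev(scores):.1f}")
```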

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for the four components, per text and per language

                   TL            FTA           MEAN          TERM
Text   Raters   Mean    SD    Mean    SD    Mean    SD    Mean    SD

Spanish
210    11       27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214    11       27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215    11       28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228    11       15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235    11       15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg SD:                5.12          5.3           4.92          3.88

Chinese
410    10       27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413    10       18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415    10       28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418    10       22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg SD:                5.875         3.8           4.975         3.125

Russian
312     9       18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314     9       25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315     9       23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316     9       20.0  10.3    16.7   7.9    17.2   7.1    13.9   6.5
Avg SD:                8.275         6.725         5.5           4.825

Avg SD (all languages): TL 6.3, FTA 5.3, MEAN 5.1, TERM 3.9


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language.

Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency can be identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause of the Russian coefficients, it was not related to the tool itself.
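The article does not name the statistic behind these inter-rater reliability coefficients. One common choice for a fixed panel of raters scoring the same texts is Cronbach's alpha with raters treated as "items"; a minimal sketch under that assumption, with invented ratings:

```python
from statistics import pvariance

def cronbach_alpha(ratings):
    """Cronbach's alpha, treating each rater as an item.

    ratings: one list of scores per rater, aligned by text
    (illustrative data only; not the study's ratings).
    """
    k = len(ratings)
    item_vars = sum(pvariance(r) for r in ratings)
    totals = [sum(vals) for vals in zip(*ratings)]  # per-text score sums
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Three hypothetical raters scoring the same four texts:
raters = [
    [92, 63, 96, 76],
    [88, 60, 95, 70],
    [90, 68, 92, 80],
]
print(round(cronbach_alpha(raters), 3))  # -> 0.983
```

Raters who rank the texts the same way, as in this toy panel, push alpha toward 1; the Russian panel's coefficients near .12 would correspond to raters whose scores barely covary.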

Question 5: Is there a difference in scoring between translators and teachers?

Tables 7a and 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

       Score            Time
Text   Mean     SD      Mean     SD
210    93.3     7.5     75.8    59.4
214    93.3    12.1     94.2   101.4
215    85.0    17.9     36.3    18.3
228    46.7    20.7     37.5    22.3
235    46.7    18.6     49.5    38.9
410    91.4     7.5     46.0    22.1
413    62.9    21.0     40.7    13.7
415    96.4     4.8     26.1    15.4
418    69.3    22.1     52.4    22.2
312    52.5    15.1     26.7     2.6
314    88.3    10.3     22.5     4.2
315    74.2    26.3     28.7     7.8
316    63.3    32.7     25.8     6.6

Table 7b. Average scores and standard deviations for teachers

       Score            Time
Text   Mean     SD      Mean     SD
210    90.0     9.4     63.6    39.7
214    85.0     9.4     67.0    41.8
215    89.0    12.4     36.0    30.5
228    51.0    19.5     38.0    31.7
235    68.0    10.4     57.6    40.2
410    80.0    13.2     61.0    27.7
413    63.3    25.7     71.0    24.6
415    95.0     8.7     41.0    11.5
418    91.7     5.8     44.0     6.6
312    73.3     5.8     55.0    56.7
314    71.7    20.8     47.7    62.7
315    78.3    14.4     37.7    45.5
316    76.7    22.5     46.7    63.5

Further evidence for a functionalist approach to translation quality evaluation 251

The corresponding data for Chinese appears in Figures 5 and 6 and in Figures 7 and 8 for Russian

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)

As with the Spanish raters it is interesting to note that Chinese teachers rate either higher or similarly to translators (Figure 5) Only one text obtained lower ratings from teachers than from translators Timing results also mirror those found for Spanish subjects Teachers take longer to rate than translators (Figure 6)

Despite the low inter-rater reliability among Russian raters the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish Russian teachers rate similarly or slightly higher than translators and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8) This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008)

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability the correlation between the total score and at the recommendation (the field lsquorecrsquo) issued by each rater was con-sidered This is explored in Table 8 One would expect there to be a relatively high (negative) correlation because of the inverse relationship between high score and a low recommendation As is illustrated in the three sub tables below all Spanish rat-ers with the exception of SP02PB show a strong correlation between the recommen-dation and the total score ranging from minus0854 (SP01VS) to minus0981 (SP02MC) The results are similar with the Chinese raters whereby all raters correlate very highly

0

10

20

30

40

50

60

70

80

90

100

210 214 215 228 235Mean scores for Spanish raters

TranslatorsTeachers

Figure 3 Mean scores for Spanish raters

252 Sonia Colina

0

10

20

30

40

50

60

70

80

210 214 215 228 235

Time for Spanish raters

TranslatorsTeachers

Figure 4 Time for Spanish raters

0

20

40

60

80

100

120

410 413 415 418

Mean Score for Chinese Raters

TranslatorsTeachers

Figure 5 Mean scores for Chinese raters

Further evidence for a functionalist approach to translation quality evaluation 253

0

10

20

30

40

50

60

70

80

410 413 415 418

Time for Chinese Raters

TranslatorsTeachers

Figure 6 Time for Chinese raters

0

10

20

30

40

50

60

70

80

90

100

312 314 315 316

Mean scores for Russian Raters

TranslatorsTeachers

Figure 7 Mean scores for Russian raters

254 Sonia Colina

between the recommendation and the total score ranging from minus0867 (CH01BJ) to a perfect 100 (CH02JG) The results are different for the Russian raters however It appears that three raters (RS01EM RS02MK and RS01NM) do not correlate highly between their recommendations and their total scores A closer look espe-cially at these raters is warranted as is a closer look at RS02LB who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recom-mended a lsquo2rsquo for all texts regardless of the total score he or she assigned) The other Russian raters exhibited strong correlations This result suggests some unusual be-havior in the Russian raters independently of the tool design and tool features as the scores and overall recommendation do not correlate highly as expected

0

10

20

30

40

50

60

312 314 315 316

Time for Russian Raters

TranslatorsTeachers

Figure 8 Time for Russian raters

Table 8 (3 sub-tables) Correlation between recommendation and total score81 Spanish raters

SP04AR SP01JC SP01VS SP02JA SP02LA SP02PB SP02AB SP01PC SP01CC SP02MC SP01PS

minus0923 minus0958 minus0854 minus0938 minus0966 minus0421 minus0942 minus0975 minus0913 minus0981 minus0938

82 Chinese raters

CH01RL CH04YY CH01AX CH02AC CH02JG CH01KG CH02AH CH01BJ CH01CK CH01FL

minus0935 minus0980 minus0996 minus0894 minus1000 minus0955 minus0980 minus0867 minus0943 minus0926

83 Russian raters

RS01EG RS01EM RS04GN RS02NB RS02LB RS02MK RS01SM RS01NM RS01RW

minus0998 minus0115 minus0933 minus1000 na minus0500 minus0982 minus0500 minus0993

Further evidence for a functionalist approach to translation quality evaluation 255

3 Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results show a high level of inter-rater reliability and justify continued testing. This article presents an experiment designed to test the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields a good level of inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, assessment, evaluation, rating, componential, functionalism, errors


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text

Requester

Title/Department    Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish and/or use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits

Further evidence for a functionalist approach to translation quality evaluation 259


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.

1 TARGET LANGUAGE

Category Number | Description | Check one box

1a: The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b: The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c: Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d: The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number | Description | Check one box

2a: Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b: The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c: The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d: The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:

260 Sonia Colina


3 NON-SPECIALIZED CONTENT-MEANING

Category Number | Description | Check one box

3a: The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b: There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c: Minor alterations in meaning, additions or omissions.

3d: The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category Number | Description | Check one box

4a: Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b: Serious/frequent mistakes involving terminology and/or specialized content.

4c: A few terminological errors, but the specialized content is not seriously affected.

4d: Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE


Further evidence for a functionalist approach to translation quality evaluation 261


SCORING WORKSHEET

Component: Target Language              Component: Functional and Textual Adequacy
Category   Value   Score                Category   Value   Score
1a         5                            2a         5
1b         15                           2b         10
1c         25                           2c         20
1d         30                           2d         25

Component: Non-Specialized Content      Component: Specialized Content and Terminology
Category   Value   Score                Category   Value   Score
3a         5                            4a         5
3b         10                           4b         10
3c         20                           4c         15
3d         25                           4d         20

Tally Sheet

Component | Category Rating | Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
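The worksheet's arithmetic is a straight sum of the four checked category values, with a maximum of 30 + 25 + 25 + 20 = 100. A minimal sketch (the dictionary keys and function name are illustrative, not part of the tool itself):

```python
# Category values as printed in the scoring worksheet above
VALUES = {
    "TL":   {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "FTA":  {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "MEAN": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "TERM": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(selection):
    """selection maps each component to the descriptor the rater checked."""
    return sum(VALUES[comp][cat] for comp, cat in selection.items())

# A rater who checks the top descriptor in every component scores the maximum
print(total_score({"TL": "1d", "FTA": "2d", "MEAN": "3d", "TERM": "4d"}))  # → 100
```

Note that the component maxima are unequal (Target Language weighs most), so the fixed values already encode a priority among the four quality criteria.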


Appendix 2 Text sample

(Sample text not reproduced.)

Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu



Figure 1a. Average score and standard deviation per text (x-axis: text number; y-axis: 0–100)

Figure 1b. Average standard deviations per language (Spanish, Chinese, Russian)


components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).

Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).

Table 2. Average scores and standard deviations for four components, per text and per language

                   TL           FTA          MEAN         TERM
Text    Raters  Mean   SD    Mean   SD    Mean   SD    Mean   SD

Spanish
210     11      27.7   2.6   23.6   2.3   22.7   2.6   17.7   3.4
214     11      27.3   4.7   20.9   7.0   23.2   2.5   18.2   3.4
215     11      28.6   2.3   22.3   4.7   18.2   6.8   17.7   3.4
228     11      15.0   7.7   11.4   6.0   10.9   6.3   11.4   4.5
235     11      15.9   8.3   12.3   6.5   13.6   6.4   14.5   4.7
Avg SD                 5.12         5.3          4.92         3.88

Chinese
410     10      27.0   4.8   22.0   4.8   21.0   4.6   18.0   2.6
413     10      18.0   9.5   16.5   5.8   14.0   5.2   14.5   3.7
415     10      28.5   2.4   25.0   0.0   23.5   2.4   19.0   2.1
418     10      22.5   6.8   21.0   4.6   16.0   7.7   16.5   4.1
Avg SD                 5.875        3.8          4.975        3.125

Russian
312     9       18.3   7.1   15.0   6.1   13.3   6.6   12.8   4.4
314     9       25.6   6.3   21.7   5.0   19.4   3.9   16.1   4.2
315     9       23.3   9.4   18.3   7.9   17.8   4.4   16.1   4.2
316     9       20.0  10.3   16.7   7.9   17.2   7.1   13.9   6.5
Avg SD                 8.275        6.725        5.5          4.825

Avg SD (all lgs)       6.3          5.3          5.1          3.9
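Each cell of Table 2 is simply a mean and a sample standard deviation taken over the raters of one language group for one text. A minimal Python sketch with invented component ratings (the category values follow the tool's scoring worksheet; the study's raw per-rater data are not reproduced here):

```python
import statistics

# Hypothetical ratings for one text: each entry is the category value one
# rater checked (e.g. TL values are 5, 15, 25 or 30 in the worksheet).
ratings = {
    "TL":   [30, 25, 30, 25, 30, 25, 30, 25, 30, 25, 30],
    "FTA":  [25, 20, 25, 20, 25, 25, 20, 25, 20, 25, 25],
    "MEAN": [25, 20, 25, 20, 25, 20, 25, 25, 20, 25, 20],
    "TERM": [20, 15, 20, 15, 20, 20, 15, 20, 15, 20, 20],
}

for component, scores in ratings.items():
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation over raters
    print(f"{component}: mean={mean:.1f}, SD={sd:.2f}")
```

A lower SD for a component means the raters agreed more closely on that descriptor, which is how the per-component consistency claims above are read off the table.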


This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient, the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language (Spanish, Chinese, Russian, all languages)

Table 3. Reliability coefficients for benchmark ratings

          Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128


Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

          TL     FTA    MEAN   TERM
Spanish   .952   .929   .926   .848
Chinese   .844   .844   .864   .783
Russian   .367   .479   .492   .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
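The article reports a single reliability coefficient per group without naming the statistic used. One common choice for this design (several raters scoring the same fixed set of texts) is Cronbach's alpha with raters treated as items; the sketch below makes that assumption, with invented total scores:

```python
import statistics

def cronbach_alpha(scores_by_rater):
    """Cronbach's alpha, treating raters as 'items' and texts as 'cases'."""
    k = len(scores_by_rater)
    n_texts = len(scores_by_rater[0])
    # Variance of each rater's scores across the texts
    rater_vars = [statistics.pvariance(r) for r in scores_by_rater]
    # Variance of the per-text totals summed over raters
    totals = [sum(r[i] for r in scores_by_rater) for i in range(n_texts)]
    return (k / (k - 1)) * (1 - sum(rater_vars) / statistics.pvariance(totals))

# Invented total scores (0-100) from three raters over five texts
raters = [
    [93, 93, 85, 47, 47],
    [90, 85, 89, 51, 68],
    [95, 90, 88, 45, 55],
]
print(round(cronbach_alpha(raters), 3))  # close to 1: raters rank the texts consistently
```

Coefficients near the .95–.97 reported for Spanish and Chinese mean the raters are close to interchangeable; a value like the Russian .128 means a single rater's score cannot be trusted to generalize across raters.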

Question 5: Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

          Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

          Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish   .953                                .934
Chinese   .973                                .780
Russian   .128                                .118


Table 7a. Average scores and standard deviations for consultants and translators

        Score            Time
Text    Mean    SD       Mean    SD
210     93.3    7.5      75.8    59.4
214     93.3    12.1     94.2    101.4
215     85.0    17.9     36.3    18.3
228     46.7    20.7     37.5    22.3
235     46.7    18.6     49.5    38.9
410     91.4    7.5      46.0    22.1
413     62.9    21.0     40.7    13.7
415     96.4    4.8      26.1    15.4
418     69.3    22.1     52.4    22.2
312     52.5    15.1     26.7    2.6
314     88.3    10.3     22.5    4.2
315     74.2    26.3     28.7    7.8
316     63.3    32.7     25.8    6.6

Table 7b. Average scores and standard deviations for teachers

        Score            Time
Text    Mean    SD       Mean    SD
210     90.0    9.4      63.6    39.7
214     85.0    9.4      67.0    41.8
215     89.0    12.4     36.0    30.5
228     51.0    19.5     38.0    31.7
235     68.0    10.4     57.6    40.2
410     80.0    13.2     61.0    27.7
413     63.3    25.7     71.0    24.6
415     95.0    8.7      41.0    11.5
418     91.7    5.8      44.0    6.6
312     73.3    5.8      55.0    56.7
314     71.7    20.8     47.7    62.7
315     78.3    14.4     37.7    45.5
316     76.7    22.5     46.7    63.5


The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, all of whom correlate very highly

Figure 3. Mean scores for Spanish raters (translators vs. teachers)


Figure 4. Time for Spanish raters (translators vs. teachers)

Figure 5. Mean scores for Chinese raters (translators vs. teachers)


Figure 6. Time for Chinese raters (translators vs. teachers)

Figure 7. Mean scores for Russian raters (translators vs. teachers)


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that for three raters (RS01EM, RS02MK and RS01NM) the recommendations and total scores do not correlate highly. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as the scores and overall recommendations do not correlate as highly as expected.

Figure 8. Time for Russian raters (translators vs. teachers)

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR −0.923   SP01JC −0.958   SP01VS −0.854   SP02JA −0.938   SP02LA −0.966   SP02PB −0.421
SP02AB −0.942   SP01PC −0.975   SP01CC −0.913   SP02MC −0.981   SP01PS −0.938

8.2 Chinese raters
CH01RL −0.935   CH04YY −0.980   CH01AX −0.996   CH02AC −0.894   CH02JG −1.000
CH01KG −0.955   CH02AH −0.980   CH01BJ −0.867   CH01CK −0.943   CH01FL −0.926

8.3 Russian raters
RS01EG −0.998   RS01EM −0.115   RS04GN −0.933   RS02NB −1.000   RS02LB n/a
RS02MK −0.500   RS01SM −0.982   RS01NM −0.500   RS01RW −0.993


3 Conclusions

As in Colina (2008) testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts with the exception of Russian It was also shown that the low reliability of the Russian ratersrsquo scores is probably due to factors unrelated to the tool itself At this point it is not possible to determine what these factors may have been yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study In addition the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers Although translators and teachers exhibit similar behavior teachers tend to spend more time rating and their scores are slightly higher than those of trans-lators While in principle it may appear that translators would be more efficient raters one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task Because they spent more time rating (and one as-sumes reflecting on their rating) teachers may be more apt evaluators in a forma-tive context where feedback is expected from the rater Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process andor a translator (versus evalu-ation of a product) However when rating involves a product and no feedback is expected (eg industry translator licensing exams etc) a more efficient translator rater may be more suitable to the task In sum the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool Which of the two types of profes-sionals is more adequate for a specific rating task probably will depend on the purpose and goal of evaluation Further research comparing the skills of these two groups in different 
evaluation contexts is necessary to confirm this view

In summary the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research Future research needs to focus on testing on a larger scale with more subjects and various text types

Notes

The research described here was funded by the Robert Wood Johnson Foundation It was part of the Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program I would like to express my gratitude to the Foundation to the Hablamos Juntos Na-tional Program and to the Program Director Yolanda Partida for their support of translation in the USA I owe much gratitude to Yolanda Partida and Felicia Batts for comments suggestions

256 Sonia Colina

and revision in the write-up of the draft documents and on which this paper draws More details and information on the Translation Quality Assessment project including Technical Reports Manuals and Toolkit Series are available on the Hablamos Juntos website (wwwhablamosjuntosorg) I would also like to thank Volker Hegelheimer for his assistance with the statistics

1 The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act At least 43 states have one or more laws addressing lan-guage access in health care settings

2 wwwsaeorg wwwlisaorgproductsqamodel

3 One exception is that of multilingual text generation in which an original is written to be translated into multiple languages

4 Note the reference to reader response within a functionalist framework

5 Due to rater availability 4 raters (1 Spanish 2 Chinese 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment Given the low number researchers did not investigate the effect of previous experience (experienced vs inex-perienced raters)

References

Bell Roger T 1991 Translation and Translating London LongmanBowker Lynne 2001 ldquoTowards a Methodology for a Corpus-Based Approach to Translation

Evaluationrdquo Meta 462 345ndash364Cao Deborah 1996 ldquoA Model of Translation Proficiencyrdquo Target 82 325ndash340Carroll John B 1966 ldquoAn Experiment in Evaluating the Quality of Translationsrdquo Mechanical

Translation 93ndash4 55ndash66Colina Sonia 2003 Teaching Translation From Research to the Classroom New York McGraw

HillColina Sonia 2008 ldquoTranslation Quality Evaluation Empirical evidence for a Functionalist

Approachrdquo The Translator 141 97ndash134Gerzymisch-Arbogast Heidrun 2001 ldquoEquivalence Parameters and Evaluationrdquo Meta 462

227ndash242Hatim Basil and Ian Mason 1997 The Translator as Communicator London and New York

RoutledgeHoumlnig Hans 1997 ldquoPositions Power and Practice Functionalist Approaches and Translation

Quality Assessmentrdquo Current issues in language and society 41 6ndash34House Julianne 1997 Translation Quality Assessment A Model Revisited Tuumlbingen NarrHouse Julianne 2001 ldquoTranslation Quality Assessment Linguistic Description versus Social

Evaluationrdquo Meta 462 243ndash257Lauscher S 2000 ldquoTranslation Quality-Assessment Where Can Theory and Practice Meetrdquo

The Translator 62 149ndash168Neubert Albrecht 1985 Text und Translation Leipzig EnzyklopaumldieNida Eugene 1964 Toward a Science of Translation Leiden BrillNida Eugene and Charles Taber 1969 The Theory and Practice of Translation Leiden Brill

Further evidence for a functionalist approach to translation quality evaluation 257

Nord Christianne 1997 Translating as a Purposeful Activity Functionalist Approaches Ex-plained Manchester St Jerome

PACTE 2008 ldquoFirst Results of a Translation Competence Experiment lsquoKnowledge of Transla-tionrsquo and lsquoEfficacy of the Translation Processrdquo John Kearns ed Translator and Interpreter Training Issues Methods and Debates London and New York Continuum 2008 104ndash126

Reiss Katharina 1971 Moumlglichkeiten und Grenzen der uumlbersetungskritik Muumlnchen HuumlberReiss Katharina and Vermeer Hans 1984 Grundlegung einer allgemeinen Translations-Theorie

Tuumlbingen NiemayerVan den Broeck Raymond 1985 ldquoSecond Thoughts on Translation Criticism A Model of its

Analytic Functionrdquo Theo Hermans ed The Manipulation of Literature Studies in Literary Translation London and Sydney Croom Helm 1985 54ndash62

Williams Malcolm 2001 ldquoThe Application of Argumentation Theory to Translation Quality Assessmentrdquo Meta 462 326ndash344

Williams Malcolm 2004 Translation Quality Assessment An Argumentation-Centered Ap-proach Ottawa University of Ottawa Press

Résumé

Colina (2008) proposes a componential, functionalist approach to translation quality evaluation and reports the results of a pilot test of a tool designed for that approach. The results show a high rate of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool achieves good inter-rater reliability for all language groups and texts except Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors

258 Sonia Colina

Appendix 1. Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text

Requester

Title/Department                    Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish and/or use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials)

Notes/Recommended Edits


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1. TARGET LANGUAGE

Category Number    Description    (Check one box)

1a    The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b    The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c    Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d    The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number    Description    (Check one box)

2a    Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b    The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c    The translated text approximates to the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d    The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3. NON-SPECIALIZED CONTENT (MEANING)

Category Number    Description    (Check one box)

3a    The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b    There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. Translation shows some misunderstanding of the original and/or the translation instructions.

3c    Minor alterations in meaning, additions or omissions.

3d    The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY

Category Number    Description    (Check one box)

4a    Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b    Serious/frequent mistakes involving terminology and/or specialized content.

4c    A few terminological errors, but the specialized content is not seriously affected.

4d    Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ________


SCORING WORKSHEET

Component: Target Language
Category    Value    Score
1a          5
1b          15
1c          25
1d          30

Component: Functional and Textual Adequacy
Category    Value    Score
2a          5
2b          10
2c          20
2d          25

Component: Non-Specialized Content
Category    Value    Score
3a          5
3b          10
3c          20
3d          25

Component: Specialized Content and Terminology
Category    Value    Score
4a          5
4b          10
4c          15
4d          20

Tally Sheet

Component                                  Category Rating    Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
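The worksheet arithmetic above is simple enough to automate when many texts are rated. A minimal sketch in Python (the value table is copied from the scoring worksheet; the function and dictionary names are illustrative, not part of the published tool):

```python
# Descriptor values copied from the scoring worksheet above.
# TL = Target Language, FTA = Functional and Textual Adequacy,
# MEAN = Non-Specialized Content, TERM = Specialized Content and Terminology.
VALUES = {
    "TL":   {"a": 5, "b": 15, "c": 25, "d": 30},
    "FTA":  {"a": 5, "b": 10, "c": 20, "d": 25},
    "MEAN": {"a": 5, "b": 10, "c": 20, "d": 25},
    "TERM": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """Sum the values of the descriptors checked for each component.

    `ratings` maps a component to the descriptor letter the rater checked,
    e.g. {"TL": "d", "FTA": "c", "MEAN": "d", "TERM": "c"}.
    """
    return sum(VALUES[component][letter] for component, letter in ratings.items())

# A text rated 1d, 2c, 3d, 4c totals 30 + 20 + 25 + 15 = 90;
# the best possible rating (all d's) totals 100.
print(total_score({"TL": "d", "FTA": "c", "MEAN": "d", "TERM": "c"}))  # → 90
```

Note that the component maxima differ (Target Language tops out at 30 points, Specialized Content and Terminology at 20), so the worksheet builds a weighting of the quality criteria directly into the total score.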


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu





Non-Specialized Content

Specialized Content and Terminology

Total Score

262 Sonia Colina

Appendix 2 Text sample

bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull

Further evidence for a functionalist approach to translation quality evaluation 263

bull bull

264 Sonia Colina

Authorrsquos address

Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America

scolinaemailarizonaedu

Page 14: Further evidence for a functionalist approach to translation quality evaluation

248 Sonia Colina

This, in conjunction with the Reliability Testing results, leads us to believe that other, unknown factors unrelated to the tool are responsible for the low reliability of the Russian raters.

Question 3 How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.

Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language (Spanish, Chinese, Russian, all languages). [Bar chart, vertical axis 0–9; not reproduced]

Table 3. Reliability coefficients for benchmark ratings

          Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128

Further evidence for a functionalist approach to translation quality evaluation 249

Question 4 How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency can be identified for the FTA and MEAN components) (Table 6).

Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

          TL     FTA    MEAN   TERM
Spanish   .952   .929   .926   .848
Chinese   .844   .844   .864   .783
Russian   .367   .479   .492   .292

In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that whatever the cause for the Russian coefficients, it was not related to the tool itself.
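The article does not state which reliability statistic underlies the coefficients in Tables 3–6. A common choice for this design (several raters scoring the same set of texts) is Cronbach's alpha with raters treated as items; the sketch below is an illustrative assumption, not the study's documented method, and the scores are made up.

```python
# Sketch: inter-rater reliability as Cronbach's alpha, treating each rater
# as an "item" and each text as a "case". Coefficient choice and the sample
# scores are assumptions, not data from the study.

def cronbach_alpha(scores):
    """scores: one list per rater, aligned by text (same order for all raters)."""
    k = len(scores)        # number of raters
    n = len(scores[0])     # number of texts

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # variance of each rater's scores across texts
    item_vars = [variance(r) for r in scores]
    # variance of the per-text totals (summed over raters)
    totals = [sum(r[i] for r in scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

# Three hypothetical raters scoring five texts on the tool's 100-point scale:
raters = [
    [93, 85, 47, 68, 90],
    [95, 82, 50, 65, 88],
    [90, 88, 45, 70, 92],
]
print(round(cronbach_alpha(raters), 3))  # close agreement -> alpha near 1
```

With strongly disagreeing raters the same function returns a value near zero, which is the pattern reported here for the Russian group.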

Question 5 Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.

Table 4. Reliability coefficients for Reliability Testing

          Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

          Benchmark reliability coefficient   Reliability coefficient (for Reliability Testing)
Spanish   .953                                .934
Chinese   .973                                .780
Russian   .128                                .118

250 Sonia Colina

Table 7a. Average scores and standard deviations for consultants and translators

        Score             Time
Text    Mean     SD       Mean     SD
210     93.3      7.5     75.8     59.4
214     93.3     12.1     94.2    101.4
215     85.0     17.9     36.3     18.3
228     46.7     20.7     37.5     22.3
235     46.7     18.6     49.5     38.9
410     91.4      7.5     46.0     22.1
413     62.9     21.0     40.7     13.7
415     96.4      4.8     26.1     15.4
418     69.3     22.1     52.4     22.2
312     52.5     15.1     26.7      2.6
314     88.3     10.3     22.5      4.2
315     74.2     26.3     28.7      7.8
316     63.3     32.7     25.8      6.6

Table 7b. Average scores and standard deviations for teachers

        Score             Time
Text    Mean     SD       Mean     SD
210     90.0      9.4     63.6     39.7
214     85.0      9.4     67.0     41.8
215     89.0     12.4     36.0     30.5
228     51.0     19.5     38.0     31.7
235     68.0     10.4     57.6     40.2
410     80.0     13.2     61.0     27.7
413     63.3     25.7     71.0     24.6
415     95.0      8.7     41.0     11.5
418     91.7      5.8     44.0      6.6
312     73.3      5.8     55.0     56.7
314     71.7     20.8     47.7     62.7
315     78.3     14.4     37.7     45.5
316     76.7     22.5     46.7     63.5
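The per-text means and standard deviations of the kind reported in Tables 7a and 7b can be computed as below. The scores are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the article does not specify which variant was used.

```python
# Sketch: mean and sample standard deviation of one text's scores across
# raters, the statistics reported per text in Tables 7a/7b. Data are made up.

import math

def mean_sd(xs):
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return m, sd

scores_for_text = [90, 85, 95]       # three hypothetical raters, one text
m, sd = mean_sd(scores_for_text)
print(round(m, 1), round(sd, 1))     # prints: 90.0 5.0
```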

Further evidence for a functionalist approach to translation quality evaluation 251

The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish appears when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
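The expected inverse relationship can be checked with a plain Pearson correlation. A minimal sketch, with hypothetical scores and recommendation codes, assuming the recommendation is coded numerically from best (1, publish as is) to worst (4, redo translation):

```python
# Sketch: Pearson correlation between a rater's total scores and the
# recommendation codes. Score/recommendation pairs are hypothetical, and the
# numeric coding of the recommendation field is an assumption.

import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

scores = [95, 88, 72, 55, 40]     # high score goes with ...
recs   = [1, 1, 2, 3, 4]          # ... a low (favorable) recommendation code
print(round(pearson(scores, recs), 3))  # strongly negative, close to -1
```

A rater whose scores and recommendations disagree, as with some of the Russian raters discussed below, yields a correlation much closer to zero.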

Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers). [Bar chart, vertical axis 0–100; not reproduced]

252 Sonia Colina

Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers). [Bar chart, vertical axis 0–80; not reproduced]

Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers). [Bar chart, vertical axis 0–120; not reproduced]

Further evidence for a functionalist approach to translation quality evaluation 253

Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers). [Bar chart, vertical axis 0–80; not reproduced]

Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; translators vs. teachers). [Bar chart, vertical axis 0–100; not reproduced]

254 Sonia Colina

between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look at these raters especially is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
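RS02LB's exclusion illustrates a general point: Pearson's r is undefined when one variable has zero variance. A guarded version of the computation (with made-up data) returns None in that case, mirroring the 'n/a' entry for that rater in Table 8:

```python
# Sketch: per-rater correlation with a guard for zero variance. A rater who
# issues the same recommendation for every text has an undefined correlation
# and must be excluded from the analysis. Data below are hypothetical.

import math

def safe_pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # no variability: correlation undefined (report as n/a)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

scores = [90, 75, 60, 45]
constant_recs = [2, 2, 2, 2]    # uniform recommendation regardless of score
print(safe_pearson(scores, constant_recs))  # prints: None
```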

Figure 8. Time for Russian raters (texts 312, 314, 315, 316; translators vs. teachers). [Bar chart, vertical axis 0–60; not reproduced]

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters

CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters

RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993

Further evidence for a functionalist approach to translation quality evaluation 255

3 Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry settings, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) who had not participated in the training and rating sessions of the previous experiment were selected. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).

Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs

258 Sonia Colina

Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: ________        Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text

Requester

Title/Department:                    Delivery Date:

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target-language materials).

Notes/Recommended Edits

Further evidence for a functionalist approach to translation quality evaluation 259

- 2 -

RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1 TARGET LANGUAGE

Category Number   Description                                                        Check one box

1a   The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b   The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c   Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d   The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number   Description                                                        Check one box

2a   Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b   The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c   The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d   The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:

260 Sonia Colina

- 3 -

3 NON-SPECIALIZED CONTENT (MEANING)

Category Number   Description                                                        Check one box

3a   The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b   There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c   Minor alterations in meaning, additions or omissions.

3d   The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category Number   Description                                                        Check one box

4a   Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b   Serious/frequent mistakes involving terminology and/or specialized content.

4c   A few terminological errors, but the specialized content is not seriously affected.

4d   Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE


Further evidence for a functionalist approach to translation quality evaluation 261

- 4 -

SCORING WORKSHEET

Component: Target Language               Component: Functional and Textual Adequacy
Category   Value   Score                 Category   Value   Score
1a         5                             2a         5
1b         15                            2b         10
1c         25                            2c         20
1d         30                            2d         25

Component: Non-Specialized Content       Component: Specialized Content and Terminology
Category   Value   Score                 Category   Value   Score
3a         5                             4a         5
3b         10                            4b         10
3c         20                            4c         15
3d         25                            4d         20

Tally Sheet

Component                                 Category Rating     Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
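The worksheet arithmetic can be sketched directly: each component contributes the value of the one category box checked, and the maximum total is 30 + 25 + 25 + 20 = 100. The category values below are those listed in the worksheet; the function itself is an illustrative sketch, not part of the published tool.

```python
# Sketch of the scoring worksheet: the total score is the sum of the values
# of the four checked category boxes (one per component). Category values
# are taken from the scoring worksheet above; maximum total is 100.

CATEGORY_VALUES = {
    # Target Language
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,
    # Functional and Textual Adequacy
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,
    # Non-Specialized Content
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,
    # Specialized Content and Terminology
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,
}

def total_score(checked):
    """checked: exactly one category per component, e.g. ["1d", "2c", "3d", "4c"]."""
    assert len(checked) == 4, "exactly one box per component"
    return sum(CATEGORY_VALUES[c] for c in checked)

print(total_score(["1d", "2d", "3d", "4d"]))  # prints: 100 (best in every component)
print(total_score(["1c", "2c", "3b", "4c"]))  # prints: 70 (25 + 20 + 10 + 15)
```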

262 Sonia Colina

Appendix 2 Text sample


Further evidence for a functionalist approach to translation quality evaluation 263


264 Sonia Colina

Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu

Page 15: Further evidence for a functionalist approach to translation quality evaluation

Further evidence for a functionalist approach to translation quality evaluation 249

Question 4 How consistently do raters rate each component of the tool Are there some test components where there is higher rater reliability

The coefficients for the Spanish raters show very good reliability with excel-lent coefficients for the first three components the numbers for the Chinese raters are also very good but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6)

Table 6 Reliability coefficients for the four components of the tool (all raters per language group)

TL FTA MEAN TERM

Spanish 952 929 926 848

Chinese 844 844 864 783

Russian 367 479 492 292

In sum very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing) as well as for all compo-nents of the tool Reliability scores for the Russian raters are low These results are in agreement with the standard deviation data presented in Tables 1ndash2 and Fig-ure 1a and 1b and Figure 2 All of this leads us to believe that whatever the cause for the Russian coefficients it was not related to the tool itself

Question 5 Is there a difference in scoring between translators and teachersTable 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts Figures 3 and 4 show the mean scores and times for Spanish raters comparing teachers and translators

Table 4 Reliability coefficients for Reliability Testing

Reliability coefficient

Spanish 934

Chinese 780

Russian 118

Table 5 Inter-rater reliability Benchmark and Reliability Testing

Benchmark reliability coefficient

Reliability coefficient(for Reliability Testing)

Spanish 953 934

Chinese 973 780

Russian 128 118

250 Sonia Colina

Table 7a Average scores and standard deviations for consultants and translators

Score Time

text Mean SD Mean SD

210 933 75 758 594

214 933 121 942 1014

215 850 179 363 183

228 467 207 375 223

235 467 186 495 389

410 914 75 460 221

413 629 210 407 137

415 964 48 261 154

418 693 221 524 222

312 525 151 267 26

314 883 103 225 42

315 742 263 287 78

316 633 327 258 66

Table 7b Average scores and standard deviations for teachers

Score Time

text Mean SD Mean SD

210 900 94 636 397

214 850 94 670 418

215 890 124 360 305

228 510 195 380 317

235 680 104 576 402

410 800 132 610 277

413 633 257 710 246

415 950 87 410 115

418 917 58 440 66

312 733 58 550 567

314 717 208 477 627

315 783 144 377 455

316 767 225 467 635

Further evidence for a functionalist approach to translation quality evaluation 251

The corresponding data for Chinese appears in Figures 5 and 6 and in Figures 7 and 8 for Russian

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)

As with the Spanish raters it is interesting to note that Chinese teachers rate either higher or similarly to translators (Figure 5) Only one text obtained lower ratings from teachers than from translators Timing results also mirror those found for Spanish subjects Teachers take longer to rate than translators (Figure 6)

Despite the low inter-rater reliability among Russian raters the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish Russian teachers rate similarly or slightly higher than translators and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8) This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008)

In order to investigate the irregular behavior of the Russian raters, and to try to explain their low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation number. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, all of whom correlate very highly between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however: three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (this rater uniformly recommended a '2' for all texts, regardless of the total score assigned). The other Russian raters exhibited strong correlations. This result points to some unusual behavior on the part of the Russian raters, independent of the tool's design and features, since their scores and overall recommendations do not correlate as highly as expected.

[Figure 3. Mean scores for Spanish raters, by text (210, 214, 215, 228, 235): translators vs. teachers]

[Figure 4. Rating time for Spanish raters, by text (210, 214, 215, 228, 235): translators vs. teachers]

[Figure 5. Mean scores for Chinese raters, by text (410, 413, 415, 418): translators vs. teachers]

[Figure 6. Rating time for Chinese raters, by text (410, 413, 415, 418): translators vs. teachers]

[Figure 7. Mean scores for Russian raters, by text (312, 314, 315, 316): translators vs. teachers]

[Figure 8. Rating time for Russian raters, by text (312, 314, 315, 316): translators vs. teachers]

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

Rater:        SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
Correlation:  −0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters

Rater:        CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
Correlation:  −0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters

Rater:        RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
Correlation:  −0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993
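The correlations reported in Table 8 are plain Pearson coefficients between each rater's total scores and recommendation codes across the texts rated. As an illustration only (the scores and recommendation codes below are invented, not taken from the study's data), the computation can be sketched as:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Undefined (ZeroDivisionError) if either sequence has zero variance.
    return cov / (sx * sy)

# Invented example: total scores (0-100) and recommendation codes
# (1 = publish as is, ..., 4 = redo translation) for four texts.
scores = [95, 80, 55, 30]
recs = [1, 2, 3, 4]
print(round(pearson(scores, recs), 3))  # -0.994
```

A rater who issues the same recommendation for every text has zero variance in the second sequence, so the coefficient is undefined; this is why RS02LB had to be excluded from the analysis.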


3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revisions on the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to translation quality evaluation and reports on the results of a pilot test of a tool designed for that approach. The results show a high level of inter-rater reliability and justify continued testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, assessment, evaluation, rating, componential, functionalism, errors


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester

Title/Department    Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish and/or use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target language materials).

Notes/Recommended Edits


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English, and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1. TARGET LANGUAGE

Category number    Description    Check one box

1a: The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b: The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c: Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d: The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY

Category number    Description    Check one box

2a: Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b: The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c: The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d: The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and the intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3. NON-SPECIALIZED CONTENT (MEANING)

Category number    Description    Check one box

3a: The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b: There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c: Minor alterations in meaning, additions or omissions.

3d: The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY

Category number    Description    Check one box

4a: Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b: Serious/frequent mistakes involving terminology and/or specialized content.

4c: A few terminological errors, but the specialized content is not seriously affected.

4d: Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE



SCORING WORKSHEET

Component                               Category: Value
Target Language                         1a: 5    1b: 15    1c: 25    1d: 30
Functional and Textual Adequacy         2a: 5    2b: 10    2c: 20    2d: 25
Non-Specialized Content                 3a: 5    3b: 10    3c: 20    3d: 25
Specialized Content and Terminology     4a: 5    4b: 10    4c: 15    4d: 20
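The worksheet reduces rating to a simple sum: one descriptor is checked per component, and the four corresponding point values are added, so the best descriptor in every component yields 100 and the worst 20. A minimal sketch of that arithmetic (the dictionary mirrors the worksheet values; the function name is illustrative, not part of the tool):

```python
# Category point values from the scoring worksheet (Appendix 1).
VALUES = {
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,   # Target Language
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,   # Functional and Textual Adequacy
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,   # Non-Specialized Content
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,   # Specialized Content and Terminology
}

def total_score(categories):
    """Sum the worksheet values for the one checked descriptor per component."""
    return sum(VALUES[c] for c in categories)

print(total_score(["1d", "2d", "3d", "4d"]))  # 100
```

Note that the four components are weighted unequally (a maximum of 30, 25, 25 and 20 points respectively), so the total already encodes the tool's default priorities.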

Tally Sheet

Component                               Category Rating    Score Value
Target Language                         ________           ________
Functional and Textual Adequacy         ________           ________
Non-Specialized Content                 ________           ________
Specialized Content and Terminology     ________           ________
Total Score                                                ________


Appendix 2 Text sample

[Text samples reproduced as images in the original]

Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu


Table 7a. Average scores and standard deviations for consultants and translators

        Score             Time
Text    Mean    SD        Mean    SD
210     93.3    7.5       75.8    59.4
214     93.3    12.1      94.2    101.4
215     85.0    17.9      36.3    18.3
228     46.7    20.7      37.5    22.3
235     46.7    18.6      49.5    38.9
410     91.4    7.5       46.0    22.1
413     62.9    21.0      40.7    13.7
415     96.4    4.8       26.1    15.4
418     69.3    22.1      52.4    22.2
312     52.5    15.1      26.7    2.6
314     88.3    10.3      22.5    4.2
315     74.2    26.3      28.7    7.8
316     63.3    32.7      25.8    6.6

Table 7b. Average scores and standard deviations for teachers

        Score             Time
Text    Mean    SD        Mean    SD
210     90.0    9.4       63.6    39.7
214     85.0    9.4       67.0    41.8
215     89.0    12.4      36.0    30.5
228     51.0    19.5      38.0    31.7
235     68.0    10.4      57.6    40.2
410     80.0    13.2      61.0    27.7
413     63.3    25.7      71.0    24.6
415     95.0    8.7       41.0    11.5
418     91.7    5.8       44.0    6.6
312     73.3    5.8       55.0    56.7
314     71.7    20.8      47.7    62.7
315     78.3    14.4      37.7    45.5
316     76.7    22.5      46.7    63.5
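The means and standard deviations in Tables 7a and 7b are computed per text over the raters in each group. A minimal sketch with Python's standard library (the three ratings are invented for illustration, not the study's raw data):

```python
import statistics

# Invented example: total scores for one text from three raters in one group.
ratings = [90, 95, 95]
mean = statistics.mean(ratings)
sd = statistics.stdev(ratings)  # sample standard deviation
print(round(mean, 1), round(sd, 1))  # 93.3 2.9
```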


source language shows up in the translation and affects its readability The text is hard to comprehend

1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text

1d

The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward

expressions are minimal if existent at all

ExamplesComments

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category

Number Description

Check one

box

2a Disregard for the goals purpose function and audience of the text The text was translated without considering

textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions

2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience

cultural considerations etc) Repair requires effort

2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is

not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits

2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and

characteristics of the audience Minor or no edits needed

ExamplesComments

260 Sonia Colina

- 3 -

3 NON-SPECIALIZED CONTENT-MEANING

Category Number

Description Check one

box

3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective

comprehension of the original text

3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions

3c Minor alterations in meaning additions or omissions

3d The translation accurately reflects the content contained in the original insofar as it is required by the

instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately

ExamplesComments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category

Number Description

Check one

box

4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content

4b Seriousfrequent mistakes involving terminology andor specialized content

4c A few terminological errors but the specialized content is not seriously affected

4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific

to the subject

ExamplesComments

TOTAL SCORE

- 3 -

3 NON-SPECIALIZED CONTENT-MEANING

Category Number

Description Check one

box

3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective

comprehension of the original text

3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions

3c Minor alterations in meaning additions or omissions

3d The translation accurately reflects the content contained in the original insofar as it is required by the

instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately

ExamplesComments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category

Number Description

Check one

box

4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content

4b Seriousfrequent mistakes involving terminology andor specialized content

4c A few terminological errors but the specialized content is not seriously affected

4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific

to the subject

ExamplesComments

TOTAL SCORE

Further evidence for a functionalist approach to translation quality evaluation 261

- 4 -

S C O R I N G W O R K S H E E T

Component Target Language Component Functional and Textual Adequacy

Category Value Score Category Value Score

1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30

2d 25

Component Non-Specialized Content Component Specialized Content and

Terminology

Category Value Score Category Value Score

3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25

4d 20

Tally Sheet

Component Category

Rating Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score

262 Sonia Colina

Appendix 2 Text sample

bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull

Further evidence for a functionalist approach to translation quality evaluation 263

bull bull

264 Sonia Colina

Authorrsquos address

Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America

scolinaemailarizonaedu

Page 17: Further evidence for a functionalist approach to translation quality evaluation

Further evidence for a functionalist approach to translation quality evaluation 251

The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher than translators (3 out of 5 texts) and spend more time rating (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters and to obtain a possible explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, given the inverse relationship between a high score and a low recommendation number. As the three sub-tables below illustrate, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly

Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)


Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)

Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)


Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)

Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as their scores and overall recommendations do not correlate as highly as expected.

Figure 8. Time for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)

Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR −0.923, SP01JC −0.958, SP01VS −0.854, SP02JA −0.938, SP02LA −0.966, SP02PB −0.421, SP02AB −0.942, SP01PC −0.975, SP01CC −0.913, SP02MC −0.981, SP01PS −0.938

8.2 Chinese raters

CH01RL −0.935, CH04YY −0.980, CH01AX −0.996, CH02AC −0.894, CH02JG −1.000, CH01KG −0.955, CH02AH −0.980, CH01BJ −0.867, CH01CK −0.943, CH01FL −0.926

8.3 Russian raters

RS01EG −0.998, RS01EM −0.115, RS04GN −0.933, RS02NB −1.000, RS02LB n/a, RS02MK −0.500, RS01SM −0.982, RS01NM −0.500, RS01RW −0.993


3 Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their ratings), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions,


and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written in order to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) who had not participated in the training and rating sessions of the previous experiment were selected. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).

Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts    Time Rating Ends

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester:

Title/Department    Delivery Date

TRANSLATION BRIEF

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

PRIORITY OF QUALITY CRITERIA

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

ASSESSMENT SUMMARY AND RECOMMENDATION

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target-language materials).

Notes/Recommended Edits


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. In each one of the categories, check the description that best fits the text.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to support your decisions or to provide a rationale for your descriptor selection.

1 TARGET LANGUAGE

Category Number    Description    Check one box

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number    Description    Check one box

2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals and purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3 NON-SPECIALIZED CONTENT-MEANING

Category Number    Description    Check one box

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category Number    Description    Check one box

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE



SCORING WORKSHEET

Component: Target Language
1a = 5, 1b = 15, 1c = 25, 1d = 30

Component: Functional and Textual Adequacy
2a = 5, 2b = 10, 2c = 20, 2d = 25

Component: Non-Specialized Content
3a = 5, 3b = 10, 3c = 20, 3d = 25

Component: Specialized Content and Terminology
4a = 5, 4b = 10, 4c = 15, 4d = 20

Tally Sheet

Component    Category Rating    Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score
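The worksheet arithmetic above is a lookup plus a sum: each component contributes the value of the single descriptor checked for it, so the best possible total is 30 + 25 + 25 + 20 = 100. A minimal sketch of that tally (the function and variable names are illustrative, not part of the tool):

```python
# Descriptor values transcribed from the scoring worksheet above.
VALUES = {
    "Target Language": {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy": {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(checked):
    """Sum the value of the one descriptor checked per component.

    checked maps each component name to the descriptor the rater checked,
    e.g. {"Target Language": "1d", ...}. The best possible total is 100.
    """
    return sum(VALUES[component][descriptor]
               for component, descriptor in checked.items())
```

Note that the four components carry different maximum values, so the weighting of the quality criteria is built into the worksheet itself.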


Appendix 2 Text sample



Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu


Evaluationrdquo Meta 462 345ndash364Cao Deborah 1996 ldquoA Model of Translation Proficiencyrdquo Target 82 325ndash340Carroll John B 1966 ldquoAn Experiment in Evaluating the Quality of Translationsrdquo Mechanical

Translation 93ndash4 55ndash66Colina Sonia 2003 Teaching Translation From Research to the Classroom New York McGraw

HillColina Sonia 2008 ldquoTranslation Quality Evaluation Empirical evidence for a Functionalist

Approachrdquo The Translator 141 97ndash134Gerzymisch-Arbogast Heidrun 2001 ldquoEquivalence Parameters and Evaluationrdquo Meta 462

227ndash242Hatim Basil and Ian Mason 1997 The Translator as Communicator London and New York

RoutledgeHoumlnig Hans 1997 ldquoPositions Power and Practice Functionalist Approaches and Translation

Quality Assessmentrdquo Current issues in language and society 41 6ndash34House Julianne 1997 Translation Quality Assessment A Model Revisited Tuumlbingen NarrHouse Julianne 2001 ldquoTranslation Quality Assessment Linguistic Description versus Social

Evaluationrdquo Meta 462 243ndash257Lauscher S 2000 ldquoTranslation Quality-Assessment Where Can Theory and Practice Meetrdquo

The Translator 62 149ndash168Neubert Albrecht 1985 Text und Translation Leipzig EnzyklopaumldieNida Eugene 1964 Toward a Science of Translation Leiden BrillNida Eugene and Charles Taber 1969 The Theory and Practice of Translation Leiden Brill

Further evidence for a functionalist approach to translation quality evaluation 257

Nord Christianne 1997 Translating as a Purposeful Activity Functionalist Approaches Ex-plained Manchester St Jerome

PACTE 2008 ldquoFirst Results of a Translation Competence Experiment lsquoKnowledge of Transla-tionrsquo and lsquoEfficacy of the Translation Processrdquo John Kearns ed Translator and Interpreter Training Issues Methods and Debates London and New York Continuum 2008 104ndash126

Reiss Katharina 1971 Moumlglichkeiten und Grenzen der uumlbersetungskritik Muumlnchen HuumlberReiss Katharina and Vermeer Hans 1984 Grundlegung einer allgemeinen Translations-Theorie

Tuumlbingen NiemayerVan den Broeck Raymond 1985 ldquoSecond Thoughts on Translation Criticism A Model of its

Analytic Functionrdquo Theo Hermans ed The Manipulation of Literature Studies in Literary Translation London and Sydney Croom Helm 1985 54ndash62

Williams Malcolm 2001 ldquoThe Application of Argumentation Theory to Translation Quality Assessmentrdquo Meta 462 326ndash344

Williams Malcolm 2004 Translation Quality Assessment An Argumentation-Centered Ap-proach Ottawa University of Ottawa Press

Reacutesumeacute

Colina (2008) propose une approche componentielle et fonctionnelle de lrsquoeacutevaluation de la qua-liteacute des traductions et dresse un rapport sur les reacutesultats drsquoun test-pilote portant sur un outil conccedilu pour cette approche Les reacutesultats attestent un taux eacuteleveacute de fiabiliteacute entre eacutevaluateurs et justifient la continuation des tests Cet article preacutesente une expeacuterimentation destineacutee agrave tester lrsquoapproche ainsi que lrsquooutil Des donneacutees ont eacuteteacute collecteacutees pendant deux peacuteriodes de tests Un groupe de 30 eacutevaluateurs composeacute de traducteurs et enseignants espagnols chinois et russes ont eacutevalueacute 4 ou 5 textes traduits Les reacutesultats montrent que lrsquooutil assure un bon taux de fiabiliteacute entre eacutevaluateurs pour tous les groupes de langues et de textes agrave lrsquoexception du russe ils suggegrave-rent eacutegalement que le faible taux de fiabiliteacute des scores obtenus par les eacutevaluateurs russes est sans rapport avec lrsquooutil lui-mecircme Ces constats confirment ceux de Colina (2008)

Mots-clefs Mots-cleacutes qualiteacute test eacutevaluation notation componentiel fonctionnalisme erreurs

258 Sonia Colina

Appendix 1 Tool

Benchmark Rating Session

T i m e R a t i n g S t a r t s T i m e R a t i n g E n d s

Translation Quality Assessment ndash Cover Sheet For Health Education Materials

PART I To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text

Requester

TitleDepartment Delivery Date

T R A N S L A T I O N B R I E F

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

P R I O R I T Y O F Q U A L I T Y C R I T E R I A

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

A S S E S S M E N T S U M M A R Y A N D R E C O M M E N D A T I O N

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text Explore other options (eg create new target language materials)

NotesRecommended Edits

Further evidence for a functionalist approach to translation quality evaluation 259

- 2 -

RATING INSTRUCTIONS

1 Carefully read the instructions for the review of the translated text Your decisions and evaluation should be based on these instructions only

2 Check the description that best fits the text given in each one of the categories

3 It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories

4 Examples or comments are not required but they can be useful to help support your decisions or to provide rationale for your descriptor selection

1 TARGET LANGUAGE

Category Number

Description Check one

box

1a

The translation reveals serious language proficiency issues Ungrammatical use of the target language spelling mistakes The translation is written in some sort of lsquothird languagersquo (neither the source nor the target) The

structure of source language dominates to the extent that it cannot be considered a sample of target language text The amount of transfer from the source cannot be justified by the purpose of the translation The text is

extremely difficult to read bordering on being incomprehensible

1b The text contains some unnecessary transfer of elementsstructure from the source text The structure of the

source language shows up in the translation and affects its readability The text is hard to comprehend

1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text

1d

The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward

expressions are minimal if existent at all

ExamplesComments

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category

Number Description

Check one

box

2a Disregard for the goals purpose function and audience of the text The text was translated without considering

textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions

2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience

cultural considerations etc) Repair requires effort

2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is

not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits

2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and

characteristics of the audience Minor or no edits needed

ExamplesComments

260 Sonia Colina

- 3 -

3 NON-SPECIALIZED CONTENT-MEANING

Category Number

Description Check one

box

3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective

comprehension of the original text

3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions

3c Minor alterations in meaning additions or omissions

3d The translation accurately reflects the content contained in the original insofar as it is required by the

instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately

ExamplesComments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category

Number Description

Check one

box

4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content

4b Seriousfrequent mistakes involving terminology andor specialized content

4c A few terminological errors but the specialized content is not seriously affected

4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific

to the subject

ExamplesComments

TOTAL SCORE

- 3 -

3 NON-SPECIALIZED CONTENT-MEANING

Category Number

Description Check one

box

3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective

comprehension of the original text

3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions

3c Minor alterations in meaning additions or omissions

3d The translation accurately reflects the content contained in the original insofar as it is required by the

instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately

ExamplesComments

4 SPECIALIZED CONTENT AND TERMINOLOGY

Category

Number Description

Check one

box

4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content

4b Seriousfrequent mistakes involving terminology andor specialized content

4c A few terminological errors but the specialized content is not seriously affected

4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific

to the subject

ExamplesComments

TOTAL SCORE

Further evidence for a functionalist approach to translation quality evaluation 261

- 4 -

S C O R I N G W O R K S H E E T

Component Target Language Component Functional and Textual Adequacy

Category Value Score Category Value Score

1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30

2d 25

Component Non-Specialized Content Component Specialized Content and

Terminology

Category Value Score Category Value Score

3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25

4d 20

Tally Sheet

Component Category

Rating Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score

262 Sonia Colina

Appendix 2 Text sample

bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull

Further evidence for a functionalist approach to translation quality evaluation 263

bull bull

264 Sonia Colina

Authorrsquos address

Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America

scolinaemailarizonaedu

Page 19: Further evidence for a functionalist approach to translation quality evaluation

Further evidence for a functionalist approach to translation quality evaluation 253

[Figure 6. Time for Chinese raters (x-axis: 410, 413, 415, 418; y-axis: 0–80; series: translators vs. teachers).]

[Figure 7. Mean scores for Russian raters (x-axis: 312, 314, 315, 316; y-axis: 0–100; series: translators vs. teachers).]


between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as their scores and overall recommendations do not correlate as highly as expected.

[Figure 8. Time for Russian raters (x-axis: 312, 314, 315, 316; y-axis: 0–60; series: translators vs. teachers).]

Table 8 (three sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters

CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters

RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993


3 Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been; further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study.

In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry settings, translator licensing exams), a more efficient translator rater may be more suitable to the task.

In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
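Inter-rater reliability of the kind summarized above can be estimated, for instance, by computing Cronbach's alpha over the raters' total scores for the same set of texts. The sketch below is illustrative only: this excerpt does not name the statistic actually used in the study, and the rater data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for inter-rater consistency.

    ratings: one list of total scores per rater, aligned by text,
    treating each rater as an "item" in the classical formula.
    """
    k = len(ratings)                                   # number of raters
    totals = [sum(col) for col in zip(*ratings)]       # per-text sum across raters
    rater_var = sum(variance(r) for r in ratings)      # sum of per-rater variances
    return k / (k - 1) * (1 - rater_var / variance(totals))

# Three hypothetical raters scoring the same four texts; their rankings
# broadly agree, so alpha comes out high (close to 1).
raters = [[90, 70, 50, 85],
          [88, 65, 55, 80],
          [92, 72, 48, 90]]
alpha = cronbach_alpha(raters)
print(round(alpha, 2))
```

A group like the Russian raters in this study, whose scores diverge, would drive the per-rater variance term up relative to the variance of the totals and pull alpha down.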

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.

Notes

* The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including the Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

[Translated from the French:] Colina (2008) proposes a componential, functionalist approach to translation quality evaluation and reports the results of a pilot test of a tool designed according to that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors


Appendix 1 Tool

Benchmark Rating Session

Time Rating Starts: ____    Time Rating Ends: ____

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

(The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.)

Requester: ____

Title/Department: ____    Delivery Date: ____

TRANSLATION BRIEF

Source Language: ____    Target Language: Spanish / Russian / Chinese

Text Type: ____

Text Title: ____

Target Audience: ____

Purpose of Document: ____

PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology

PART II: To be completed by TQA Rater

Rater (Name): ____    Date Completed: ____

Contact Information: ____    Date Received: ____

Total Score: ____    Total Rating Time: ____

ASSESSMENT SUMMARY AND RECOMMENDATION

(To be completed after evaluating the translated text.)

Publish and/or use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target language materials).

Notes/Recommended Edits:


RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1. TARGET LANGUAGE (check one box)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2. FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ____


- 4 -

SCORING WORKSHEET

Component: Target Language                Component: Functional and Textual Adequacy

Category   Value   Score                  Category   Value   Score
1a          5                             2a          5
1b         15                             2b         10
1c         25                             2c         20
1d         30                             2d         25

Component: Non-Specialized Content        Component: Specialized Content and Terminology

Category   Value   Score                  Category   Value   Score
3a          5                             4a          5
3b         10                             4b         10
3c         20                             4c         15
3d         25                             4d         20

Tally Sheet

Component                                 Category Rating   Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
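The worksheet arithmetic is straightforward: the rater checks one category (a–d) per component, each checked category maps to the point value listed above, and the four values are summed into the total score. As a minimal sketch (the data values are taken from the worksheet; the function and variable names are hypothetical, not part of the tool):

```python
# Point values from the scoring worksheet: one category (a-d) is checked
# per component, and the four component values are summed into the total.
COMPONENT_VALUES = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """ratings maps each component to the category letter the rater checked."""
    return sum(COMPONENT_VALUES[comp][letter] for comp, letter in ratings.items())

# A rater who checks 1d, 2c, 3d, and 4c would tally:
print(total_score({
    "Target Language": "d",
    "Functional and Textual Adequacy": "c",
    "Non-Specialized Content": "d",
    "Specialized Content and Terminology": "c",
}))  # 30 + 20 + 25 + 15 = 90
```

With one box checked per component, the best possible total is 30 + 25 + 25 + 20 = 100, so the tool yields a score on a 100-point scale.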


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu


Page 21: Further evidence for a functionalist approach to translation quality evaluation

Further evidence for a functionalist approach to translation quality evaluation 255

3 Conclusions

As in Colina (2008) testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts with the exception of Russian It was also shown that the low reliability of the Russian ratersrsquo scores is probably due to factors unrelated to the tool itself At this point it is not possible to determine what these factors may have been yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study In addition the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers Although translators and teachers exhibit similar behavior teachers tend to spend more time rating and their scores are slightly higher than those of trans-lators While in principle it may appear that translators would be more efficient raters one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task Because they spent more time rating (and one as-sumes reflecting on their rating) teachers may be more apt evaluators in a forma-tive context where feedback is expected from the rater Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process andor a translator (versus evalu-ation of a product) However when rating involves a product and no feedback is expected (eg industry translator licensing exams etc) a more efficient translator rater may be more suitable to the task In sum the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool Which of the two types of profes-sionals is more adequate for a specific rating task probably will depend on the purpose and goal of evaluation Further research comparing the skills of these two groups in different 
evaluation contexts is necessary to confirm this view

In summary the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research Future research needs to focus on testing on a larger scale with more subjects and various text types

Notes

The research described here was funded by the Robert Wood Johnson Foundation It was part of the Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program I would like to express my gratitude to the Foundation to the Hablamos Juntos Na-tional Program and to the Program Director Yolanda Partida for their support of translation in the USA I owe much gratitude to Yolanda Partida and Felicia Batts for comments suggestions

256 Sonia Colina

and revision in the write-up of the draft documents and on which this paper draws More details and information on the Translation Quality Assessment project including Technical Reports Manuals and Toolkit Series are available on the Hablamos Juntos website (wwwhablamosjuntosorg) I would also like to thank Volker Hegelheimer for his assistance with the statistics

1 The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act At least 43 states have one or more laws addressing lan-guage access in health care settings

2 wwwsaeorg wwwlisaorgproductsqamodel

3 One exception is that of multilingual text generation in which an original is written to be translated into multiple languages

4 Note the reference to reader response within a functionalist framework

5 Due to rater availability 4 raters (1 Spanish 2 Chinese 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment Given the low number researchers did not investigate the effect of previous experience (experienced vs inex-perienced raters)

References

Bell Roger T 1991 Translation and Translating London LongmanBowker Lynne 2001 ldquoTowards a Methodology for a Corpus-Based Approach to Translation

Evaluationrdquo Meta 462 345ndash364Cao Deborah 1996 ldquoA Model of Translation Proficiencyrdquo Target 82 325ndash340Carroll John B 1966 ldquoAn Experiment in Evaluating the Quality of Translationsrdquo Mechanical

Translation 93ndash4 55ndash66Colina Sonia 2003 Teaching Translation From Research to the Classroom New York McGraw

HillColina Sonia 2008 ldquoTranslation Quality Evaluation Empirical evidence for a Functionalist

Approachrdquo The Translator 141 97ndash134Gerzymisch-Arbogast Heidrun 2001 ldquoEquivalence Parameters and Evaluationrdquo Meta 462

227ndash242Hatim Basil and Ian Mason 1997 The Translator as Communicator London and New York

RoutledgeHoumlnig Hans 1997 ldquoPositions Power and Practice Functionalist Approaches and Translation

Quality Assessmentrdquo Current issues in language and society 41 6ndash34House Julianne 1997 Translation Quality Assessment A Model Revisited Tuumlbingen NarrHouse Julianne 2001 ldquoTranslation Quality Assessment Linguistic Description versus Social

Evaluationrdquo Meta 462 243ndash257Lauscher S 2000 ldquoTranslation Quality-Assessment Where Can Theory and Practice Meetrdquo

The Translator 62 149ndash168Neubert Albrecht 1985 Text und Translation Leipzig EnzyklopaumldieNida Eugene 1964 Toward a Science of Translation Leiden BrillNida Eugene and Charles Taber 1969 The Theory and Practice of Translation Leiden Brill

Further evidence for a functionalist approach to translation quality evaluation 257

Nord Christianne 1997 Translating as a Purposeful Activity Functionalist Approaches Ex-plained Manchester St Jerome

PACTE 2008 ldquoFirst Results of a Translation Competence Experiment lsquoKnowledge of Transla-tionrsquo and lsquoEfficacy of the Translation Processrdquo John Kearns ed Translator and Interpreter Training Issues Methods and Debates London and New York Continuum 2008 104ndash126

Reiss Katharina 1971 Moumlglichkeiten und Grenzen der uumlbersetungskritik Muumlnchen HuumlberReiss Katharina and Vermeer Hans 1984 Grundlegung einer allgemeinen Translations-Theorie

Tuumlbingen NiemayerVan den Broeck Raymond 1985 ldquoSecond Thoughts on Translation Criticism A Model of its

Analytic Functionrdquo Theo Hermans ed The Manipulation of Literature Studies in Literary Translation London and Sydney Croom Helm 1985 54ndash62

Williams Malcolm 2001 ldquoThe Application of Argumentation Theory to Translation Quality Assessmentrdquo Meta 462 326ndash344

Williams Malcolm 2004 Translation Quality Assessment An Argumentation-Centered Ap-proach Ottawa University of Ottawa Press

Reacutesumeacute

Colina (2008) propose une approche componentielle et fonctionnelle de lrsquoeacutevaluation de la qua-liteacute des traductions et dresse un rapport sur les reacutesultats drsquoun test-pilote portant sur un outil conccedilu pour cette approche Les reacutesultats attestent un taux eacuteleveacute de fiabiliteacute entre eacutevaluateurs et justifient la continuation des tests Cet article preacutesente une expeacuterimentation destineacutee agrave tester lrsquoapproche ainsi que lrsquooutil Des donneacutees ont eacuteteacute collecteacutees pendant deux peacuteriodes de tests Un groupe de 30 eacutevaluateurs composeacute de traducteurs et enseignants espagnols chinois et russes ont eacutevalueacute 4 ou 5 textes traduits Les reacutesultats montrent que lrsquooutil assure un bon taux de fiabiliteacute entre eacutevaluateurs pour tous les groupes de langues et de textes agrave lrsquoexception du russe ils suggegrave-rent eacutegalement que le faible taux de fiabiliteacute des scores obtenus par les eacutevaluateurs russes est sans rapport avec lrsquooutil lui-mecircme Ces constats confirment ceux de Colina (2008)

Mots-clefs Mots-cleacutes qualiteacute test eacutevaluation notation componentiel fonctionnalisme erreurs

258 Sonia Colina

Appendix 1 Tool

Benchmark Rating Session

T i m e R a t i n g S t a r t s T i m e R a t i n g E n d s

Translation Quality Assessment ndash Cover Sheet For Health Education Materials

PART I To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text

Requester

TitleDepartment Delivery Date

T R A N S L A T I O N B R I E F

Source Language Target Language

Spanish Russian Chinese

Text Type

Text Title

Target Audience

Purpose of Document

P R I O R I T Y O F Q U A L I T Y C R I T E R I A

____ Target Language

____ Functional and Textual Adequacy

____ Non-Specialized Content (Meaning)

Rank EACH from 1 to 4

(1 being top priority)

____ Specialized Content and Terminology

PART II To be completed by TQA Rater

Rater (Name) Date Completed

Contact Information Date Received

Total Score Total Rating Time

A S S E S S M E N T S U M M A R Y A N D R E C O M M E N D A T I O N

Publish andor use as is

Minor edits needed before publishing

Major revision needed before publishing

Redo translation

(To be completed after evaluating translated text)

Translation will not be an effective communication strategy for this text Explore other options (eg create new target language materials)

NotesRecommended Edits

Further evidence for a functionalist approach to translation quality evaluation 259

- 2 -

RATING INSTRUCTIONS

1 Carefully read the instructions for the review of the translated text Your decisions and evaluation should be based on these instructions only

2 Check the description that best fits the text given in each one of the categories

3 It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories

4 Examples or comments are not required but they can be useful to help support your decisions or to provide rationale for your descriptor selection

1 TARGET LANGUAGE

Category Number

Description Check one

box

1a

The translation reveals serious language proficiency issues Ungrammatical use of the target language spelling mistakes The translation is written in some sort of lsquothird languagersquo (neither the source nor the target) The

structure of source language dominates to the extent that it cannot be considered a sample of target language text The amount of transfer from the source cannot be justified by the purpose of the translation The text is

extremely difficult to read bordering on being incomprehensible

1b The text contains some unnecessary transfer of elementsstructure from the source text The structure of the

source language shows up in the translation and affects its readability The text is hard to comprehend

1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text

1d

The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward

expressions are minimal if existent at all

ExamplesComments

2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category

Number Description

Check one

box

2a Disregard for the goals purpose function and audience of the text The text was translated without considering

textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions

2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience

cultural considerations etc) Repair requires effort

2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is

not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits

2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and

characteristics of the audience Minor or no edits needed

ExamplesComments

260 Sonia Colina

3. NON-SPECIALIZED CONTENT (MEANING)

Check one box:

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY

Check one box:

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ____


Further evidence for a functionalist approach to translation quality evaluation 261

SCORING WORKSHEET

Component: Target Language
  Category/Value: 1a = 5, 1b = 15, 1c = 25, 1d = 30

Component: Functional and Textual Adequacy
  Category/Value: 2a = 5, 2b = 10, 2c = 20, 2d = 25

Component: Non-Specialized Content
  Category/Value: 3a = 5, 3b = 10, 3c = 20, 3d = 25

Component: Specialized Content and Terminology
  Category/Value: 4a = 5, 4b = 10, 4c = 15, 4d = 20

Tally Sheet

Component                               Category Rating    Score Value
Target Language                         ____               ____
Functional and Textual Adequacy         ____               ____
Non-Specialized Content                 ____               ____
Specialized Content and Terminology     ____               ____
Total Score                                                ____
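For readers implementing the worksheet in software, its arithmetic can be sketched as follows. This is a minimal illustration only: each component is rated a–d, each rating carries the point value listed on the scoring worksheet, and the total score is the sum over the four components (maximum 100). The dictionary layout and function name below are ours, not part of the published tool.

```python
# Point values from the scoring worksheet, keyed by component and rating letter.
COMPONENT_VALUES = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """Sum the worksheet values for one rater's category choices.

    `ratings` maps each component name to the rating letter checked,
    e.g. {"Target Language": "d", ...}.
    """
    return sum(COMPONENT_VALUES[component][choice]
               for component, choice in ratings.items())
```

For example, a text rated 1d, 2c, 3d, 4c would receive 30 + 20 + 25 + 15 = 90 out of a possible 100.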

Appendix 2. Text sample

[The text sample is reproduced as an image in the original and is not recoverable here.]

Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu


and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including the Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).

References

Bell, Roger T. 1991. Translation and Translating. London: Longman.

Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.

Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.

Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.

Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.

Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.

Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.

Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.

Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.

House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.

House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.

Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.

Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.

Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.

Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.

Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.

PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.

Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.

Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.

Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.

Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.

Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.

Résumé

Colina (2008) proposes a componential, functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed for that approach. The results show a high level of inter-rater reliability and justify continued testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, composed of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool provides good inter-rater reliability for all language groups and texts except Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors



- 4 -

S C O R I N G W O R K S H E E T

Component Target Language Component Functional and Textual Adequacy

Category Value Score Category Value Score

1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30

2d 25

Component Non-Specialized Content Component Specialized Content and

Terminology

Category Value Score Category Value Score

3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25

4d 20

Tally Sheet

Component Category

Rating Score Value

Target Language

Functional and Textual Adequacy

Non-Specialized Content

Specialized Content and Terminology

Total Score

262 Sonia Colina

Appendix 2 Text sample

bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull

Further evidence for a functionalist approach to translation quality evaluation 263

bull bull

264 Sonia Colina

Authorrsquos address

Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America

scolinaemailarizonaedu

Page 25: Further evidence for a functionalist approach to translation quality evaluation

Further evidence for a functionalist approach to translation quality evaluation 259

- 2 -

RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1 TARGET LANGUAGE
Category number | Description | Check one box

1a The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target-language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:

2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category number | Description | Check one box

2a Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:


3 NON-SPECIALIZED CONTENT-MEANING
Category number | Description | Check one box

3a The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c Minor alterations in meaning, additions, or omissions.

3d The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY
Category number | Description | Check one box

4a Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b Serious/frequent mistakes involving terminology and/or specialized content.

4c A few terminological errors, but the specialized content is not seriously affected.

4d Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE:


SCORING WORKSHEET

Component: Target Language
  1a = 5    1b = 15    1c = 25    1d = 30

Component: Functional and Textual Adequacy
  2a = 5    2b = 10    2c = 20    2d = 25

Component: Non-Specialized Content
  3a = 5    3b = 10    3c = 20    3d = 25

Component: Specialized Content and Terminology
  4a = 5    4b = 10    4c = 15    4d = 20

Tally Sheet

Component                               Category Rating    Score Value
Target Language                         ______             ______
Functional and Textual Adequacy         ______             ______
Non-Specialized Content                 ______             ______
Specialized Content and Terminology     ______             ______
Total Score                                                ______
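The arithmetic behind the Scoring Worksheet is simple: each component contributes the point value of its single checked category, and the four values are summed (maximum 100). As an illustration only, the following Python sketch applies those published point values; the function and variable names are hypothetical and not part of the rating tool itself.

```python
# Point values from the Scoring Worksheet (Colina's rating tool).
SCORE_VALUES = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings: dict) -> int:
    """Sum the point value of the one category (a-d) checked per component."""
    return sum(SCORE_VALUES[component][category]
               for component, category in ratings.items())

# Example: a translation rated 1d, 2c, 3d, 4c scores 30 + 20 + 25 + 15 = 90.
print(total_score({
    "Target Language": "d",
    "Functional and Textual Adequacy": "c",
    "Non-Specialized Content": "d",
    "Specialized Content and Terminology": "c",
}))  # prints 90
```

Note that a text rated "d" on every component reaches the maximum of 100 points (30 + 25 + 25 + 20), while one rated "a" throughout scores the minimum of 20.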


Appendix 2. Text sample


Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
