
Methodological Issues in Cross-Cultural Counseling Research: Equivalence, Bias, and Translations

Stefanía Ægisdóttir
Lawrence H. Gerstein
Deniz Canel Çinarbas
Ball State University

Concerns about the cross-cultural validity of constructs are discussed, including equivalence, bias, and translation procedures. Methods to enhance equivalence are described, as are strategies to evaluate and minimize types of bias. Recommendations for translating instruments are also presented. To illustrate some challenges of cross-cultural counseling research, translation procedures employed in studies published in five counseling journals are evaluated. In 15 of 615 empirical articles, a translation of instruments was performed. In 9 studies, there was some effort to enhance and evaluate equivalence between language versions of the measures employed. In contrast, 2 studies did not report using thorough translation and verification procedures, and 4 studies employed a moderate degree of rigor. Suggestions for strengthening translation methodologies and enhancing the rigor of cross-cultural counseling research are provided. To conduct cross-culturally valid research and deliver culturally appropriate services, counseling psychologists must generate and rely on methodologically sound cross-cultural studies. This article provides a schema for performing such studies.

There is growing interest in international issues in the counseling profession. There are more publications about cross-cultural issues in counseling and the role of counseling outside of the United States (Gerstein, 2005; Gerstein & Ægisdóttir, 2005a, 2005b, 2005c; Leong & Blustein, 2000; Leong & Ponterotto, 2003; Leung, 2003; Ægisdóttir & Gerstein, 2005). Greater attention also has been paid to counseling international individuals living in the United States (Fouad, 1991; Pedersen, 1991). Confirming this trend is the focus of Division 17's past president (2003 to 2004), Louise Douce, on the globalization of counseling psychology. Douce encouraged developing a strategic plan to enhance the profession's global effort and facilitate "a movement that transcends nationalism" (Douce, 2004, p. 145). She also stressed questioning the validity and applicability of our Eurocentric paradigms and the hegemony of such paradigms. Instead, she claimed, our paradigms must integrate and evolve from indigenous models.

Puncky Heppner continued Douce's effort as part of his Division 17 presidential initiative. Heppner (2006) claimed, "Cross-national relationships have tremendous potential to enhance the basic core of the science and practice of counseling psychology, both domestically and internationally" (p. 147). He also predicted, "In the future, counseling psychology will no longer be defined as counseling psychology within the United States, but rather, the parameters of counseling psychology will cross many countries and many cultures" (Heppner, 2006, p. 170).

Although an international focus in counseling is important, there are many challenges (cf. Douce, 2004; Heppner, 2006; Pedersen, 2003). This article discusses methodological challenges, especially as related to the translation and adaptation of instruments for use in international and cross-cultural studies and their link to equivalence and bias. While there has been discussion in the counseling psychology literature about the benefits and challenges of cross-cultural counseling and the risks of simply applying Western theories and strategies cross-culturally, we were unable to locate publications in our literature detailing how to perform cross-culturally valid research. There is literature, however, in other areas of psychology (e.g., cross-cultural, social, international) that addresses these topics. This article draws from this literature to introduce counseling psychologists to some concepts, methods, and issues when conducting cross-cultural research. We also extend this literature by discussing the potential use of cross-cultural methodologies in counseling research.

As a way to illustrate some challenges of cross-cultural research, we also examine, analyze, and evaluate translation practices employed in five prominent counseling journals to determine the translation procedures counseling researchers have used and the methods employed to minimize bias and evaluate equivalence. Finally, we offer recommendations about translation methodology and ways to increase validity in cross-cultural counseling research.

METHODOLOGICAL CONCEPTS AND ISSUES IN CROSS-CULTURAL RESEARCH

Approaches to Studying Culture

There are numerous definitions of culture in anthropology and counseling psychology. Ponterotto, Casas, Suzuki, and Alexander (1995) concluded that for most scholars, culture is a learned system of meaning and behavior passed from one generation to the next. When studying cultural influences on behavior, counseling psychologists may approach cultural variables and the design of research from three different angles using the indigenous, the cultural, and the cross-cultural approach (Triandis, 2000).


According to Triandis, when using the indigenous approach, researchers are mainly interested in the meaning of concepts in a culture and how such meaning may change across demographics within a cultural context (e.g., what does counseling mean in this culture?). With this approach, psychologists often study their own culture with the goal of benefiting people in that culture. The focus of such studies is the development of a psychology tailored to a specific culture without a focus on generalization outside of that cultural context (cf. Adamopolous & Lonner, 2001). The main challenge with the indigenous approach is the difficulty in avoiding existing psychological concepts, theories, and methodologies and therefore determining what is indigenous (Adamopolous & Lonner, 2001).

Triandis (2000) contended that with the cultural approach, in contrast, psychologists often study cultures other than their own by using ethnographic methods. True experimental methods can also be used within this approach (van de Vijver, 2001). Again, the meanings of constructs in a culture are the main focus without direct comparison of constructs across cultures. The aim is to advance the understanding of persons in a sociocultural context and to emphasize the importance of culture in understanding behavior (Adamopolous & Lonner, 2001). The challenge with this approach is a lack of widely accepted research methodology (Adamopolous & Lonner, 2001).

Last, Triandis (2000) stated that when using cross-cultural approaches, psychologists obtain data in two or more cultures, assuming the constructs under investigation exist in all of the cultures studied. Here, researchers are interested in how a construct affects behavior differently or similarly across cultures. Thus, one implication of this approach is an increased understanding of the cross-cultural validity and generalizability of the theories and/or constructs. The main challenge with this approach is demonstrating equivalence of constructs and measures used in the target cultures and also minimizing biases that may threaten valid cross-cultural comparisons.

In sum, indigenous and cultural approaches focus on the emics, or things unique to a culture. These approaches are relativistic in that the aim is studying the local context and meaning of constructs without imposing a priori definitions of the constructs (Tanaka-Matsumi, 2001). Scholars representing these approaches usually reject claims that psychological theories are universal (Kim, 2001). In the cross-cultural approach, in contrast, the focus is on the etics, or factors common across cultures (Brislin, Lonner, & Thorndike, 1973). Here the goal is to understand similarities and differences across cultures, and the comparability of cross-cultural categories or dimensions is emphasized (Tanaka-Matsumi, 2001).

Methodological Challenges in Cross-Cultural Research

Scholars from diverse psychology disciplines have pursued cross-cultural research for decades, and as a result, a literature on cross-cultural research methodologies and challenges emerged (e.g., Berry, 1969; Brislin, 1976; Brislin et al., 1973; Lonner & Berry, 1986; Triandis, 1976; van de Vijver, 2001; van de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997). Based on this work, our article identifies some methodological challenges faced by cross-cultural researchers. Before proceeding, note that the challenges summarized below refer to any cross-cultural comparison of psychological constructs (within [e.g., ethnic groups] and between countries). These challenges are greater, though, in cross-cultural comparisons requiring translation of instruments.

Equivalence

Equivalence is a key concept in cross-cultural psychology. It addresses the question of comparability of observations (test scores) across cultures (van de Vijver, 2001). Several definitions or forms of equivalence have been reported. Lonner (1985), for instance, discussed four types: functional, conceptual, metric, and linguistic. Functional equivalence refers to the function the behavior under study (e.g., counselor empathy) has in different cultures. If similar behaviors or activities (e.g., smiling) have different functions in various cultures, their parameters cannot be used for cross-cultural comparison (Jahoda, 1966; Lonner, 1985). In comparison, conceptual equivalence refers to the similarity in meaning attached to a behavior or concept (Lonner, 1985; Malpass & Poortinga, 1986). Certain behaviors and concepts (e.g., help seeking) may vary in meaning across cultures. Metric equivalence refers to psychometric properties of the tool (e.g., Self-Directed Search) used to measure the same construct across cultures; it is assumed if psychometric data from two or more cultural groups have the same structure (Malpass & Poortinga, 1986). Finally, linguistic equivalence has to do with the wording of items (form, meaning, and structure) in different language versions of an instrument, the reading difficulty of the items, and the naturalness of the items in the translated form (Lonner, 1985; van de Vijver & Leung, 1997).

Van de Vijver and his colleagues (van de Vijver, 2001; van de Vijver & Leung, 1997) also discussed four types of equivalence representing a hierarchical order from absence to higher degree of equivalence. The first type, construct nonequivalence, refers to constructs (e.g., cultural syndromes) being so dissimilar across cultures they cannot be compared. Under these circumstances, no link exists between the constructs. The next three types of equivalence demonstrate some equivalence, with the higher level in the hierarchy presupposing a lower level. These are construct (or structural), measurement unit, and scalar equivalence.

At the lowest level is construct equivalence. A scale has construct equivalence if it measures the same underlying construct across cultural groups. Construct equivalence has been demonstrated for many constructs in psychology (e.g., the NEO Personality Inventory-Revised five-factor model of personality; McCrae & Costa, 1997). With construct equivalence, the constructs (e.g., extraversion) are considered to have the same meaning and nomological network across cultures (relationships between constructs, hypotheses, and measures; e.g., Betz, 2005) but need not be operationally defined the same way for each cultural group (e.g., van de Vijver, 2001). For instance, two emic measures of attitudes toward counseling may tap different indicators of attitudes in each culture; the measures may therefore include different items but at the same time be structurally equivalent, as they both measure the same dimensions of counseling attitudes and predict help seeking. Yet as their measurement differs, a direct comparison of average test scores across cultures using a t test or ANOVA, for example, cannot be performed. The measures lack scalar equivalence (see below). Construct equivalence is often demonstrated using exploratory and confirmatory factor analyses and structural equation modeling (SEM) to discern the similarities and differences of the constructs' structure and their nomological networks across cultures.

The next level of equivalence is measurement-unit equivalence (van de Vijver, 2001; van de Vijver & Leung, 1997). With this type of equivalence, the measurement scales of the tools are equivalent (e.g., interval level), but their origins are different across groups. While mean scores from scales with this level of equivalence can be compared to examine individual differences within groups (e.g., using a t test), because of the different origin, comparing mean scores (e.g., t test) between groups from scales at this level will not provide a valid comparison. For example, Kelvin and Celsius scales have equivalent measurement units (interval scales) but measure temperature differently: they have a different origin, and thus, direct comparison of temperature using these two scales cannot be done. But because of a constant difference between these two scales, comparability may be possible (i.e., K = °C + 273). The known constant or value offsetting the scales makes them comparable (van de Vijver & Leung, 1997). Such known constants are difficult to discern in studies of human behavior, rendering scores at this level often incomparable. A clear analogy in counseling psychology is using different cut scores for various groups (e.g., gender) on instruments as an indicator of some criteria or an underlying trait. Different cut scores (or standard scores) are used because instruments do not show equivalence beyond the measurement unit. That is, some bias affects the origin of the scale for one group relative to the other, limiting raw score comparability between the groups. For example, a raw score of 28 on the Minnesota Multiphasic Personality Inventory-2 MacAndrew Alcohol Scale-Revised (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 2001) does not mean the same thing for women as it does for men. For women, this score indicates more impulsiveness and greater risk for substance abuse than it does for men (Greene, 2000). A less clear example, but one extremely important to cross-cultural research, involves two language versions of the same psychological instrument. Here the origins of the two language versions of the scale may appear the same (both versions include the same interval rating scale for the items). This assumption, however, may be threatened if the two cultural groups responding to this measure vary in their familiarity with Likert-type answer formats (method bias; see later). Because of the differential familiarity with this type of stimuli, the origin of the measurement unit is not the same for both groups. Similarly, if the two cultural groups vary in response style (e.g., acquiescence), a score of 2 on a 5-point scale may not mean the same for both groups. In these examples, the source or the origin of the scale is different in the two language versions, compromising valid cross-cultural comparison.
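
The temperature analogy is easy to make concrete. The short sketch below (plain Python with made-up readings) shows that raw values from two interval scales with different origins cannot be compared directly, and that only the known constant restores comparability; in psychological data, no such constant is usually available.

```python
# Two interval-level temperature scales whose origins differ.
celsius_readings = [20, 25, 30]
kelvin_readings = [c + 273 for c in celsius_readings]  # same temperatures on the Kelvin scale

# A naive raw-score comparison is meaningless: the 273-point gap is an
# artifact of the scales' different origins, not a real temperature difference.
raw_gap = kelvin_readings[0] - celsius_readings[0]  # 273

# Applying the known constant (K = degrees C + 273) restores comparability.
assert [k - 273 for k in kelvin_readings] == celsius_readings
```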

Finally, and at the highest level of equivalence, is scalar equivalence, or full score comparability. Equivalent instruments at the scalar level measure a concept with the same interval or ratio scale across cultures, and the origins of the scales are the same. Therefore, at this level, bias has been ruled out, and direct cross-cultural comparisons of average scores on an instrument can be made (e.g., van de Vijver & Leung, 1997).

According to van de Vijver (2001), it can be difficult to discern whether measures are equivalent at the measurement-unit or scalar level. This challenge is observed in comparisons of scale scores between cultural groups responding to the same language version of an instrument as well as between different language versions of a measure. As an example of this difficulty, when using the same language version of an instrument, racial differences in intelligence test scores can be interpreted either as representing true differences in intelligence (scalar equivalence has been reached) or as an artifact of the measures (measurement-unit equivalence has been reached). In the latter, the measurement units are the same, but they have different origins because of various biases, hindering valid comparisons across different racial groups. In this instance, valid comparisons at the ratio level (comparing mean scores) cannot be done. Higher levels of equivalence are more difficult to establish. It is, for instance, easier to show that an instrument measures the same construct across cultures (construct equivalence) by showing a similar factor structure and nomological network than it is to demonstrate the instrument's numerical comparability (scalar equivalence). The higher the level of equivalence, though, the more detailed the analysis that can be performed on cross-cultural similarities and differences (van de Vijver, 2001; van de Vijver & Leung, 1997).

Levels of equivalence for measures used in cross-cultural counseling research should be established and reported in counseling psychology publications. It is not until the equivalence of the concepts under study has been determined that a meaningful cross-cultural comparison can be made. Without demonstrated equivalence, numerous rival hypotheses (e.g., poor translation) may account for observed cross-cultural differences.

Bias

Another important concept in cross-cultural research is bias. Bias negatively influences equivalence and refers to nuisance factors limiting the comparability or scalar equivalence of observations (test scores) across cultural groups (van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). Typical sources of bias are construct, method, and item bias. A construct bias occurs when the construct measured as a whole (e.g., intelligence) is not identical across cultural groups. Potential sources of this type of bias are different coverage of the construct across cultures (i.e., not all relevant behavioral domains are sampled), an incomplete overlap of how the construct is defined across cultures, and differences in the appropriateness of item content between two language versions of an instrument (cf. van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). A serious construct bias equates to construct nonequivalence.

Even when a construct is well represented in multilingual versions of a scale (construct equivalence, e.g., similar factor structure, and no construct bias, e.g., complete coverage of the construct), bias may still exist in the scores, resulting in measurement-unit or scalar nonequivalence (van de Vijver & Leung, 1997). This may be a result of method bias. Method bias can stem from characteristics of the instrument or from its administration (van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). Possible sources of this bias are differential response styles (e.g., social desirability) across cultures (e.g., Johnson, Kulesa, Cho, & Shavitt, 2005), variations in familiarity with the type of stimuli or scale across cultures, communication problems between investigators and participants, and differences in the physical conditions under which the instrument is administered across cultures. Method bias can also limit cross-cultural comparisons when samples drawn from different cultures are not comparable (e.g., in prior experiences).

Item bias may also exist, posing a threat to cross-cultural comparison (scalar equivalence). This type of bias refers to measurement at the item level. This bias has several potential sources. It can result from poor translation or poor item formulation (e.g., complex wording) and because item content may not be equally relevant or appropriate for the cultural groups being compared (e.g., Malpass & Poortinga, 1986; van de Vijver & Poortinga, 1997). An item on an instrument is considered biased if persons from different cultures who have the same standing on the underlying characteristic (trait or state) measured yield different average item scores on the instrument.

Finally, bias can be considered uniform or nonuniform. A uniform bias refers to any type of bias affecting all score levels on an instrument equally (van de Vijver & Leung, 1997). For instance, when measuring persons' intelligence, the scale may be accurate for one group but may consistently read 10 points too high for another group. The 10-point difference would appear at all intelligence levels (a true score of 90 would be 100, and a true score of 120 would be 130). A nonuniform bias is any type of bias differentially affecting different score levels. In measuring persons' intelligence, the scale may again be accurate for one group, but for the other group, 10 points are recorded as 12 points. The difference in measured intelligence for persons whose true score is 90 would be a score of 108 (an 18-point difference), whereas for persons whose true score is 110, the difference is 22 points (a score of 132). The distortion is greater at higher levels on the scale. Nonuniform bias is considered a greater threat in cross-cultural comparisons than uniform bias, as it influences both the origin and the measurement unit (scale) of a scale. Uniform bias affects only the origin of a scale (cf. van de Vijver, 1998, 2001).
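
The arithmetic of the two bias types can be replayed in a few lines. This is a minimal sketch of the text's numerical examples, treating uniform bias as a constant offset and nonuniform bias as a multiplicative distortion (a 1.2 multiplier reproduces the 10-points-recorded-as-12 example):

```python
# Uniform bias: a constant offset at every score level (the text's +10 points).
def uniform_bias(true_score):
    return true_score + 10

# Nonuniform bias: distortion that grows with score level
# (every 10 true points recorded as 12, i.e., a 1.2 multiplier).
def nonuniform_bias(true_score):
    return true_score * 1.2

for true in (90, 120):
    print(true, "->", uniform_bias(true))      # 90 -> 100, 120 -> 130
for true in (90, 110):
    print(true, "->", nonuniform_bias(true))   # 90 -> 108.0, 110 -> 132.0
```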

Relationship Between Bias and Equivalence

Bias and equivalence are closely related. When two or more language versions of an instrument are unbiased (construct, method, item), they are determined equivalent at the scalar level. Bias will lower a measure's level of equivalence (construct, measurement unit, scalar). Also, construct bias has more serious consequences and is more difficult to remedy than method and item bias. For instance, by selecting a preexisting instrument for translation and use with a different language group, the researcher runs the risk of incomplete coverage of the construct in the target culture (i.e., construct bias limiting construct equivalence). Method bias can be minimized, for example, by using standardized administration (administering under similar conditions using the same instructions) and by using covariates, whereas thorough translation procedures may limit item bias. Furthermore, higher levels of equivalence are less robust against bias. Scalar equivalence (a needed condition for comparison of average scores between groups) is, for instance, affected by all types of bias and is more susceptible to bias than measurement-unit equivalence or construct equivalence, where comparative statements are not a focus (cf. van de Vijver, 1998). Thus, if one wants to infer whether Culture A shows more or less magnitude of a characteristic (e.g., willingness to seek counseling services) than Culture B, one has to empirically demonstrate the measure's lack of bias and scalar equivalence.

Not all instruments are equally vulnerable to bias. In fact, more structured tests administered under standardized conditions are less susceptible to bias than open-ended questions. Similarly, the less the cultural distance (Triandis, 1994, 2000) between the groups being compared, the less room there is for bias. Cultural distance can, for instance, be discerned based on the Human Development Index (HDI; United Nations, 2005) published yearly by the United Nations Development Programme to assess well-being and child welfare (human development). Using the HDI as a measure of cultural distance, it can be seen that the United States (ranked 10) and Ireland (ranked 8) are more similar in terms of human development than the United States and Niger (ranked 177). Therefore, it can be expected that greater bias affects cross-cultural comparisons between the United States and Niger than between the United States and Ireland.

MEASUREMENT APPROACHES

Selection of Measurement Devices

A prerequisite to conducting a cross-cultural study is to make sure that what is being studied exists and is functionally equivalent across cultures (Berry, 1969; Lonner, 1985). Once this has been determined, the next step is deciding how the construct should be assessed. This decision should be based on the type of bias expected. If there is a concern with construct bias, the construct is not functionally equivalent, and serious method bias is expected, the researcher may need to rely on emic approaches (indigenous or cultural), develop measures meaningful to the culture, and use culture-sensitive methodologies. Van de Vijver and Leung (1997) called this strategy the assembly approach. Emic techniques (i.e., assembly) are often needed if the cultures of interest are very different (Triandis, 1994, 2000). In this approach, though, direct comparisons between cultures can be challenging, as the two or more measures of the construct may not be equivalent at the measurement level.

If, in contrast, the cultures are relatively similar and the concept is functionally equivalent across cultures, the researcher may opt to translate and/or adapt preexisting instruments and methodologies to discern cultural similarities and differences across cultural groups. Van de Vijver and Leung (1997) listed two common strategies employed when using preexisting measures for multilingual groups. First is the applied approach, where an instrument goes through a literal translation of items. Item content is not changed for a new cultural context, and the linguistic and psychological appropriateness of the items is assumed. It is also assumed there is no need to change the instrument to avoid bias. According to van de Vijver (2001), this is the most common technique in cross-cultural research on multilingual groups. The second strategy is adaptation, where some items may be literally translated, while others require modification of wording and content to enhance their appropriateness to a new cultural context (van de Vijver & Leung, 1997). This technique is chosen if there is concern with construct bias.

Of the three approaches just mentioned (assembly, application, and adaptation), the application strategy is the easiest and least cumbersome in terms of money, time, and effort. This technique may also offer high levels of equivalence (measurement-unit and scalar equivalence), and it can make comparison to results of other studies using the same instrument possible. This approach may not be useful, however, when the characteristic behaviors or attitudes (e.g., obedience and being a good daughter or son) associated with the construct (e.g., filial piety) differ across cultures (lack of construct equivalence and high construct bias) (e.g., Ho, 1996). In such instances, the assembly or adaptation strategy may be needed. With the assembly approach (emic), researchers may focus on the construct validity of the instrument (e.g., factor analysis, divergent and convergent validity), not on direct cross-cultural comparisons. When adaptation of an instrument is needed, in which some items are literally translated whereas others are changed or added, cross-cultural comparisons may be challenging, as direct comparisons of total scores may not be feasible because not all items are identical. Only scores on identical items can be compared using mean score comparisons (Hambleton, 2001). The application technique (etic) to translation most easily allows for a direct comparison of test scores using t tests or ANOVA because of potential scalar equivalence. For such comparisons to be valid, however, an absence of bias needs to be demonstrated.

The applied approach, and to some degree the adaptation strategy, focuses on capturing the etics, or the qualities of concepts common across cultures. Yet cultural researchers have criticized this practice. Berry (1989), for instance, labeled it "imposed etics," claiming that by using the etic approach, researchers fail to capture the culturally specific aspects of a construct and may erroneously assume the construct exists and functions similarly across cultures (cf. Adamopolous & Lonner, 2001). The advantage of the etic over the emic strategy, however, is that the etic technique provides the ability to make cross-cultural comparisons, whereas in the emic approach, cross-cultural comparison is more difficult and not as direct.

Nevertheless, the etic strategy may be limited when trying to understand a specific culture. There is, for instance, no guarantee a translated measure developed to assess a concept in one culture will assess the same construct equally well in another culture. It is highly likely that some aspects of the concept may be lost or not captured by the scale. There might be construct bias and a lack of construct equivalence. To counteract this shortcoming, several methods have been proposed. Brislin and colleagues (Brislin, 1976, 1983; Brislin et al., 1973) suggested a combined etic-emic strategy. In this approach, researchers begin with an existing tool developed in one culture that is translated for use in a target culture (potentially etic items). Next, additional items are included in the translated scale that are unique to the target culture (emic). The additional items may be developed by persons knowledgeable about the culture and/or drawn from relevant literature. These culture-specific items must be highly correlated with the original items in the target instrument but unrelated to culture-specific items generated from another culture (Brislin, 1976, 1983; Brislin et al., 1973). Adding emic items will provide the researcher with a greater in-depth understanding of a construct in a given culture. Assessing equivalence between the language versions of the instrument would be based only on the shared (etic) items (Hambleton, 2001).

Similarly, Triandis (1972, 1975, 1976) suggested that researchers start with an etic concept (thought to exist in all cultures under study) and then develop emic items based on each culture for the etic concept. Thus, all instrument development is carried out within each culture included in the study (i.e., assembly). Triandis argued that cross-cultural comparison could still be made using these versions of the measure (one in each culture) because the emic items would be written to measure an etic concept. SEM could, for instance, be used for this purpose (see Betz, 2005; Weston & Gore, 2006).

Finally, a convergence approach can be applied (e.g., van de Vijver, 1998). Relying on this technique, researchers may assemble a scale measuring an etic concept in each culture or use preexisting culture-specific tools translated into each language. Then all measures are given to each cultural group. Comparisons can be made between shared items (given enough items are shared), whereas nonshared items provide culture-specific understanding of the construct. When this method is used, the appropriateness of items in all scales needs to be determined before administration.

    Determining Equivalence of Translated Instruments

Several statistical methods are available to determine equivalence between translated and original versions of scales. Reporting Cronbach's alpha reliability, item-total scale correlations, and item means and variances provides initial information about instruments' psychometric properties. A statistical comparison between two independent reliability coefficients can be performed (cf. van de Vijver & Leung, 1997). If the coefficients are significantly different from each other, the source of the difference should be examined; this may indicate item or construct bias. Additionally, item-total scale correlations may indicate construct bias and nonequivalence, and method bias (e.g., administration differences, differential social desirability, differential familiarity with instrumentation). Finally, item score distributions may suggest biased items and, therefore, provide information about equivalence. For instance, an indicator (e.g., item or scale) showing variation in one cultural group but not the other may represent an emic concept (Johnson, 1998). Therefore, comparing these statistics across different language versions of an instrument will offer preliminary data about the instruments' equivalence (construct, measurement unit, and scalar, van de Vijver & Leung, 1997; conceptual and measurement, Lonner, 1985).
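
As a sketch of how such a reliability comparison might look in practice, the code below computes Cronbach's alpha for two simulated item matrices (stand-ins for the two language versions) and compares the coefficients with the Feldt (1969) test, one common procedure for comparing two independent alphas; the data and sample sizes are invented for illustration.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
# Simulated stand-ins: common factor plus noise, 10 items per version.
original = rng.normal(size=(200, 1)) + rng.normal(scale=0.8, size=(200, 10))
translated = rng.normal(size=(150, 1)) + rng.normal(scale=1.2, size=(150, 10))

a1, a2 = cronbach_alpha(original), cronbach_alpha(translated)
# Feldt (1969): W = (1 - alpha1) / (1 - alpha2) ~ F(n1 - 1, n2 - 1) under H0.
w = (1 - a1) / (1 - a2)
p = 2 * min(stats.f.cdf(w, 199, 149), stats.f.sf(w, 199, 149))
print(f"alpha_original={a1:.3f}  alpha_translated={a2:.3f}  W={w:.3f}  p={p:.4f}")
```

A significant result would send the researcher back to the item-level statistics to locate the source of the difference, as the text recommends.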

Construct (van de Vijver & Leung, 1997), conceptual, and measurement equivalence (Lonner, 1985) can also be assessed at the scale level. Here, exploratory and confirmatory factor analysis, multidimensional scaling techniques, and cluster analysis can be used (e.g., van de Vijver & Leung, 1997). These techniques provide information about whether the construct is structurally similar across cultures and whether the same meaning is attached to the construct. For instance, in confirmatory factor analysis, hypotheses about the factor structure of a measure, such as the number of factors, loadings of variables on factors, and correlations among factors, can be tested. Numerous fit indices can be used to evaluate the fit of the model to the data.
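
One widely used summary index for such factor comparisons, though not named above, is Tucker's congruence coefficient; values around .95 or higher are conventionally read as factorial similarity. A minimal sketch with hypothetical loading vectors:

```python
import numpy as np

def tucker_phi(x: np.ndarray, y: np.ndarray) -> float:
    """Tucker's congruence coefficient between two factor-loading vectors."""
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Hypothetical loadings of the same six items in two cultural groups.
loadings_a = np.array([0.72, 0.65, 0.58, 0.70, 0.61, 0.55])
loadings_b = np.array([0.68, 0.70, 0.52, 0.66, 0.64, 0.50])
print(f"phi = {tucker_phi(loadings_a, loadings_b):.3f}")  # about .998 here
```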

Scalar or full score equivalence is more difficult to establish than construct and measurement-unit equivalence, and various biases may threaten this level of equivalence. Item bias, for instance, influences scalar equivalence. Item bias can be ascertained by studying the distribution of item scores for all cultural groups (cf. van de Vijver & Leung, 1997). Item response theory (IRT), in which differential item functioning (DIF) is examined, may be used for this purpose. In IRT, it is assumed that item responses are related to an underlying or latent trait through a logistic curve known as the item characteristic curve (ICC). The ICCs for each selected parameter (e.g., item difficulty or popularity) are compared for every item in each cultural group using chi-square statistics. Items differing between cultural groups are eliminated before cross-cultural comparisons are made (e.g., Hambleton & Swaminathan, 1985; van de Vijver & Leung, 1997). Item bias can also be examined by using ANOVA. The item score is treated as the dependent variable, and the cultural group (e.g., two levels) and score levels (levels dependent on the number of scale items and the number of participants scoring at each level) are the independent variables. Main effects for culture and the interaction between culture and score level are then examined. Significant effects indicate biased items (cf. van de Vijver & Leung, 1997). Logistic regression can also be used for this purpose using the same type of independent and dependent variables. Additionally, multiple-group confirmatory factor analysis invariance analyses (MCFA) and multiple-group mean and covariance structures analysis (MACS) also provide information about biased items or indicators (e.g., Byrne, 2004; Cheung & Rensvold, 2000; Little, 1997, 2000), with the MACS method also providing information about mean differences between groups on latent constructs (e.g., Ployhart & Oswald, 2004).
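
The ANOVA procedure just described translates directly into code. In the sketch below (simulated data; the statsmodels package is assumed), the item score is regressed on culture, score level, and their interaction; a significant culture main effect suggests uniform item bias, and a significant culture-by-level interaction suggests nonuniform item bias.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 400
culture = rng.integers(0, 2, n)        # two cultural groups
score_level = rng.integers(0, 4, n)    # banded total-score levels
# Simulated item with uniform bias: group 1 scores 0.5 points higher throughout.
item = score_level + 0.5 * culture + rng.normal(scale=0.7, size=n)

df = pd.DataFrame({"item": item, "culture": culture, "level": score_level})
fit = smf.ols("item ~ C(culture) * C(level)", data=df).fit()
# Culture effect -> uniform bias; culture-by-level interaction -> nonuniform bias.
print(anova_lm(fit, typ=2))
```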

Finally, factors contributing to method bias can be assessed and statistically held constant when measuring constructs across cultures, given that valid measures are available. A measure of social desirability may, for instance, be used to partially control for method bias. Also, gross national product per capita may be used to control for method bias, as it has been found to correlate with social desirability (e.g., Van Hemert, van de Vijver, Poortinga, & Georgas, 2002) and acquiescence (Johnson et al., 2005). Furthermore, personal experience variables potentially influencing the construct under study differentially across cultures may serve as covariates.
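
Statistically holding a method-bias factor constant amounts to entering it as a covariate. The sketch below (simulated data, hypothetical variable names) compares the culture effect on a construct score before and after partialing out social desirability:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
culture = rng.integers(0, 2, n)
# The groups differ in response style, and response style inflates scores.
social_desirability = rng.normal(size=n) + 0.8 * culture
score = 5 + 0.6 * social_desirability + rng.normal(size=n)

df = pd.DataFrame({"score": score, "culture": culture, "sd": social_desirability})
# Raw group difference partly reflects response style, not the construct.
print(smf.ols("score ~ C(culture)", data=df).fit().params["C(culture)[T.1]"])
# Partialing out social desirability shrinks the culture effect.
print(smf.ols("score ~ C(culture) + sd", data=df).fit().params["C(culture)[T.1]"])
```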

Translation Methodology

Employing a proper translation methodology is extremely important to increase equivalence between multilingual versions of an instrument and the measure's cross-cultural validity. About a decade ago, van de Vijver and Hambleton (1996) published practical guidelines for translating psychological tests that were based on standards set forth in 1993 by the International Test Commission (ITC). The guidelines covered best practices in regard to context, development, administration, and the interpretation of psychological instruments (cf. Hambleton & de Jong, 2003; van de Vijver, 2001; van de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997). The context guidelines emphasized the importance of minimizing construct, method, and item bias and the need to assess, instead of assume, construct similarity across cultural groups before embarking on instrument translation. The development guidelines referred to the translation process itself, while the administration guidelines suggested ways to minimize method bias. Finally, the interpretation guidelines recommended caution when explaining score differences unless alternative hypotheses had been ruled out and equivalence between original and translated measures had been ensured (van de Vijver & Hambleton, 1996). Counseling psychologists should review these guidelines when designing cross-cultural research projects and prior to translating and adapting psychological instruments for such research.

Prior to the development of the ITC standards, Brislin et al. (1973) and Brislin (1986) had written extensively about translation procedures. The following paragraphs outline the common translation methods that Brislin et al. summarized, with connections to the ITC guidelines (e.g., Hambleton & de Jong, 2003; van de Vijver & Hambleton, 1996). Additional methods to enhance the equivalence of translated scales are also mentioned.

Translation. When translating an instrument, bilingual persons who speak both the original and the target language should be employed. Either a single person or a committee of translators can be used (Brislin et al., 1973). In contrast to employing only a single person for the translation, the committee approach calls for two or more persons performing the translation independently. Then, the translations are compared, sometimes with another person, until agreement is reached on an optimal translation. The advantage of the committee approach recommended in the ITC guidelines (van de Vijver & Hambleton, 1996) over a single person is the possible reduction in the bias and misconceptions of a single person. In addition to being knowledgeable about the target language of the translation, test translators need to be familiar with the target culture, the construct being assessed, and the principles of assessment (Hambleton & de Jong, 2003; van de Vijver & Hambleton, 1996). Being knowledgeable about such topics minimizes item biases (e.g., in an achievement test, an item in one culture may give away more information than the same item in another culture) that may result from literal translations.

Back translation. In this procedure, the translated or target version of the measure is independently translated back to the original language by a different person(s) than the one(s) performing the translation to the target language. If more than one person is involved in the back translation, together they decide on the best back-translated version of the scale, which is compared to the original same-language version for linguistic equivalence. Back translation not only provides the researcher with some control over the end result of the translated instrument in cases where he or she does not know the target language (e.g., Brislin et al., 1973; Werner & Campbell, 1970), it also allows for further refinement of the translated version to ensure equivalence of the measures. If the two same-language versions of the scale do not seem identical (i.e., the original and the back-translated versions), the researcher, in cooperation with the translation committee, works on the translations until equivalence is reached. Here, the items requiring a changed translation may be subject to back translation again. Oftentimes in this procedure, only the translated version is changed to be equivalent to the original-language version, which remains unchanged. At other times, the original-language version of the scale is also changed to ensure equivalence, a process known as decentering (Brislin et al., 1973). Adequate back translation does not guarantee a good translation of a scale, as this procedure often leads to literal translation at the cost of readability and naturalness of the translated version. To minimize this, a team of back translators with combined expertise in psychology and linguistics may be used (van de Vijver & Hambleton, 1996). It is also important to note that in addition to the test items, test instructions need to go through a thorough translation/back-translation process.

Decentering. This method was first introduced by Werner and Campbell (1970) and refers to a translation/back-translation process in which both the source (the original instrument's language) and the target language versions are considered equally important, and both are open to modification. Decentering may need to take place if words in the original language have no equivalent in the target language. If the aim is collecting data in both the original and the target culture, items in the original instrument are changed to ensure maximum equivalence (cf. Brislin, 1970, on the translation of the Marlowe-Crowne [Crowne & Marlowe, 1960] Social Desirability Scale). Thus, the back-translated version of the original instrument is used for data collection instead of the original version, as it is considered most likely to be equivalent to the translated version (Brislin, 1986). When this outcome is selected and when researchers worry that changes in the original language may lead to a lack of comparability with previous studies using the original instrument, Brislin (1986) suggested collecting data using both the decentered and the original version of the instrument on a sample speaking the original language. The participants may see half of the original items and half of the revised items in a counterbalanced order. Statistical analysis can indicate whether different conclusions should be made based on responses to the original versus the revised items (see Brislin, 1970).

Pretests. Following translation and back translation of an instrument, and therefore judgmental evidence about the equivalence of the original and translated versions of the instrument, several pretest measures can be used to evaluate the equivalence of the instruments in regard to the meaning conveyed by the items. One approach is to administer the original and the translated versions of the instrument to bilingual persons (Brislin et al., 1973; van de Vijver & Hambleton, 1996). Following the administration of the instruments, item responses can be compared using statistical methods (e.g., t test). If item differences are discovered between versions of the instrument, the translations are reviewed and changed accordingly.
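
A minimal sketch of this bilingual pretest comparison, with simulated ratings and a paired t test per item (paired because the same bilingual respondents take both versions); the Bonferroni-style threshold is our addition, to guard against chance flags:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_persons, n_items = 30, 12
original = rng.normal(loc=3.0, scale=0.8, size=(n_persons, n_items))
translated = original + rng.normal(scale=0.3, size=(n_persons, n_items))
translated[:, 4] += 0.6  # simulate one poorly translated item

for i in range(n_items):
    t, p = stats.ttest_rel(original[:, i], translated[:, i])
    if p < 0.05 / n_items:  # Bonferroni-style correction across items
        print(f"item {i}: t={t:.2f}, p={p:.4f} -> review this translation")
```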

Sometimes bilingual individuals are used in lieu of performing back translations (Brislin et al., 1973). In this case, the translated and original versions of the instrument are administered to bilingual persons. The bilingual persons may be randomly assigned to two groups that receive half of the questions in the original language and the other half in the target language. The translated items resulting in responses different from responses elicited by the same original items are then refined until the responses between the original and the translated items are comparable. Items not yielding comparable responses despite revisions are discarded. If items yield comparable results, the two versions of the instrument are considered equivalent. Additionally, a small group of bilingual individuals can be employed to rate each item from the original and translated versions of the instrument on a predetermined scale in regard to the similarity of meaning conveyed by the item. Problematic items are then refined until deemed satisfactory (e.g., Hambleton, 2001).

A small sample of participants (e.g., N = 10) can also be employed to pretest a translated measure that has gone through the translation/back-translation iteration. Here, participants are instructed to provide verbal or written feedback about each item of the scale. For example, Brislin et al. (1973) noted two methods: random probe and rating of items. In the random probe method, the researcher randomly selects items from a scale and asks probing questions about an item, such as "What do you mean?" Persons' responses to the probes are then examined. Responses considered bizarre or unfitting for an item are scrutinized, and the translation of the item is changed. This method provides insight into how well the meaning of the original items has fared in the translation. In the rating method, respondents are asked to rate their perceptions of item clarity and appropriateness on a predetermined scale. Items that are unclear or not fitting based on these ratings are reworded. Finally, a focus group approach can be used (e.g., Ægisdóttir, Gerstein, & Gridley, 2000), where a small group of participants responds to the translated version and then discusses with the researcher(s) the meaning the participants associated with the items. Participants also share their perceptions about the clarity and cultural appropriateness of the items. Item wording is then changed based on responses from the focus group members.

Statistical Assessment of the Translated Measure

In addition to pretesting a translated scale and judgmental evidence about a scale's equivalence, researchers need to provide further evidence of the measure's equivalence to the original instrument. As stated earlier, item analyses and Cronbach's alpha suggest equivalence and lack of bias. Furthermore, exploratory and confirmatory factor analyses of the measure's factor structure can contribute information about construct equivalence. Multidimensional scaling and cluster analysis can be used to explore construct equivalence as well. These techniques indicate equivalence at an instrument level, more specifically, about the similarities and differences of the hypothesized construct underlying the instrument for the different language versions. Similar to Brislin et al.'s (1973) suggestions mentioned earlier, Mallinckrodt and Wang (2004) proposed a method they termed the dual-language split-half (DLSH) to evaluate equivalence. In this procedure, alternate forms of a translated measure, each composed of one half of the items in the original language and one half of the items in the target language, are administered to bilingual persons in a counterbalanced order of languages. Equivalence between the two language versions of the instruments is determined by a lack of significant differences between mean scores on the original and translated versions of the measures, by split-half correlations between clusters of items in the original and the target language, and by the internal consistency reliability and test-retest reliability of the dual-language form of the measures. These coefficients are compared to results from the original-language version of the instrument. Also inherent in this approach is the collection of evidence for convergent validity for each language version. Finally, and as mentioned earlier, to provide further evidence of the measure's equivalence to the original measure, analyses at the item level (item bias analysis; van de Vijver & Hambleton, 1996), using procedures such as ANOVA and IRT to examine DIF, can be applied to determine scalar equivalence (cf. van de Vijver & Leung, 1997). MCFA and MACS invariance analyses can be employed for this purpose as well.
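
The DLSH checks reduce to a handful of statistics. The sketch below, with simulated bilingual responses, computes the mean-score difference between the original-language and target-language halves, the split-half correlation between them, and the internal consistency of the dual-language form; real use would follow Mallinckrodt and Wang's (2004) full procedure, including the counterbalanced forms and test-retest data.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(4)
trait = rng.normal(size=(60, 1))                             # 60 bilingual respondents
half_original = trait + rng.normal(scale=0.6, size=(60, 8))  # 8 items, original language
half_target = trait + rng.normal(scale=0.6, size=(60, 8))    # 8 items, target language

print(stats.ttest_rel(half_original.sum(1), half_target.sum(1)))    # mean-score difference
print(np.corrcoef(half_original.sum(1), half_target.sum(1))[0, 1])  # split-half correlation
print(cronbach_alpha(np.hstack([half_original, half_target])))      # dual-language alpha
```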

CONTENT ANALYSIS OF TRANSLATION METHODS IN SELECT COUNSELING JOURNALS

Another purpose of this article is to examine, analyze, and evaluate translation practices employed in five prominent counseling journals thought to publish a greater number of articles on international topics than other counseling periodicals. This purpose was pursued to determine whether counseling researchers have, in fact, followed the translation procedures suggested by Brislin (1986) and Brislin et al. (1973) and in the ITC guidelines (e.g., van de Vijver & Hambleton, 1996). We also examined the methods used to control for bias and increase equivalence. While this was not the primary purpose of this article, results of our investigation might help illustrate counseling researchers' use of the preferred translation principles mentioned in the cross-cultural literature. It was also assumed results obtained from this type of investigation could help identify further recommendations to assist counseling researchers when conducting cross-cultural studies and when reporting results of such projects in the scholarly literature.

    METHOD

    Sample

The sample consisted of published studies employing translated instruments in their data collection. To be included in this project, an integral part of the study's methodology had to be a translation of one or more entire instruments or some subset of items from an instrument. Furthermore, the target instrument could not have been translated or evaluated the same way in earlier studies. Additionally, the included studies had to either compare responses from persons from more than one culture (nationality) or investigate a psychological concept using a non-U.S. or non-English-speaking sample of participants. Studies for this investigation were sampled from five counseling journals (Journal of Counseling Psychology [JCP], Journal of Counseling and Development [JCD], Journal of Multicultural Counseling and Development [JMCD], Measurement and Evaluation in Counseling and Development [MECD], and The Counseling Psychologist [TCP]) thought to publish articles relevant to non-English-speaking cultures, ethnic groups, and/or countries. To assess more recent trends in the literature, only articles published between the years 2000 and 2005 were included in our sample. We assumed recent studies (i.e., studies published since 2000) would provide a good representation of current translation and verification practices employed by counseling researchers. From 2000 to 2005, a total of 615 empirical articles were published in the targeted journals. Of these articles, 15 included translation as a part of their methodology. Therefore, 2.4% of the empirical articles published in these five counseling journals incorporated a translation process.

    Procedure

The 15 identified studies were coded by (a) publication source (e.g., TCP), (b) year of publication (e.g., 2001), (c) construct investigated and name of scale translated, (d) translation methodology used (single person, committee, bilinguals), (e) whether the translated version of the scale was pilot tested (yes or no) before main data collection, (f) number of participants used for pilot testing, (g) psychometric properties reported and statistics used to evaluate the translated measure's equivalence to the original scale, and (h) number of participants from which the psychometric data were gathered. Two of the current authors coded information from the articles independently. If disagreements arose in the coding (e.g., relevant psychometrics for equivalence evaluation), these were resolved through consensus agreement between the coders.
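
The coding scheme (a) through (h) maps naturally onto a small record type. The sketch below is illustrative only: the field names are ours, and the example entry paraphrases the first study in the table that follows.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CodedStudy:
    source: str                     # (a) publication source
    year: int                       # (b) year of publication
    construct_and_scale: str        # (c) construct and translated scale
    translation_method: str         # (d) single person, committee, bilinguals
    pilot_tested: bool              # (e) pilot test before main data collection
    pilot_n: Optional[int]          # (f) number of pilot participants
    psychometrics: str              # (g) equivalence statistics reported
    psychometrics_n: Optional[int]  # (h) n for the psychometric data

example = CodedStudy("JMCD", 2000, "Help-seeking attitudes; ATSPPH items",
                     "committee", False, None, "factor analysis; alpha", 110)
```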

    (text continues on p. 22)

TABLE 1: Studies Involving Translation of Instruments

Target-language studies

1. Shin, Berkson, & Crittenden (2000); JMCD
   Construct: Psychological help-seeking attitudes; traditional values
   Sample: Immigrants from Korea
   Instrument: Six items from the Attitudes Toward Seeking Professional Psychological Help scale (ATSPPH); Acculturation Attitude Scale (AAS; prior translation); vignettes developed in English
   Translation: English to Korean
   Approach to translation: Committee
   Back translation: Yes
   Pretest: No
   Psychometrics reported, original version: N/A
   Psychometrics reported, target version: ATSPPH: factor analysis; AAS: Cronbach's alpha (N = 110 Korean immigrants in the U.S.)

2. Engels, Finkenauer, Meeus, & Dekovic (2001); JCP
   Construct: Parental attachment; relational competence; self-esteem; depression
   Sample: Dutch adolescents
   Instrument: Parent and Peer Attachment (IPPA); Perceived Competence Scale for Children; Self-Esteem Scale; Depressive Mood List
   Translation: English to Dutch
   Approach to translation: Committee (researchers); unclear what instruments were translated in study
   Back translation: Yes (researchers)
   Pretest: No
   Psychometrics reported, original version: N/A
   Psychometrics reported, target version: Cronbach's alpha (N = 412 Dutch adolescents)

3. Chung & Bemak (2002); JCD
   Construct: Anxiety; depression; psychosocial dysfunction symptoms
   Sample: Southeastern Asian refugees
   Instrument: Health Opinion Survey (interview)
   Translation: English to Vietnamese, Khmer, Laotian
   Approach to translation: Committee
   Back translation: Yes
   Pretest: Pilot interviews
   Psychometrics reported, original version: N/A
   Psychometrics reported, target version: Exploratory factor analysis for Vietnamese (N = 867), Cambodian (N = 590), and Laotian (n = 723) persons

4. Kasturirangan & Nutt-Williams (2003); JMCD
   Construct: Culture; domestic violence
   Sample: Latino women
   Instrument: A semistructured interview protocol developed by the researchers: two interviews in English, seven in Spanish
   Translation: English to Spanish; no discussion of translation method
   Approach to translation: Not reported
   Back translation: No
   Pretest: Pilot interview; no comparison between English and Spanish versions of protocol prior to data collection
   Psychometrics reported, original version: English version of protocol administered to (n = 3) Latina women
   Psychometrics reported, target version: Latina professor of foreign language served as an auditor to ensure proper translation of transcripts from Spanish to English (n = 7 Latina women)

5. Asner-Self & Schreiber (2004); MECD
   Construct: Attributional style
   Sample: Immigrants from Central America
   Instrument: The Attributional Style Questionnaire (ASQ)
   Translation: English to Spanish
   Approach to translation: Committee
   Back translation: Yes
   Pretest: No
   Psychometrics reported, original version: N/A
   Psychometrics reported, target version: Cronbach's alpha, principal components analysis (N = 89 Central American immigrants in the U.S.)

6. Torres & Rollock (2004); MECD
   Construct: Acculturation-related challenges
   Sample: Immigrants from Central & South America
   Instrument: Cultural Adjustment Difficulties Checklist (CADC)
   Translation: English to Spanish
   Approach to translation: Committee
   Back translation: Yes
   Pretest: No
   Psychometrics reported, original version: Not reported for the 10% of the sample that responded to this version
   Psychometrics reported, target version: Cronbach's alpha (N = 86 Hispanic immigrants); 90% of the sample responded to the translated version of the instruments. No comparison reported between the two language versions

7. Oh & Neville (2004); TCP
   Construct: Korean rape myth acceptance
   Sample: Korean college students
   Instrument: Illinois Rape Myth Acceptance Scale (IRMAS); 26 items from the IRMAS were translated and included in the preliminary version of the Korean Rape Myth Acceptance Scale (KRMAS)
   Translation: English to Korean
   Approach to translation: Single person
   Back translation: Yes
   Pretest: Yes; focus group (n = 4 South Korean nationals) evaluated each item from the IRMAS and 26 items generated from Korean literature. All items were in Korean
   Psychometrics reported, original version: N/A
   Psychometrics reported, target version: Study 1: principal components analysis followed by exploratory factor analysis (N = 348 South Korean college students). Study 2: confirmatory factor analysis, factorial invariance procedure, Cronbach's alpha, and MANOVA to establish criterion validity (N = 547 South Korean nationals). Study 3: test-retest reliability (N = 40 South Korean teachers or school administrators)

8. Asner-Self & Marotta (2005); JCD
   Construct: Depression, anxiety, phobic anxiety; Erikson's eight psychosocial stages
   Sample: Immigrants from Central America
   Instrument: Brief Symptom Inventory (BSI); Measures of Psychosocial Development (MPD)
   Translation: English to Spanish
   Approach to translation: Not reported
   Back translation: Yes
   Pretest: Not reported
   Psychometrics reported, original version: Not reported
   Psychometrics reported, target version: Not reported. No information about number of participants responding to English or Spanish versions of instruments. Volunteers probed about the research experience

9. Wei & Heppner (2005); TCP
   Construct: Clients' perceptions of counselor credibility; working alliance
   Sample: Counselor-client dyads in Taiwan
   Instrument: Counselor Rating Form-Short Version (CRF-S); Working Alliance Inventory-Short Version (WAI-S)
   Translation: English to Mandarin
   Approach to translation: Single person
   Back translation: Yes
   Pretest: No
   Psychometrics reported, original version: N/A
   Psychometrics reported, target version: Cronbach's alpha, intercorrelations among CRF subscales (construct validity) (N = 31 counselor-client dyads in Taiwan)

Cross-cultural studies

10. Marino, Stuart, & Minas (2000); MECD
    Construct: Acculturation
    Sample: Anglo-Celtic Australians & Vietnamese immigrants to Australia
    Instrument: Developed a questionnaire (in English) measuring behavioral and psychological acculturation, and socioeconomic and demographic influences on acculturation
    Translation: English to Vietnamese
    Approach to translation: Committee
    Back translation: Yes
    Pretest: Yes (n = 10), Vietnamese version
    Psychometrics reported, original version: Cronbach's alpha (N = 196 Anglo-Celtic Australians)
    Psychometrics reported, target version: Cronbach's alpha (N = 187 Vietnamese Australians). Vietnamese participants responded to either an English or a Vietnamese version of the instrument; statistical evidence of equivalence between these two language versions was not reported

11. Ægisdóttir & Gerstein (2000); JCD
    Construct: Counseling expectations; Holland's typology
    Sample: Icelandic & U.S. college students
    Instrument: Expectations About Counseling Questionnaire (EAC-B); Self-Directed Search (SDS)
    Translation: English to Icelandic
    Approach to translation: Committee
    Back translation: Yes
    Pretest: Focus group (n = 8), Icelandic version
    Psychometrics reported, original version: Cronbach's alpha (N = 225 U.S. college students)
    Psychometrics reported, target version: Cronbach's alpha (N = 261 Icelandic college students). Covariate analysis (prior counseling experience) used to control for method bias

12. Poasa, Mallinckrodt, & Suzuki (2000); TCP
    Construct: Causal attributions
    Sample: U.S., American Samoan, & Western Samoan college students
    Instrument: Questionnaire of Attribution and Culture (QAC; vignettes with open-ended response probes developed in English)
    Translation: English to Samoan
    Approach to translation: Single person
    Back translation: Yes
    Pretest: English version of QAC pilot tested and respondents provided feedback to evaluate equivalence (n = 16)
    Psychometrics reported, original version: A team of English-speaking persons (n = 4) independently coded the English-language responses from the QAC and interviews (N = 23)
    Psychometrics reported, target version: A team of Samoan-speaking persons (n = 3) independently coded the Samoan-language responses from the QAC and interviews (N = 50). No information about whether themes/codes were translated from Samoan to English

13. Tang (2002); JMCD
    Construct: Career choice
    Sample: Chinese, Chinese American, & Caucasian American college students
    Instrument: A questionnaire developed in English in the study to measure influences on career choice
    Translation: English to Chinese
    Approach to translation: Single person (researcher)
    Back translation: Yes
    Pretest: No
    Psychometrics reported, original version: None reported for Caucasian American (N = 124) and Asian American (N = 131) college students
    Psychometrics reported, target version: None reported for Chinese (N = 120) college students

Equivalence studies

14. Chang & Myers (2003); MECD
    Construct: Wellness
    Sample: Immigrants from Korea
    Instrument: The Wellness Evaluation of Lifestyle (WEL)
    Translation: English to Korean
    Approach to translation: Single translator whose translations were edited by the first author; discrepancies resolved between translator and editor upon mutual agreement
    Back translation: No
    Pretest: Yes (n = 3): bilingual examinees took both the English and the Korean version. Effect size (Cohen's d) of difference in mean scores between English and Korean versions
    Psychometrics reported, original version: None reported for a larger sample (N not reported)
    Psychometrics reported, target version: None reported for a larger sample (N not reported)

15. Mallinckrodt & Wang (2004); JCP
    Construct: Adult attachment
    Sample: International students from Taiwan
    Instrument: The Experiences in Close Relationships Scale (ECRS)
    Translation: English to Chinese
    Approach to translation: Committee
    Back translation: Yes
    Pretest: No
    Psychometrics reported, original version: Split-half reliability, Cronbach's alpha (N = 399 U.S. college students)
    Psychometrics reported, target version: Used bilinguals (n = 30 Taiwanese international college students) to evaluate equivalence using the DLSH method: within-subjects t test between two language versions, split-half reliability, Cronbach's alpha, test-retest reliability, and construct validity correlations with a related construct

    RESULTS

Table 1 lists the results found for each of the 15 studies. Three of the included studies used a structured or semistructured interview/test protocol. In 3 studies, one of which included a semistructured test protocol, an English-language instrument was developed and then translated into another language. Furthermore, in 9 studies, one or more preexisting measures (the entire instrument or a subset of items) were translated into a language other than English. In the 15 studies, a range of constructs was examined, including persons' counseling orientations (e.g., help-seeking attitudes, counseling expectations), adjustment (e.g., acculturation), and maladjustment (e.g., psychological stress). A diversity of cultural groups was represented in the 15 studies as well (see Table 1).

    Evaluation of Included Studies

Two main criteria were used to evaluate these 15 studies: (a) the translation methodology employed (single person, committee, back translation, pretest), which provides judgmental evidence about the equivalence of the translated measure to the original measure; and (b) whether statistical methods were used to verify the equivalence of the translated measure to its original-language version. Because the studies ranged in terms of their purpose and the approaches taken when investigating multicultural groups, and also because these strategies were linked with different opportunities to measure equivalence and bias, we divided these 15 studies into three categories: target-language, cross-cultural, and equivalence studies. The target-language studies included projects in which only translated versions of measures were investigated. These studies employed either cross-cultural (etic) methodologies or a combination of cultural and cross-cultural methodologies (emic-etic). For these studies, there was no direct comparison made between an original and a translated version of the protocol. The second category of studies used a cross-cultural approach, as they compared two or more groups on a certain construct. These groups received, respectively, the original and translated versions of a measure. Finally, the third category of studies was specifically designed to examine equivalence between two language versions of an instrument. These studies we termed equivalence studies.

We sought to identify studies that employed sound versus weak translation methodologies. This task turned out to be difficult, however, because of the scarcity of information reported about the translation processes used. Sometimes the translation procedure was described in only a couple of sentences. In other instances, the translation methodology was discussed in more detail (e.g., number and qualifications of translators and back translators), and in fewer instances still, examples were provided of acceptable and unacceptable item translations.

Despite these difficulties, and based on the available information, we contrasted relatively sound and weak translation procedures. Translation methods we considered weak did not incorporate any mechanism to evaluate the translation, whether judgmental (e.g., back translation, use of bilinguals, pretest) or quantitative (statistical evidence of equivalence). Instead, such a protocol was translated into one or more languages without any apparent evaluation of its equivalence to the original-language version. Methodologically sound studies incorporated both judgmental and quantitative methods to assess the validity of the translation. Given these criteria to evaluate the methodological rigor of the translation process employed, we now present the analyses of the 15 identified studies in the literature.

Target-language studies. Nine of the 15 studies administered and examined responses from a translated measure without direct comparison to a group responding to an original-language version of the measure (see Table 1). In most of these studies, persons from one cultural group participated. Both quantitative and qualitative methods were employed. These studies relied on preexisting instruments, select items from preexisting instruments, or interview protocols translated into a new language. We also included in this category studies in which a protocol was developed in English and translated into another language.

In two studies (4 and 8), few procedures were reported to evaluate the translation and verify the different language forms of the measures used (see Table 1). In these studies, two language versions of a scale were collapsed into one set of responses without evaluating their equivalence. A stronger design for these studies would ensure judgmental equivalence between the two language versions of the scales. This could have been accomplished by using a committee of translators and independent back translators. A stronger design would also have resulted from incorporating a decentering process when developing the adapted measures and, if appropriate, from statistically assessing equivalence. Thus, we considered these studies weak in terms of their methodological rigor.

Sound translation methods incorporate several mechanisms to evaluate a translated version of a protocol. They involve, for instance, a committee approach to translation/back translation, a pretest of the scale, and an evaluation of the instrument's psychometric properties relative to the original version. Four studies reported information somewhat consistent with our criteria for sound methodological procedures (3, 5, 7, and 9). The authors, with varying degrees of detail, reported using either a single-person or a committee approach to translation, they relied on back translation, and they employed one or more independent experts to evaluate the equivalence of the language forms. They also reported making subsequent changes to the translated versions of the instruments they were using. Additionally, in some of these studies, a pretest of the translated protocol was performed, and in all of these projects, the investigators discussed statistical tests of the measures' psychometric properties (see Table 1).
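To make the quantitative side of such an evaluation concrete, the following is a minimal sketch, not drawn from any of the reviewed studies, of how a researcher might compute and compare Cronbach's alpha for original and translated versions of a scale. The sample sizes, item counts, and simulated responses are hypothetical.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = item_scores.shape[1]                         # number of items
    item_vars = item_scores.var(axis=0, ddof=1)      # per-item variances
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of scale totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def simulate_scale(n_respondents: int, n_items: int, seed: int) -> np.ndarray:
    """Hypothetical correlated item responses: latent trait plus item noise."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(0, 1, size=(n_respondents, 1))
    return latent + rng.normal(0, 0.8, size=(n_respondents, n_items))

original = simulate_scale(225, 10, seed=0)    # e.g., original-language sample
translated = simulate_scale(261, 10, seed=1)  # e.g., target-language sample

print(f"alpha (original):   {cronbach_alpha(original):.2f}")
print(f"alpha (translated): {cronbach_alpha(translated):.2f}")
```

Similar alphas across language versions are necessary but not sufficient evidence of equivalence; construct-level checks such as factor analysis are still needed.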

The remaining three studies in this category (1, 2, and 6) contained translation methods of moderate quality; their rigor fell between that of the studies we considered relatively weak and those we considered strong. In fact, the translation process was not fully described. Furthermore, in one instance, the same person performed the translation and the back translation (2), and in another (6), no assessment of equivalence was reported for the two language versions of the scale before responses were collapsed into one data set. Also, in one study (1), translated items from an existing scale were selected a priori without any quantitative or qualitative (e.g., pretest) assurance that these items fit the cultural group to which they were administered. In none of these three studies were the measures pretested before data were collected for the main study. Finally, insufficient information was reported about the translated instruments' psychometric properties to evaluate the validity of the measures for the targeted cultural groups. The internal validity of these studies could have been greatly improved had the researchers included some of these procedures in the translation and verification process.

Cross-cultural studies. Four of the 15 studies directly compared two or more cultural groups. In 3 of these studies, an instrument was developed in English and then translated into another language, whereas in 1 study, a preexisting instrument was translated into another language (see Table 1). In all 4 studies, comparisons were made between language groups relying on two language versions of the same instrument.

None of these four studies employed a particularly weak translation methodology. Yet only three of the four studies (10, 11, and 12) used relatively rigorous methods. In these three studies, the scales were pretested following the translation/back-translation process, providing judgmental evidence of equivalence. Additionally, in the two quantitative studies (10 and 11), the researchers compared Cronbach's alphas between language versions. Finally, in one study (11), equivalence was further examined by employing covariate analysis to control for method bias (different experiences of participants across cultures) in support of scalar equivalence. None of these approaches to examining and ensuring equivalence was reported in the Tang (2002) study. As a result, we concluded that this study used the least valid approach. It is noteworthy that all four studies in this category failed to assess the factor structure of the different language versions of the measures, and as such, they did not provide additional support for construct equivalence. Similarly, none of these studies assessed item bias or performed any detailed analyses to verify scalar equivalence. Employing these additional analyses would have greatly enhanced the validity of the reported cross-cultural comparisons in these four studies.
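One relatively simple construct-level check the paragraph above calls for is a comparison of factor structures across language versions. Below is a minimal, hypothetical sketch of Tucker's congruence coefficient, a common index of factor similarity in the cross-cultural literature (values near .95 or above are conventionally read as factorial similarity). The loading matrices are invented for illustration.

```python
import numpy as np

def tucker_phi(loadings_a: np.ndarray, loadings_b: np.ndarray) -> np.ndarray:
    """Tucker's congruence coefficients between matched factors.

    Each argument is an items-by-factors loading matrix from the same
    instrument administered in a different language/culture.
    """
    num = (loadings_a * loadings_b).sum(axis=0)
    den = np.sqrt((loadings_a**2).sum(axis=0) * (loadings_b**2).sum(axis=0))
    return num / den

# Invented loadings: 6 items, 2 factors, per language version.
english = np.array([[.72, .05], [.68, .10], [.75, .02],
                    [.08, .70], [.12, .66], [.03, .74]])
translated = np.array([[.70, .08], [.64, .12], [.71, .06],
                       [.10, .67], [.15, .62], [.05, .70]])

print(tucker_phi(english, translated))  # one coefficient per factor
```

Congruence coefficients presuppose that the factors have already been matched (e.g., by target rotation), so they complement rather than replace confirmatory tests of factorial invariance.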

Equivalence studies. Two of the 15 studies were treated as separate cases, as they were specifically designed to demonstrate and evaluate equivalence between two language versions of a scale (see Table 1). Therefore, we did not evaluate them the same way as the other 13 studies. Instead, they serve as examples of how to enhance the cross-cultural validity of translated and adapted scales. We concluded that Mallinckrodt and Wang's (2004) approach to determining construct equivalence between language versions of a measure was significantly more rigorous than the one presented by Chang and Myers (2003).

As can be seen in Table 1, Chang and Myers (2003) employed three bilingual persons in lieu of back translation. In their approach, bilingual persons' average scale scores on the two versions of a scale were compared. Mallinckrodt and Wang (2004), in contrast, used both back translation and bilingual individuals to demonstrate and ensure equivalence. Their method subsumed the method employed by Chang and Myers. Following a back translation of an instrument, Mallinckrodt and Wang used a quantitative methodology, the DLSH, to assess equivalence between two language versions of a scale (see discussion earlier). In brief, with this approach, responses from bilingual individuals receiving half of the items in each language were compared to a criterion sample of persons responding to the original version of the scale. By comparing average scale scores, reliability coefficients, and construct validity correlations, the researchers were able to examine the equivalence (construct and, to some degree, scalar equivalence) between the two language versions of the instrument.
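As a rough illustration of the quantitative core of a dual-language split-half (DLSH) design, the sketch below runs a within-subjects t test on bilinguals' half-scale scores in each language, the kind of comparison described above. The data, sample sizes, and split are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical: 30 bilinguals each answer half the items in English
# and the other half in the target language (counterbalanced).
n_bilinguals = 30
true_level = rng.normal(3.5, 0.5, n_bilinguals)            # latent trait level
english_half = true_level + rng.normal(0, 0.25, n_bilinguals)
translated_half = true_level + rng.normal(0, 0.25, n_bilinguals)

# Within-subjects t test: a nonsignificant difference is consistent
# with (but does not by itself prove) equivalence of the two halves.
t, p = stats.ttest_rel(english_half, translated_half)
r, _ = stats.pearsonr(english_half, translated_half)

print(f"paired t = {t:.2f}, p = {p:.3f}")
print(f"cross-language correlation r = {r:.2f}")
```

In a full DLSH analysis, these bilingual scores would also be compared with a criterion sample taking the original-language version, along with reliability and construct validity coefficients.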

    Interpretation of Results

The current results are consistent with Mallinckrodt and Wang (2004), who discovered in their review of articles published in two counseling journals (JCP and TCP) that few studies in counseling psychology have investigated multilingual or international groups or employed translation methods. Additionally, consistent with these investigators, we found that, in many instances, counseling researchers used inadequate procedures to verify equivalence between language versions of an instrument. For example, our analyses indicated that just more than half of the 15 studies employed a committee of translators. A committee is highly recommended in the ITC guidelines (van de Vijver & Hambleton, 1996).

We also discovered that in less than half of the 15 studies the measurement devices were pretested and that in slightly more than half of the studies the researchers used quantitative methods to further demonstrate equivalence. Furthermore, only 1 study systematically controlled for method bias, while none of the 15 studies assessed for item bias. All these procedures are recommended in the ITC guidelines. On a positive note, however, all but 2 studies used a back-translation procedure to enhance equivalence. Taken together, these results are disquieting and lead us to call for more rigorous research designs when studying culture, when using and evaluating translated instruments, and when performing cross-cultural comparisons.
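Because none of the reviewed studies assessed item bias, it may help to show how low the barrier is. The sketch below screens one dichotomously scored item for differential item functioning (DIF) with the common logistic-regression approach: if group membership adds predictive power beyond the total score, the item is flagged. All data, names, and the built-in DIF effect are hypothetical.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)

# Hypothetical data: 400 respondents, a total scale score, a language group
# (0 = original version, 1 = translated), and one dichotomous item response.
n = 400
group = rng.integers(0, 2, n)
total = rng.normal(20, 5, n)
# Build in uniform DIF: the item is harder for the translated-version group.
logit = 0.3 * (total - 20) - 0.9 * group
item = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Nested logistic regressions: total score only vs. total score + group.
base = sm.Logit(item, sm.add_constant(np.column_stack([total]))).fit(disp=0)
aug = sm.Logit(item, sm.add_constant(np.column_stack([total, group]))).fit(disp=0)

# Likelihood-ratio test for uniform DIF (1 df for the group term).
lr = 2 * (aug.llf - base.llf)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=1):.4f}")  # a small p flags the item
```

Flagged items can then be revised, dropped, or treated as partially invariant before scalar comparisons are attempted.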

Additionally, we found that, in many cases, limited attention was paid to discussing translation methods. Hambleton (2001) also observed this trend. Not knowing the reason for this lack of effort, we can only speculate about why methods of translation were not described in more detail. One reason could be the lack of importance placed on this methodological feature of a research design. Another may relate to authors' desire to comply with page limitations in journals. A third reason could be researchers' failure to recognize the importance of reporting the details of translation methods. Finally, it is conceivable that researchers assume others are aware of common methods of translation and thus do not discuss the methods they use in much detail. Whatever the reasons, consistent with the ITC guidelines, we strongly suggest investigators provide detailed information about the methods they employ when translating and validating instruments used in research. This is especially important, as an inappropriate translation of a measure can pose a serious threat to a study's internal validity, may contribute to bias, and, in international comparisons, may limit the level of equivalence between multilingual versions of a measure. As a threat to internal validity, a poorly translated instrument may act as a strong rival hypothesis for obtained results.

    RECOMMENDATIONS

    Translation Practices

Several steps are essential for a valid translation. Based on our review, the work of Brislin and colleagues (Brislin, 1986; Brislin et al., 1973) on common translation methods, and the ITC guidelines (e.g., Hambleton, 2001; van de Vijver & Hambleton, 1996), the best translation procedure involves the steps outlined in Table 2. All but the last step in this table help to minimize item and construct bias and may therefore increase scalar equivalence between language versions of a measure (ITC development guidelines). The last step, or recommendation, refers to verifying the cross-cultural validity of measures (i.e., absence of bias and equivalence; ITC interpretation guidelines).

TABLE 2: Summary of Recommended Translation Practices

1. Independent translation from two or more persons familiar with the target language and culture and the intent of the scale
2. Documentation of comparisons of translations and agreement on the best translation
3. Rewriting of translated items to fit the grammatical structure of the target language
4. Independent back translation of the translated measure into the original language (one or more persons)
5. Comparison of original and back-translated versions, focusing on appropriateness, clarity, and meaning (e.g., use rating scales)
6. Changes to the translated measure based on the prior comparison; changed items go through the translation/back-translation iteration until satisfactory
7. If concepts or ideas do not translate well, deciding what version of the original scale should be used for cross-cultural comparison (original, back translated, or decentered)
8. Pretest of the translated instrument on an independent sample (bilinguals or target-language group), checking for clarity, appropriateness, and meaning
9. Assessment of the scale's reliability and validity, absence of bias, and equivalence to the original-language version of the scale
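Step 5 in Table 2 suggests rating scales for comparing original and back-translated items. A minimal, hypothetical sketch of that bookkeeping appears below: each reviewer rates every item pair for retained meaning, and items whose mean rating falls below a chosen threshold are sent back through step 6. The ratings and the cutoff are illustrative, not prescribed by the guidelines.

```python
import numpy as np

# Hypothetical ratings: 3 reviewers rate 8 item pairs (original vs. back
# translation) on a 1-5 scale for retained meaning (5 = identical meaning).
ratings = np.array([
    [5, 5, 4, 2, 5, 4, 3, 5],   # reviewer 1
    [4, 5, 5, 2, 4, 4, 2, 5],   # reviewer 2
    [5, 4, 4, 3, 5, 5, 3, 4],   # reviewer 3
])

THRESHOLD = 4.0                 # illustrative cutoff, not from the ITC
mean_by_item = ratings.mean(axis=0)

for item, score in enumerate(mean_by_item, start=1):
    flag = "retranslate" if score < THRESHOLD else "ok"
    print(f"item {item}: mean rating {score:.2f} -> {flag}")
```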

    Combining Emic and Etic Approaches

As stated previously, the cross-cultural approach to studying cultural influences on behavior has limitations. One risk involves assuming universal laws of behavior and neglecting an in-depth understanding of cultures and their influences on behavior (e.g., imposed etics). To address this problem, and in line with the suggestions reviewed earlier, we offer several recommendations for counseling psychologists involved in international research. First, collaboration between scholars worldwide and across disciplines is suggested to enhance the quality of cross-cultural studies and the validity of methods and findings. Such collaboration increases the possibility that unique cultural variables will be incorporated into the research and that potential threats to internal and external validity will be reduced. Second, to avoid potential method bias, an integration of quantitative and qualitative methods should be considered, especially when one type of method may be more appropriate and relevant to a particular culture. A convergence of results from both methods enhances the validity of the findings. Third, when method bias is not expected but there is potential for construct bias, and the use of a preexisting measure is considered feasible, researchers should consider collecting emic items to include in the instrument when studying an etic construct (e.g., Brislin, 1976; Oh & Neville, 2004). This approach will enhance construct equivalence by limiting construct bias and will provide culture-specific information to aid theory development. Fourth, when emic scales are available in the cultures of interest to assess an etic construct and cross-cultural comparisons are sought, the convergence approach should be considered; a sketch of its core bookkeeping follows this paragraph. With this approach, all instruments are translated and administered to each cultural group. Then, items and scales shared across cultures are used for cross-cultural comparisons, whereas nonshared items provide information about the unique aspects of the construct in each culture (e.g., van de Vijver, 1998). This approach will enhance construct equivalence, may deepen the current understanding of cultural and cross-cultural dimensions of a construct, and may aid theory development. Finally, Triandis's (1972, 1976) suggestion can be considered. With this procedure, instruments are simultaneously assembled in each culture to measure the etic construct (e.g., subjective well-being). With this approach, most or all types of bias can be minimized and equivalence enhanced, as no predetermined stimuli are used. Factor analyses can be performed to identify etic constructs for cross-cultural comparisons.
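The convergence approach mentioned above is, at its core, a matter of separating shared from culture-specific content. The hypothetical sketch below assumes each culture's translated instrument is represented as a set of item identifiers (all invented here); only items common to both cultures enter the comparative scale, while the remainder are retained as emic information.

```python
# Hypothetical item pools after both emic instruments have been translated
# and administered in both cultures (identifiers are invented).
culture_a_items = {"work_value", "family_duty", "self_reliance", "harmony"}
culture_b_items = {"work_value", "family_duty", "harmony", "face_saving"}

shared = culture_a_items & culture_b_items   # basis for etic comparison
emic_a = culture_a_items - culture_b_items   # unique to culture A
emic_b = culture_b_items - culture_a_items   # unique to culture B

print("compare cultures on:", sorted(shared))
print("culture-specific (A):", sorted(emic_a))
print("culture-specific (B):", sorted(emic_b))
```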

    CONCLUSION

Given our profession's increased interest in international topics, there is a critical need to address methodological challenges unique to this area. We discussed important challenges such as translation, equivalence, and bias. Proper translation methods may strengthen the equivalence of constructs across cultures, as a focus on instrumentation can minimize item bias and some method bias. Consequently, construct equivalence may be enhanced. Merely targeting an instrument's translation, however, is not sufficient. Other factors to consider when mak

