When What You Have Is Not Enough NCOLCTL 24 April 2009 Ray Clifford.


When What You Have Is Not Enough

NCOLCTL

24 April 2009

Ray Clifford

How Do You Understand the Title of This Session?

• In the less commonly taught languages (LCTLs), there is a shortage of…
– Expertise.
– Time.
– Textbooks.
– Tests.
– Patience.
– Students.
– Something else.


And the answer (from a test development perspective) is …there are often too few students to follow “normal” validation procedures for Reading and Listening proficiency tests.

But are you sure you want to use a proficiency test?

• Proficiency testing is not always the right choice.

Testing is complicated – but it is important!

• Language tests can motivate.

• Language tests can demotivate.

Language Testing and Motivation

• Appropriate tests can motivate learners to improve their skills.

• Appropriate tests can motivate teachers to refine their teaching to match their students’ needs.

• Inappropriate tests can demotivate both students and teachers.

“Washback” Effects

• Testing has a negative impact when:
– Educational goals are reduced to those that are most easily measured.
– Testing procedures do not reflect course goals, for instance:
  • Giving multiple-choice tests in writing classes.
  • Using grammar tests as a measure of general proficiency.
  • Basing speaking ability on pronunciation alone.

Washback Effects of Tests

• Testing has a positive impact when:
– Tests reinforce course objectives.
– Tests act as change agents for improving teaching and learning.

If Tests Are to Be Positive Motivators

• We have to select the right type of test for each testing purpose.

3 Major Types of Tests

• Achievement

• Performance

• Proficiency

3 Major Types of Learning

• Limited Transfer

• Near Transfer

• Far Transfer

Aligned Test and Learning Types

• Achievement (Limited Transfer)
– Memorized responses using the content of a specific textbook or curriculum.

• Performance (Near Transfer)
– Rehearsed ability to communicate in specific, familiar settings.

• Proficiency (Far Transfer)
– Unrehearsed general ability to accomplish real-world communication tasks across a wide range of topics and settings.

More on Types of Tests

• Achievement Tests measure:

– Rehearsed, memorized responses.

– What was taught.

– Content of a specific textbook or curriculum.

Sample Achievement Test Item

Complete the following with the correct verb form in the past tense.

(go) I _____________ to the United States last year.

(be) My seat on the plane _______ in business class.

(have) My associates and I _________ meetings each day.

(eat) We _________ at typical American restaurants.

• Performance Tests measure:

– Semi-rehearsed and rehearsed responses.

– Ability to communicate in constrained, familiar, and predictable settings.

– What one can do with what has been taught and practiced.

Sample Performance Test Item

Complete the following sentences about an upcoming business trip. Add a minimum of 5 additional words to each sentence.

For an upcoming business trip I plan to __________________________________________.

I am certain that the trip will be successful, because __________________________________________.

• Proficiency Tests measure:

– Spontaneous, unrehearsed communication ability.

– General ability to accomplish communication tasks in a variety of settings.

– Whether skills are transferable from one context to another.

Sample Proficiency Test Item

You will be taking a business trip abroad. Plan an itinerary that spends at least two days in each of the three cities you must visit and costs less than $4,000 for all travel expenses. Then negotiate with a travel agent to purchase the airplane tickets, arrange hotel reservations, and obtain sufficient information about local transportation options to be able to complete the trip within your budget.

What Distinguishes Proficiency Tests from Other Tests?

• They test real world tasks.

• They measure a person’s ability to function in a language.

• They provide an overall evaluation across a range of real-world tasks.

• They rate a person’s unrehearsed ability against a set of task, condition, and accuracy criteria.

ACTFL Proficiency Scale: Novice

Memorized language
• Lists words/phrases
– Telegraphic
• Attempts at conversation
– Reactive
• Limited topic areas
– Social courtesies
– Dates, numbers, colors
– Family, home, common objects
• May be difficult to comprehend beyond memorized material.

ACTFL Proficiency Scale: Intermediate

Survival Proficiency
• Has sufficient language to create and express own meaning
• Engages in simple conversation
• Deals with a simple social transaction
• Asks and answers questions
• Is comprehensible to a sympathetic conversation partner

ACTFL Proficiency Scale: Advanced

Limited Work Proficiency
• Speaks with confidence
• Can narrate and describe in all major time frames
• Can elaborate, clarify, illustrate
• Can handle a situation with a complication
• Can be a “Story Teller”
• Fully comprehensible to native speakers

ACTFL Proficiency Scale: Superior

Professional Proficiency

• Can support opinions and hypothesize

• Can converse both formally and informally

• Can handle abstract treatment of subjects

• No pattern of linguistic errors

ACTFL Criteria: Speaking

Quick Review

• 3 main types of tests:
– Ac…
– Pe…
– Pr…

A Summary that Contrasts Achievement, Performance, and Proficiency

• Achievement (Memorized, Limited Transfer)
– Repeat, produce, choose.
– Context: Textbook, curriculum.
– Accuracy: Determined by the teacher.

• Performance (Rehearsed, Near Transfer)
– Specific skills in familiar settings.
– Context: Focused, constrained, or restricted.
– Accuracy: Situation dependent.

• Proficiency (Unrehearsed, Far Transfer)
– A wide range of abilities.
– Context: Broad, in-depth, variable.
– Accuracy: Ascending expectations.

Matching Test Type with Testing Purpose…

Some Common Testing Purposes

• Assigning grades in a class.

• Placing students into a sequence of courses.

• Selecting an applicant for a job with limited, static language requirements.

• Screening employees for future jobs with broad, general language requirements.

What would happen if students who were studying the same textbook were given achievement tests by different teachers?

• Unless the two tests asked exactly the same questions, the students would respond differently on one test than on the other.

• Even if the questions were the same, unless the teachers graded using exactly the same criteria, each student would receive different scores on the two tests.

What would happen if the same students were tested on their rehearsed performance by University A and University B?

• Unless tests A and B covered exactly the same performance areas, the students’ performance on one test would be different from their performance on the other test.

• Even if the tests were identical, unless the raters from both universities applied the same performance standards, the students would be given different ratings.

And what would happen if you compared students’ classroom achievement ratings, their performance ratings on a university test, and their proficiency ratings?

• Those who can pass an unrehearsed, general proficiency test can also pass a performance test and an achievement test.

• Those who can pass an achievement test or a rehearsed performance test may not be able to pass a general, unrehearsed proficiency test.

And what does all this mean?

All Three Types of Language Tests Are Needed.

3 Major Types of Tests

• Achievement = Memorized responses using the content of a specific textbook or curriculum.

• Performance = Rehearsed ability to communicate in constrained, familiar settings.

• Proficiency = Unrehearsed general ability to accomplish real-world communication tasks across a wide range of topics and settings.

Activity # 1

• You will be asked about 8 different testing purposes.

• For each of those test purposes, which type of test would you choose?

a. Achievement
b. Performance
c. Proficiency

Which type of test would you choose: Achievement, Performance, or Proficiency?

1. To assess students’ language learning after Chapter 3 of a beginning language course?

2. To place students into a university’s sequence of courses?

3. To test students completing a year-long, intensive language course?

4. To screen job applicants for a specific job with well-defined, repetitive tasks?

Which test type would you choose: Achievement, Performance, or Proficiency?

5. To select someone to be your spokesperson on a news show with a “hostile” moderator?

6. To document employees’ language ability in their personnel files?

7. To compare the learning of your students with those of other students using the same textbook?


8. To compare the skills of students in Study Abroad programs with “regular” students?

Solving Testing Problems

• “The solutions to our problems should be as simple as possible, but no simpler.”
– Albert Einstein

• There is no answer for the overly simple question of “Which test is best?”

• There is an answer to the question, “Which type of test is best for a given purpose?”

Which type of test is best?

• The test that matches the purpose for which the results will be used.
– Use achievement tests for testing mastery of lessons in a textbook.
– Use performance tests for checking rehearsed abilities within specific contexts.
– Use proficiency tests for determining general, unrehearsed ability in real-world situations.

If You Do Want to Test Reading and Listening Proficiency

• It is not as easy as you might think.

• Start by answering the question, “What is reading?”

A Proposed Definition of Reading

• Reading: The process of deriving meaning from the written symbols used to represent a given language.

But What Is Reading Proficiency?

• Reading for achievement purposes may be defined differently for each curriculum.

• Reading for specific performance purposes can result in a different definition of reading for each purpose.

But What Is Reading Proficiency?

• “Proficient Reading” has some consistent, core expectations:
– Understanding of texts for the purpose(s) for which they were written.
– Automatic comprehension rather than laborious decoding.
– Comprehension abilities that are sustained beyond one’s own areas of specialization.

A Proposed Definition of Reading Proficiency

• Proficient reading: The active, automatic process of using one’s internalized language and culture expectancy system to obtain new information and comprehend authors’ views and communicative purposes from the written language symbols those authors have used to communicate their messages.

A Proposed Definition of Reading Proficiency

• Note: Proficient readers can “read to learn”.

A Summary of Receptive Skill Contrasts: Achievement, Performance and Proficiency

• Achievement
– Author’s purpose & reader’s task: Understand discrete pieces of learned content (learning to read).
– Context: Textbook, curriculum.
– Accuracy: Determined by the teacher.

• Performance
– Author’s purpose & reader’s task: Understand new information within familiar contexts (learning to read).
– Context: Focused, restricted.
– Accuracy: Situation dependent.

• Proficiency
– Author’s purpose & reader’s task: Understand new information about unfamiliar topics (reading to learn).
– Context: Broad, in-depth, variable.
– Accuracy: Ascending expectations aligned with increasing task complexity.

Tests of Reading and Listening should follow the central principles of proficiency testing.

• Does the test go beyond decoding?

• Are the tasks tested (questions asked) linked to specific proficiency levels?

• Do the ratings assigned represent a sustained ability across topical domains?

• Are the ratings based on non-compensatory task, domain, and accuracy criteria?

An Example: Total Score Versus Criterion-Referenced Scoring

(3 test takers with the same total score, but different proficiency levels)

Learner | Results @ Level 1 | Results @ Level 2 | Results @ Level 3 | Overall Results | True Level
Alice | | | | 65% |
Bob | | | | 65% |
Carol | | | | 65% |

Criterion-Referenced Approach

• Report scores for each proficiency level separately.

• Check for “sustained” ability at each level.

• A notional reporting scale:
– Sustained (consistent evidence) ≈ 70% to 100%
– Developing (a lot of evidence, but not sustained) ≈ 55% to 69%
– Emerging (some evidence) ≈ 26% to 54%
– Random (occasional evidence) ≈ 0% to 25%
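The notional reporting scale above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (band, level_report), the use of simple percent-correct per level, and the item counts in the example are assumptions, and the cut points are the approximate values from the scale rather than fixed standards.

```python
def band(percent: float) -> str:
    """Map a by-level percentage to a notional evidence band.

    Cut points (70 / 55 / 26) follow the approximate reporting scale;
    they are illustrative, not fixed standards.
    """
    if percent >= 70:
        return "Sustained"   # consistent evidence
    if percent >= 55:
        return "Developing"  # a lot of evidence, but not sustained
    if percent >= 26:
        return "Emerging"    # some evidence
    return "Random"          # occasional evidence


def level_report(results: dict[int, tuple[int, int]]) -> dict[int, tuple[float, str]]:
    """Score each proficiency level separately instead of pooling all items.

    results maps a level number to (items answered correctly, items at that level).
    """
    report = {}
    for level, (correct, total) in sorted(results.items()):
        percent = 100 * correct / total
        report[level] = (round(percent, 1), band(percent))
    return report


if __name__ == "__main__":
    # Hypothetical item counts: 20 items per level is an assumption for illustration.
    print(level_report({1: (17, 20), 2: (14, 20), 3: (8, 20)}))
    # {1: (85.0, 'Sustained'), 2: (70.0, 'Sustained'), 3: (40.0, 'Emerging')}
```

Keeping the levels apart is the point: pooling the same 60 items would yield a single 65%, while the by-level report shows where ability stops being sustained.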

Total Score Versus Criterion-Referenced Scoring

Learner | Results @ Level 1 | Results @ Level 2 | Results @ Level 3 | Overall Results | True Level
Alice | 85% Sustained | 70% Sustained | 40% Emerging | 65% | Adv / 2 (Barely)
Bob | 90% Sustained | 85% Sustained | 20% Random | 65% | Adv / 2 (Clearly)
Carol | 90% Sustained | 60% Developing | 45% Emerging | 65% | Int Hi / 1+ (1 with developing ability @ 2)
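The contrast in this table can be reproduced with a short Python sketch. It assumes each level contributes equally to the total (which the 65% totals are consistent with) and treats the rating as non-compensatory: a learner is rated at the highest level whose results, and those of every level below it, fall in the “Sustained” band, with a “+” when the next level up is “Developing”. The function names are hypothetical, and the “Barely”/“Clearly” judgments are left to the rater.

```python
from statistics import mean


def band(percent: float) -> str:
    """Notional evidence band (approximate cut points from the reporting scale)."""
    if percent >= 70:
        return "Sustained"
    if percent >= 55:
        return "Developing"
    if percent >= 26:
        return "Emerging"
    return "Random"


def criterion_rating(by_level: list[float]) -> str:
    """Non-compensatory rating: the highest level sustained at and below it.

    by_level[i] is the percentage score at proficiency level i + 1.
    A "+" is appended when the next level up shows developing ability.
    """
    rating = 0
    for percent in by_level:
        if band(percent) != "Sustained":
            break  # strong scores above a gap cannot compensate for it
        rating += 1
    plus = rating < len(by_level) and band(by_level[rating]) == "Developing"
    return f"{rating}+" if plus else str(rating)


if __name__ == "__main__":
    learners = {
        "Alice": [85, 70, 40],
        "Bob": [90, 85, 20],
        "Carol": [90, 60, 45],
    }
    for name, scores in learners.items():
        print(name, f"total={mean(scores):.0f}%", "rating=" + criterion_rating(scores))
    # Alice total=65% rating=2
    # Bob total=65% rating=2
    # Carol total=65% rating=1+
```

All three learners share the same 65% total, yet only the by-level view separates Alice and Bob (Level 2) from Carol (Level 1+), matching the True Level column.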

Why Aren’t Criterion-Referenced Tests More Common?

• Traditional testing practices are predominantly norm-referenced.

• There has been a lack of agreement on the construct to be tested.

• Descriptions of the receptive skills are quite complex.

For reading, how many rating profiles are possible?

• For 10 factors, each with 4 levels, there are 40 cells in which a rating may be assigned.

• With one rating per factor, how many different profiles are possible?

10 Factors: Author and Reader

• Author factors: Purpose, Topical Domains, Genre, Text Type, Accuracy
• Reader factors: Purpose, Topical Domains, Type of Reading, Reading Strategy, Accuracy
• Rating levels: Superior, Advanced, Intermediate, Novice

With 10 different factors, how many rating profiles are possible?

10 factors with 4 levels produce 4^10 combinations …

or 1,048,576 possible profiles.
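The arithmetic on this slide can be checked with a short Python sketch (profile_count is a hypothetical helper name). It assumes, as the slide does, that each factor independently receives exactly one of the 4 ratings, so the number of profiles is 4 raised to the number of factors, while the number of cells is only 4 times the number of factors. A brute-force enumeration confirms the formula on a small case.

```python
from itertools import product

# The four rating levels on the grid.
LEVELS = ["Novice", "Intermediate", "Advanced", "Superior"]


def profile_count(factors: int, levels: int = len(LEVELS)) -> int:
    """Distinct rating profiles when each factor independently gets one rating."""
    return levels ** factors


if __name__ == "__main__":
    print("cells:", 10 * len(LEVELS))             # cells: 40
    print("profiles:", f"{profile_count(10):,}")  # profiles: 1,048,576

    # Brute-force check on a small case: list every profile for 3 factors.
    assert sum(1 for _ in product(LEVELS, repeat=3)) == profile_count(3) == 64
```

The gap between 40 cells and over a million profiles is what makes unaligned rating grids so hard to apply consistently.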

[Slide grid: a sample profile with one rating marked for each of the 10 factors: 4 at Advanced, 3 at Intermediate, 3 at Novice]

How might this unwieldy complexity be made more manageable?

• We can reduce the scoring complexity by aligning the rating factors!

• For instance, it would make sense to align the author “topical domains” with the author “purposes” generally associated with those topics.
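One way to see why aligning factors shrinks the count is to treat an alignment as a constraint that two factors must receive the same rating. The Python sketch below (count_aligned is a hypothetical helper, and the small factor counts are chosen only so brute force stays fast) enumerates every profile and keeps those that satisfy the alignments. Each alignment that ties a previously independent factor to another removes one factor from the exponent.

```python
from itertools import product

# Four rating levels: 0 = Novice, 1 = Intermediate, 2 = Advanced, 3 = Superior.
LEVELS = range(4)


def count_aligned(factors: int, aligned_pairs: list[tuple[int, int]]) -> int:
    """Count profiles (one rating per factor) in which every aligned pair
    of factors receives the same rating. Brute force: only for small inputs."""
    return sum(
        1
        for profile in product(LEVELS, repeat=factors)
        if all(profile[a] == profile[b] for a, b in aligned_pairs)
    )


if __name__ == "__main__":
    print(count_aligned(4, []))                # 256 = 4**4, no alignment
    print(count_aligned(4, [(0, 1)]))          # 64 = 4**3, one factor absorbed
    print(count_aligned(4, [(0, 1), (1, 2)]))  # 16 = 4**2, three factors act as one
```

Applied to the 10-factor grid, one alignment takes 4^10 down to 4^9. Aligning genre with both purpose and topical domains is the chained case in the last line: three factors collapse into one.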

For 9 factors, how many rating profiles are possible?

9 factors with 4 levels produce 4^9 combinations …

or 262,144 possible profiles.

[Slide grid: a sample profile with one rating marked for each of the 9 factors: 4 at Advanced, 2 at Intermediate, 3 at Novice]

How might this unwieldy complexity be made more manageable?

• Every instance of alignment across factors significantly simplifies the testing and rating process.

• For instance, it would make sense to align the author “genre” with the author “purposes” and “topical domains” generally associated with those genres.

8 Factors: Author and Reader

[Slide grid: a sample profile with one rating marked for each of the 8 factors: 4 at Advanced, 1 at Intermediate, 3 at Novice]

[Slide grid: a sample profile with one rating marked for each of 7 factors: 3 at Advanced, 1 at Intermediate, 3 at Novice]

[Slide grid: a sample profile with one rating marked for each of 6 factors: 3 at Advanced, 1 at Intermediate, 2 at Novice]

[Slide grid: a sample profile with one rating marked for each of 5 factors: 2 at Advanced, 1 at Intermediate, 2 at Novice]

4 factors with 4 levels produce 4^4 combinations …

or 256 possible profiles.

[Slide grid: a sample profile with one rating marked for each of 4 factors: 1 at Advanced, 1 at Intermediate, 2 at Novice]

[Slide grid: a sample profile with one rating marked for each of 3 factors: 1 at Advanced, 1 at Intermediate, 1 at Novice]

[Slide grid: a sample profile with one rating marked for each of 2 factors: 1 at Advanced, 1 at Novice]

For 1 factor per level, how many rating profiles are possible?

1 factor with 4 levels produces 4^1 combinations … or just 4 possible profiles.

Aligned Factors: Author and Reader

[Slide grid: the author and reader factors aligned into a single factor across the four rating levels; one rating marked at Advanced]

Benefits of Aligning Factors

• Complexity is reduced.

• Each level becomes a separate “Task, Condition, and Accuracy” ability criterion.

• This hierarchy of levels establishes by-level criteria for measuring “reading proficiency”.

• With this ascending hierarchy of criteria, raters can look for sustained ability.

• Students’ abilities can be compared regardless of the textbook used or the program attended.

Proficiency Tests Are Not Incremental Progress Tests

[Slide grid: aligned author and reader factors across the four rating levels (Superior, Advanced, Intermediate, Novice)]

Proficiency Tests Are Milestone Tests

[Slide grid: aligned author and reader factors across the four rating levels (Superior, Advanced, Intermediate, Novice)]

Activity # 2

• Can you rank these 4 reading passages from easiest to most difficult?

• Renumber the passages according to their relative difficulty:
– 1 = the easiest
– 2 = the 2nd easiest
– 3 = the 2nd most difficult
– 4 = the most difficult

• Justify your ranking decisions.

Activity # 3

• Align each of these 4 reading passages with the proficiency levels summarized in the “text characteristics” handout.

• Justify your proposed alignment decisions.

Activity # 4

• Write an “aligned” question for each text.

• Does the question you wrote require the test taker to read the text for the purpose for which the author wrote it?

Conclusion

• Proficiency tests are criterion-referenced tests.

• If criterion-referenced tests are well constructed, they can be scored based on the criteria they are designed to measure.

• Such criterion-referenced scoring does not require the testing of hundreds of test takers to be able to interpret the test results.

Add Handouts

• 4 texts Levels 0 – 3 … fair use?

• Overview of …
