
Lecture 25

Date post: 05-Nov-2015
Upload: kashif-waqas
LANGUAGE PROFICIENCY TESTING Lecture # 25
Transcript
  • LANGUAGE PROFICIENCY TESTING: Lecture # 25

  • Review of the Lecture: Assessment principles; Testing in assessment; Types of tests; Frequently used test formats

  • Quotation: "Please God may I not fail / Please God may I get over sixty per cent / Please God may I get a high place / Please God may all those likely to beat me get killed in road accidents and may they die roaring." (Irish novelist McGahern)

  • Important points to be discussed: Types of language tests; Ways of describing tests; Evaluating the usefulness of language tests; Overview of common language tests (TOEFL, TOEIC, IELTS, and CAEL); Impact of testing on learning and teaching; Critical use of language tests; Testing questions

  • Types of Language Tests. Achievement test: associated with the process of instruction; assesses where progress has been made; should support the teaching to which it relates. Alternative assessment: need for assessment to be integrated with the goals of the curriculum; learners are engaged in self-assessment

  • Proficiency test: aims to establish a test taker's readiness for a particular communicative role; general measure of language ability; measures a relatively stable trait; used to make predictions about future language performance (Hamp-Lyons, 1998); high-stakes test

  • Some ways of describing tests: Objective / Subjective; Indirect / Direct; Discrete-point / Integrative; Aptitude / Achievement / Proficiency / Performance; External / Internal; Norm-referenced / Criterion-referenced

  • Evaluating the usefulness of a language test: Usefulness = reliability + validity + impact + authenticity + interactiveness + practicality (Bachman and Palmer, 1996)

    [Diagram: test usefulness, comprising reliability, validity, impact, authenticity, practicality, and interactiveness]

  • Evaluating the usefulness of a language test. Essential measurement qualities: reliability; construct validity. Construct validity is the degree to which a test measures what it claims, or purports, to be measuring. Evaluation: test taker, test task, Target Language Use (TLU). [Diagram: TLU, test task, test taker]

  • Overview of common language proficiency tests: TOEFL and TOEIC (ETS, US); IELTS (UK); CAEL (CDN)

  • Test of English as a Foreign Language: One million test takers per year; Three sections: Listening, Structure and Written Expression, Reading Comprehension; TWE: Test of Written English

  • Test of English as a Foreign Language. [Continua: Objective / Subjective; Discrete-point / Integrative; Proficiency / Achievement] Discord between test and understanding of language and communication; passive recognition of language; cutoff scores are very problematic; general proficiency / academic proficiency

  • Test of English for International Communication: TOEFL equivalent for workplace setting; two sections, 200 questions: listening, reading; content areas: entertainment, manufacturing, health, travel, finance, etc.; objective and cost-efficient

  • Test of English for International Communication. [Continua: Objective / Subjective; Discrete-point / Integrative; Proficiency / Achievement] Lack of correspondence with TLU; narrow construct; test content is extremely broad

  • International English Language Testing System: Academic/General; Results reported in band scores 1-9; Modules: Listening, General Reading, Academic Reading, General Writing, Academic Writing, Speaking

  • International English Language Testing System. [Continua: Objective / Subjective; Discrete-point / Integrative; Proficiency / Achievement] Test tasks reflective of academic tasks; score reporting is diagnostic; need for reliability research

  • Canadian Academic English Language Assessment: Mirrors language use in university; Topic-based, integrated reading, listening, and writing tasks; provides specific diagnostic information; scores are reported in bands 10-90

  • Canadian Academic English Language Assessment. [Continua: Objective / Subjective; Discrete-point / Integrative; Proficiency / Achievement] Tests performance and use; diminished gap between test and classroom; validity is supported by teacher evaluations; studies on predicting academic success

  • Washback: The Impact of Tests on Teaching and Learning. The power of tests has a strong influence on curriculum and learning outcomes (Shohamy, 1993); good test, positive washback; form of test impact depends on: antecedent (educational context and condition), process, consequences (Wall, 2000)

  • Critical Language Testing: Focus on consequence and ethics of test use; Tests are embedded in cultural, educational, and political arenas; whose agenda?; Questions traditional testing knowledge; English proficiency = academic success?; English: got it or get it!; Responsible test use (Hamp-Lyons, 2000)

  • Testing Questions: What is actually being tested by the test we are using? What is the best test to use? What relevant information does the test provide? How is testing affecting teaching and learning behaviour? Is language testing fair?

  • Summary: Types of language tests; Ways of describing tests; Evaluating the usefulness of language tests; Overview of common language tests (TOEFL, TOEIC, IELTS, and CAEL); Impact of testing on learning and teaching; Critical use of language tests; Testing questions

    I would like to begin today's presentation with a quote which, taken to an extreme, illustrates the effect that high-stakes testing can have on students.

    REFER AUDIENCE TO HANDOUT
    1. Examples: end-of-course tests, portfolio assessments.
    2. We accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning.
    3. Achievement tests are designed to measure how much of a syllabus a learner has mastered; thus they are only valid to the extent that the content of the test matches the content of the syllabus.
    4. The use of achievement tests allows instructors to be innovative and to reflect progressive aspects of the curriculum. They are thus associated with some interesting new developments, a movement known as alternative assessment.
    5. This approach stresses the need for assessment to be integrated with the goals of the curriculum. Learners may be encouraged to share responsibility in assessment and be trained to evaluate their own capacities, which is known as self-assessment.
    Refer to Brown and Hudson for a detailed discussion of alternative assessment.

    1. This readiness is established for university admission, professional certification, the workplace, etc.
    2. Language ability: consequently not reflective of a specific syllabus.
    3. Stable trait: scores tend not to change within a short period of time, so this type of test would not be useful for assessing learning over a few weeks. Indeed, such a change would mainly indicate statistical variance. However, programs are often pressured to employ such tests in order to determine the effectiveness of teaching.
    4. Predictions: this is why such tests are used for admissions decisions and consequently are high stakes. They determine in great part a student's academic and economic future. Interestingly, Hamp-Lyons notes that the vast majority of people who interpret test scores are neither teachers nor testing professionals; they are administrators.

    Objective = no human interference, very highly reliable.
    Subjective = individuals are involved in the evaluation process.
    Indirect = we make inferences from the test tasks, e.g. using a sentence structure question to infer the writing ability of a test taker.
    Direct = no gap between test task and target language situation, e.g. assessing speaking skills in an interview.
    Discrete-point = multiple-choice, often isolated items.
    Integrative = different skills are not separated but assessed holistically.
    External / internal.
    Norm-referenced = a test taker's performance is evaluated against the range of performances typical of a population of similar test takers.
    Criterion-referenced = performances are compared to one or more descriptions of adequate performance at a given level, e.g. band scores.
    Describing and evaluating tests on a continuum allows us to steer away from a black-and-white judgment.
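The norm-referenced / criterion-referenced distinction in the notes above can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the reference-group scores, the band cutoffs, the band labels, and the function names `percentile_rank` and `band_for` are invented and do not come from any real test. The same raw score is read once against a population and once against fixed descriptions of performance.

```python
from bisect import bisect_right

# Hypothetical reference-group scores (invented for illustration only).
REFERENCE_SCORES = [480, 500, 520, 540, 550, 560, 570, 580, 600, 620]

# Hypothetical band descriptors: (minimum score, description).
BANDS = [
    (0, "limited user"),
    (500, "modest user"),
    (550, "competent user"),
    (600, "good user"),
]


def percentile_rank(score, reference):
    """Norm-referenced reading: percentage of the reference group
    scoring at or below `score`."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def band_for(score, bands):
    """Criterion-referenced reading: description of the highest band
    whose minimum score is met."""
    label = bands[0][1]
    for minimum, description in bands:
        if score >= minimum:
            label = description
    return label


if __name__ == "__main__":
    print(percentile_rank(570, REFERENCE_SCORES))  # position within the group
    print(band_for(570, BANDS))                    # description of performance
```

The norm-referenced result changes whenever the reference group changes, whereas the criterion-referenced result depends only on the band descriptions. This is one way to see why a bare norm-referenced number is harder to interpret.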

    REFER AUDIENCE TO HANDOUT
    1. In order to determine which test is best for a given assessment situation, we need to evaluate its overall usefulness.
    2. Bachman and Palmer include six qualities in their definition of usefulness:
    reliability = consistency of measurement
    validity = the extent to which the inferences that we make on the basis of the test are valid given the target language use situation
    authenticity = how closely the test resembles the actual language use situation
    interactiveness = the extent to which the test taker is involved in active communication
    impact = the effect of the test on test takers, test users, teachers, etc.
    3. These qualities are not all given equal weight, but they must all be considered in order to achieve a desired balance; consequently, the balance will vary from one testing situation to another. These elements cannot be evaluated independently but must be looked at in terms of their combined effect: it is OVERALL usefulness that needs to be emphasized rather than individual qualities. Evaluation of test usefulness is essentially subjective because it is based on judgements on the part of the test user.
    REFER AUDIENCE TO QUESTIONS FOR EVALUATION ON HANDOUT

    1. Two essential considerations in the evaluation of test usefulness are reliability and validity.
    2. Reliability is necessary because we want to ensure that tests are scored in a reliable and consistent manner. However, strong reliability without validity tells us essentially nothing.
    3. Therefore, construct validity is of specific interest to us because it is concerned with the extent to which we can interpret a given test score as an indicator of the ability we want to measure; thus, it addresses the meaningfulness and appropriateness of the interpretations that we make.
    4. Threats to construct validity can occur when real requirements of the TLU domain are not fully represented in the test. We frequently hear people complain that even though students score very high on the TOEFL, they lack basic communication skills. This is probably the case because interaction is not required by the test. The TSE is sometimes employed to remedy this; however, describing how a tourist can find the way to the train station will not necessarily translate into the ability to take part in round-table discussions.
    Threats to content validity: the issue is to what extent the test content forms a satisfactory basis for the inferences to be made from performance, e.g. using the TOEFL to make inferences about the ability of an international student to act as a teaching assistant.
    If we want to use the scores from a language test to make inferences about individuals' language ability, and possibly make various types of decisions, we must be able to demonstrate how performance on that language test is related to language use in specific situations other than the language test itself.
    That is why, when considering the six qualities just addressed, we always need to examine them in connection with the test taker, the test task, and the Target Language Use. Ideally there should be a seamless connection between these three elements; the greater the distance, the less useful the inferences that we can make.

    1. The largest language test prep industry has developed around this test. The introduction to one test prep book states: "you are well aware that the TOEFL is one of the most important examinations that you will ever take. Your entire future may well depend on your performance in the TOEFL. The results of this test will determine whether you will be admitted to the school of your choice."
    2. 1 million test takers.
    3. The TOEFL is 100% multiple choice. It uses generic, or neutral, language and does not specify a context.
    4. Four sections. Listening section: test takers are not given the opportunity to preview questions, to see them while listening, or to take notes.
    5. Research at TOEFL places heavy emphasis on reliability but provides inadequate validity evidence. New developments include automatic essay scoring, done by computer analysis of written structures, and the TOEFL 2000 project, which aims to make changes to the construct of the test, which dates back to the 1960s.

    1. The test does not reflect current teaching and learning practices and could thus have negative effects on students and teachers because it is in conflict with those practices.
    2. Passive recognition: students who pass the test are often unable to communicate. "However, institutions and other TOEFL score recipients that note inconsistencies such as high TOEFL scores and apparent weak English proficiency, should refer to the photo on the Official Score Report for evidence of impersonation."
    3. Cutoff scores: CPA called upon Canadian universities to refrain from using TOEFL as a standard for university admission. Contrary to recommendations, decisions are often based solely on the score. Interpretation of scores is difficult because the test is norm-referenced and simply provides a number. Many have increased TOEFL cutoffs, ranging from 580 to 600. Many who would otherwise be qualified for university admission are denied access. After an 8-week summer university orientation program given in English, students' scores on the TOEFL itself increased from an average of 570 to 601. The mean score of native speakers reported by ETS is 590.
    4. General proficiency: in his critique of language tests and admission procedures, Elson quoted several studies that have found that "merely knowing how a student scored on TOEFL will tell us practically nothing we need to know to predict the student's academic performance."
    5. Dissatisfaction has led to disuse of TOEFL by some, e.g. Australia. Misuse of the TOEFL; cycles of raising and lowering requirements. TOEFL is used as an initial screen, but other tests have to be taken upon arrival.

    1. The listening section includes a variety of statements, questions, and short conversations.
    2. The reading section includes incomplete sentences, error recognition, and reading comprehension.
    3. Content is drawn from a wide variety of areas.
    4. The test is tailored to provide rapid, affordable, and convenient service; therefore it only measures listening and reading, since these can be tested objectively. Testing writing and speaking requires time and expense and is less objective and less reliable.

    1. Concern with lack of correspondence between test tasks and target language use. The test does not measure speaking: how do you know that a person will be able to communicate in a business setting?
    2. It only measures listening and reading but makes inferences about communicative ability.
    3. The test content is extremely broad and may in the end not provide any useful information to any of the fields that use this test.

    1. 205 test centers in over 100 countries.
    2. The test is divided into four modules, which have no central theme or topic but offer separate reading and writing tasks for either general or academic English use.
    3. Listening: a number of recorded texts which increase in difficulty as the test progresses; a mixture of conversations and dialogues; previewing is allowed.
    4. Readings are taken from books, magazines, and journals.
    5. Writing includes two tasks. First, write a 150-word report based on material found in a table or diagram, demonstrating the ability to describe and explain. Second, write a short essay of 250 words in response to an opinion or a problem; test takers are expected to demonstrate the ability to discuss issues, construct an argument, and use appropriate tone and register.
    6. Speaking is assessed during a 10-15 minute one-on-one interview. It requires the test taker to describe, narrate, and provide explanations on a variety of personal and general interest topics.
    Listening and reading components are marked with an objective key; speaking and writing components are marked with a subjective key.
    7. The test includes a variety of task and response types.

    1. The actual tasks are reflective of academic tasks.
    2. The comprehensive scoring structure has the advantage of giving students knowledge of what specific area of language needs special attention. When asked whether the subjective component of the assessment procedure might introduce a degree of unfairness into the testing process, Jill Richardson said that if the test is truly to be regarded as a communication-oriented process, personal interaction is a necessary ingredient without which it is difficult to truly establish a person's capacity to use language.
    3. Need for more reliability research. The emphasis for UCLES has been on validity, and this is also reflected in their certificate exams. It comes from a tradition where teaching professionals are trusted to make fair judgements.
    4. It is one of the two tests accepted by the Canadian government for immigration purposes.

    1. The test was designed by Carleton U. in response to the perceived failure of standardized tests to effectively identify students who were able to use English at levels required for university study.
    2. The test is grounded in day-to-day use of language within first-year courses at the university. It is designed not for the global knowledge of English but for English-medium academic contexts. It attempts to recreate for the test taker the experience of joining an introductory first-year course.
    3. Integrated, criterion-referenced, topic-based test for EAP. It uses constructed response rather than multiple-choice items. There is direct overlap between taking a CAEL assessment, taking an academically oriented ESL course, and taking a first-year course at a university. The overlap is clear in the tasks and activities of the test; in this way the test aims to promote positive and useful learning. When completing practice tests, students are provided with a conversion key that states which skill is tested by each question.

    1. The nature of the test tasks encourages students to make use of their language knowledge and actively engages them.
    2. The language skills that are promoted by the test are in line with what a teacher would use in an EAP classroom.
    3. Research has shown that teachers evaluate their students' in-class performances similarly.
    4. There is an ongoing tracking study that aims to link test performance with future academic performance.
    5. Even though the test was designed to create positive washback for language learners and teachers, some students reportedly have the same study habits as for the TOEFL: staying at home for independent cramming. This demonstrates that a positive test does not have the same impact on all students.

    1. Bailey: there is a natural tendency for both teachers and students to tailor their classroom activities to the demands of the test, especially when the test is very important to the future of the students.
    2. Washback can be either positive or negative to the extent that it promotes or hinders achievement of language learning goals held by learners and educators.
    3. Complex interaction of factors.

    4. The more information is available to teachers, learners, and test users, and the more they are involved in the testing process, the more likely we will be creating positive impact.

    Considering that proficiency tests are the most powerful indicator for determining the academic future of ESL students, discussion needs to start focusing on the ethics and consequences of test use.
    Shohamy introduced the concept of critical language testing. This concept builds on a critical pedagogy perspective and emphasizes that the act of testing is both a product and an agent of cultural, social, and political agendas. Consequently, the notion of "just a test" does not exist.
    What sort of vision of society does the test create? This question puts at the center the responsibility that test users carry with regard to the consequences of test use.
    We need to examine the extent to which test agendas reflect the interests of the field of language teaching and learning.
    Critical language testing calls into question traditional testing knowledge that views numbers as symbols of objectivity and truth. These numbers are powerful not only because those who use them consider them truthful but also because they allow classification, quantification, and judgement. Success and failure are determined by arbitrary cutting scores, and all test takers are judged according to the same yardstick.
    There is research to suggest that academic achievement in selected disciplines is hardly affected by degree of English language proficiency. How much do we actually know about the degree of English facility that is required for successful completion? Test developers and experts cannot agree on what indeed the tests measure, and they do not have a clear sense of [...]
    We must accept responsibility for all the consequences that we are aware of.

    1. There is what the receiving institution wants to know from a test, and there is also what the test actually tests; these interests are not necessarily compatible.
    2. There is no best test. We need to consider all variables to make an appropriate choice.
    3. Different tests produce different information. What connection is there between test items that measure surface structure recognition and the ability to be a successful student? If a test is isolated from the reality that the student will experience as a learner, it becomes accordingly less relevant.
    4. Impact: many of us may have encountered the answer to this question in our classrooms, when students demand to be taught to the test.
    5. Language testing is used as a basis for refusing or admitting a student and thus shifts responsibility away from the institution itself. If students meet the admission requirements to which native speakers are subject, then they should be admitted on the same basis. The provision of opportunities to continue developing English facility is part of a commitment to learning.
    6. AERA standards state that test developers should provide information on the strengths and the weaknesses of their instruments. However, the ultimate responsibility for appropriate test use and interpretation lies predominantly with the test user.
    7. I hope that this brief overview of language proficiency testing will lead to further reflection on language testing and that these testing questions remain with us.

