
Item Response Theory

Shortcomings of Classical True Score Model

• Sample dependence

• Limitation to the specific test situation

• Dependence on parallel forms

• Same error variance for all

Sample Dependence

• The first shortcoming of CTS is that the values of commonly used item statistics in test development, such as item difficulty and item discrimination, depend on the particular examinee samples in which they are obtained. The average level of ability and the range of ability scores in an examinee sample influence, often substantially, the values of the item statistics.

• The difficulty level changes with the ability level of the sample, and the discrimination index differs between a heterogeneous sample and a homogeneous sample, as the simulation sketch below illustrates.
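As a rough illustration, the sketch below simulates two examinee groups with different average ability answering the same item under a simple logistic response model (the data and the model are assumptions made for illustration, not taken from the text); the classical difficulty index, being just the proportion correct, comes out near 0.3 in the weaker group and near 0.7 in the stronger one.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, a=1.0, b=0.0):
    # Simple logistic response model, assumed for illustration.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Two samples of 5,000 examinees with different average ability.
weak = rng.normal(-1.0, 1.0, 5000)
strong = rng.normal(+1.0, 1.0, 5000)

# Both samples answer the same item (difficulty b = 0).
resp_weak = rng.random(5000) < p_correct(weak)
resp_strong = rng.random(5000) < p_correct(strong)

# The classical difficulty index is the proportion correct,
# so the "same" item looks hard in one sample and easy in the other.
print(resp_weak.mean(), resp_strong.mean())
```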

Limitation to the Specific Test Situation

• The task of comparing examinees who have taken samples of test items of differing difficulty cannot easily be handled with standard testing models and procedures.

Dependence on Parallel Forms

• The fundamental concept, test reliability, is defined in terms of parallel forms.

Same Error Variance For All

• CTS presumes that the variance of errors of measurement is the same for all examinees.

Item Response Theory

• The purpose of any test theory is to describe how inferences from examinee item responses and/or test scores can be made about unobservable examinee characteristics or traits that are measured by a test.

• An individual’s expected performance on a particular test question, or item, is a function of both the level of difficulty of the item and the individual’s level of ability.

Item Response Theory

• Examinee performance on a test can be predicted (or explained) by defining examinee characteristics, referred to as traits or abilities; estimating scores for examinees on these traits (called "ability scores"); and using the scores to predict or explain item and test performance. Since traits are not directly measurable, they are referred to as latent traits or abilities. An item response model specifies a relationship between the observable examinee test performance and the unobservable traits or abilities assumed to underlie performance on the test.

Assumptions of IRT

• Unidimensionality

• Local independence

Unidimensionality Assumption

• It is possible to estimate an examinee's ability on the same ability scale from any subset of items in the domain of items that have been fitted to the model. The domain of items needs to be homogeneous in the sense of measuring a single ability: if the domain of items is too heterogeneous, the ability estimates will have little meaning.

• Most of the IRT models that are currently being applied make the specific assumption that the items in a test measure a single, or unidimensional, ability or trait, and that the items form a unidimensional scale of measurement.

Local Independence

• This assumption states that an examinee's responses to different items in a test are statistically independent. For this assumption to be true, an examinee's performance on one item must not affect, either for better or for worse, his or her responses on any other items in the test.
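In symbols (a standard formulation of the assumption, not spelled out in the text), local independence means that, conditional on ability, the probability of a whole response pattern factors into a product over the items:

```latex
P(U_1 = u_1, U_2 = u_2, \dots, U_n = u_n \mid \theta)
  = \prod_{i=1}^{n} P(U_i = u_i \mid \theta)
```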

Item Characteristic Curves

• Specific assumptions about the relationship between a test taker's ability and his or her performance on a given item are explicitly stated in a mathematical function, the item characteristic curve (ICC).

Item Characteristic Curves

• The form of the ICC is determined by the particular mathematical model on which it is based. The types of information about item characteristics may include:

• (1) the degree to which the item discriminates among individuals of differing levels of ability (the 'discrimination' parameter a);

Item Characteristic Curves

• (2) the level of difficulty of the item (the 'difficulty' parameter b), and

• (3)  the probability that an individual of low ability can answer the item correctly (the 'pseudo-chance' or 'guessing' parameter c).

• One of the major considerations in the application of IRT models, therefore, is the estimation of these item parameters.
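For concreteness, one common mathematical model behind such curves is the three-parameter logistic (3PL) model; the sketch below shows how a, b, and c enter it (the scaling constant D = 1.7 is a conventional choice, an assumption rather than something stated in the text):

```python
import math

def icc_3pl(theta, a, b, c, D=1.7):
    # Probability that an examinee of ability theta answers the item correctly.
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# At theta == b, the probability is halfway between c and 1:
print(icc_3pl(0.0, a=1.2, b=0.0, c=0.20))  # 0.6 == (0.20 + 1) / 2
```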

ICC

• pseudo-chance parameter c: p = 0.20 for the two items shown

• difficulty parameter b: the ability level at which the probability of a correct response is halfway between the pseudo-chance parameter and one

• discrimination parameter a: proportional to the slope of the ICC at the point of the difficulty parameter; the steeper the slope, the greater the discrimination parameter

[Figure: ICCs of two items, plotted with the ability scale on the horizontal axis and the probability of a correct response on the vertical axis]

Ability Score

• 1. The test developer collects a set of observed item responses from a relatively large number of test takers.

• 2. After an initial examination of how well various models fit the data, an IRT model is selected.

• 3. Through an iterative procedure, parameter estimates are assigned to items and ability scores to individuals, so as to maximize the agreement, or fit, between the particular IRT model and the test data.
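As a simplified sketch of step 3 (item parameters treated as known here, whereas real calibration programs estimate item and ability parameters jointly and iteratively), the maximum-likelihood ability score is the theta that best fits the observed response pattern:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical item parameters, treated as known for this sketch.
a = np.array([1.0, 1.5, 0.8, 1.2])   # discrimination
b = np.array([-1.0, 0.0, 0.5, 1.5])  # difficulty
c = np.array([0.2, 0.2, 0.2, 0.2])   # pseudo-chance
u = np.array([1, 1, 0, 0])           # observed 0/1 responses

def neg_log_likelihood(theta):
    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))
    return -np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

est = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded")
print(est.x)  # maximum-likelihood ability score
```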


Item Information Function

• The limitations of CTS theory approaches to the precision of measurement are addressed in the IRT concept of the information function. The item information function refers to the amount of information a given item provides for estimating an individual's level of ability, and is a function of both the slope of the ICC and the amount of variation at each ability level.

• The information function of a given item will be at its maximum for individuals whose ability is at or near the value of the difficulty parameter.

[Figure: item information functions for three items]

Item Information Function

• Of the three items shown in the figure above:

• (1) provides the most information about differences in ability at the lower end of the ability scale.

• (2) provides relatively little information at any point on the ability scale.

• (3) provides the most information about differences in ability at the high end of the ability scale.
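As a sketch, the following computes an item's information across the ability scale using the standard 3PL item information formula (the formula itself is not spelled out in the text) and confirms that the peak falls near the difficulty parameter b:

```python
import numpy as np

def info_3pl(theta, a, b, c, D=1.7):
    # Standard 3PL item information: I = (Da)^2 * ((P-c)/(1-c))^2 * (1-P)/P
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((p - c) / (1 - c)) ** 2 * (1 - p) / p

theta = np.linspace(-3, 3, 601)
info = info_3pl(theta, a=1.2, b=0.5, c=0.2)
print(theta[np.argmax(info)])  # close to b = 0.5 (slightly above, since c > 0)
```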

Test Information Function

• The test information function (TIF) is the sum of the item information functions, each of which contributes independently to the total, and is a measure of how much information a test provides at different ability levels.

• The TIF is the IRT analog of CTS theory reliability and the standard error of measurement.
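A minimal sketch of the TIF, reusing the 3PL information formula from the sketch above with three hypothetical items: summing the item information values at an ability level gives the TIF there, and one over its square root serves as the conditional standard error of measurement.

```python
import numpy as np

def info_3pl(theta, a, b, c, D=1.7):
    # Same standard 3PL item information formula as in the previous sketch.
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((p - c) / (1 - c)) ** 2 * (1 - p) / p

# Three hypothetical items spread across the ability scale.
items = [(0.8, -1.0, 0.2), (1.2, 0.0, 0.2), (1.5, 1.0, 0.2)]  # (a, b, c)

theta = 0.0
tif = sum(info_3pl(theta, a, b, c) for a, b, c in items)
se = 1.0 / np.sqrt(tif)  # conditional standard error of measurement at theta
print(tif, se)
```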

Item Bank

• If there is a need for regular test administration and analysis, the construction of an item bank may be worth considering.

• An item bank is not a simple collection of test items stored in raw form; its items carry parameters assigned on the basis of CTS or IRT models.

• An item bank should also have a data-processing system that assures the consistent quality of the data in the bank (describing, classifying, accepting, and rejecting items).

Specifications in CTS Item Bank

• Form of items

• Type of item parts

• Describing data

• Classifying data

Form of Items

• Dichotomous

  Listening comprehension
    Statement + question + choices
    Short conversation + question + choices
    Long conversation / passage + some questions + choices

  Reading comprehension
    Passage + some questions + choices
    Passage + T/F questions

  Syntactic knowledge / vocabulary
    Question stem with blank/underlined parts + choices

  Cloze
    Passage + choices

Form of Items

• Nondichotomous

  Listening comprehension
    Dictation
      Dictation passage with blanks to be filled

Describing data

• Ability measured

• Difficulty index

• Discrimination

• Storage code
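As a purely hypothetical sketch (the field names are invented for illustration, not taken from the text), one record in an item bank might carry the describing data listed above:

```python
from dataclasses import dataclass

@dataclass
class ItemRecord:
    storage_code: str        # code used to file and retrieve the item
    ability_measured: str    # e.g. "reading comprehension"
    difficulty_index: float  # p-value (CTS) or b parameter (IRT)
    discrimination: float    # point-biserial (CTS) or a parameter (IRT)

item = ItemRecord("RC-0042", "reading comprehension", 0.65, 0.38)
print(item)
```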

