Page 1: 1 LINKING AND EQUATING OF TEST SCORES Hariharan Swaminathan University of Connecticut.

1

LINKING AND EQUATING OF TEST SCORES

Hariharan Swaminathan
University of Connecticut

Page 2:

AN EXAMPLE

CONVERTING TEMPERATURE FROM ONE SCALE TO ANOTHER

CELSIUS TO FAHRENHEIT

F = C * (9/5) + 32

THIS IS A LINEAR MAPPING OF ONE SCORE ON TO THAT OF ANOTHER
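This linear mapping and its inverse can be written as a pair of small functions; a minimal sketch:

```python
def celsius_to_fahrenheit(c):
    """Linear mapping from the Celsius scale to the Fahrenheit scale."""
    return c * 9 / 5 + 32

def fahrenheit_to_celsius(f):
    """Inverse of the mapping above; the conversion is symmetric."""
    return (f - 32) * 5 / 9

print(celsius_to_fahrenheit(100))  # 212.0
```

Because the mapping is linear and invertible, converting a value to the other scale and back recovers the original value exactly.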

Page 3:

AN EXAMPLE OF EQUIVALENCE OF MEASUREMENTS

MEASURING BLOOD SUGAR

Equivalence of Lab Measurement and Home Measurement

How do we do this?

1. Measure Blood Sugar in the Lab
2. Measure Blood Sugar Using Home Instrument
3. Plot one value against the other

Page 4:

AN EXAMPLE OF EQUIVALENCE OF MEASUREMENTS

[Scatter plot: BLOOD SUGAR MEASUREMENTS, Home vs Lab. Home readings (x-axis, 92–106) plotted against Lab readings (y-axis, 92–112), with the fitted line shown.]

LAB = 1.021*HOME + 4.554
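A best-fit line like the one above comes from ordinary least squares. A minimal sketch with hypothetical paired readings (the data behind the actual plot are not shown, so these numbers are illustrative only):

```python
# Hypothetical paired blood-sugar readings; illustrative only.
home = [92, 95, 97, 99, 101, 103, 105]
lab = [98, 101, 104, 106, 107, 110, 112]

n = len(home)
mean_home = sum(home) / n
mean_lab = sum(lab) / n

# Least-squares slope and intercept for the line LAB = slope*HOME + intercept
slope = sum((h - mean_home) * (l - mean_lab) for h, l in zip(home, lab)) \
    / sum((h - mean_home) ** 2 for h in home)
intercept = mean_lab - slope * mean_home

print(f"LAB = {slope:.3f}*HOME + {intercept:.3f}")
```

With the real data, this procedure would produce the equation shown on the slide.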

Page 5:

AN EXAMPLE OF EQUIVALENCE OF MEASUREMENTS

COST OF LIVING: COST OF GOODS IN 2015 COMPARED WITH COST IN 1975

How do we do this?

1. Record costs of comparable goods (Market Basket)
2. Find cost ratio (Economist's procedure), or
3. Plot one value against the other

Page 6:

AN EXAMPLE OF EQUIVALENCE OF MEASUREMENTS

[Scatter plot: COST NOW COMPARED WITH COST IN 1970. Cost in 1970 (x-axis, 0–45) plotted against Cost Now (y-axis, 0–350), with the fitted line shown.]

COST NOW = 7.35*COST 70 + 7.175

Page 7:

• The above are examples of relating two measurement scales to each other.

• In the first example, we realize that one scale bears a simple relation to the other. We are just changing the units of measurement.

• In the second, we are using two different instruments to measure the same entity, and using prediction to establish equivalence.

• In the third, we are trying to relate two measurements, under the assumption that the objects we measure remain the same.

• We are “equating” in all these cases.

EQUATING

Page 8:

• Multiple forms of a test are necessary in many testing situations to ensure the validity of scores.

• Examples:
  Many large scale testing programs have several administration dates in a given year.
  Students may be permitted multiple attempts on an exam.
  To assess growth or change, an individual must be assessed on several occasions over time.
  Many states are required to release some or all test items after administration and hence require new forms.

THE CONTEXT OF EQUATING

Page 9:

• In each case, the validity of the scores would be threatened by using the same test.

• Using the same test across administrations at different times would result in exposure of the test items and would mean that students who were tested later had an advantage.

• Using the same test across time for the same students would confound practice or memory effects with true growth or change.

THE CONTEXT OF EQUATING

Page 10:

• To avoid these problems, different test forms are constructed to the same specifications.

• The different test forms are intended to measure the same skills and be of similar difficulty and discrimination.

• Tests that measure the same construct, have the same mean, variance, reliability, and correlations with other tests are said to be “parallel”.

THE CONTEXT OF EQUATING

Page 11:

• If the test forms were truly parallel, equating would not be necessary.

• However, test forms will always differ somewhat, and it is necessary to take this into account somehow before scores on different forms can be compared.

THE CONTEXT OF EQUATING

Page 12:

• Successful equating will result in scores that can be validly compared even though they were obtained on different test forms.

• The overarching principle of equating is that it should be equitable for all test-takers: after equating, it should not matter to individuals whether they take Form X or Form Y.

THE CONTEXT OF EQUATING

Page 13:

Equating procedures are designed to produce scores that can be validly compared across individuals even when they took different versions of a test (different “test forms”).

Linking, Scale aligning, or simply Scaling, involves transforming the scores from different tests onto a common scale. These procedures produce scores that are expressed on a common scale but the scores of individuals who took different tests cannot be compared in the strict sense.

EQUATING AND LINKING

Page 14:

Linking is used for establishing the relationship between two tests that are not built to the same content specifications or to have the same difficulty.

One common example is to establish a relationship between the scores obtained on the SAT and ACT, two college entrance examinations.

The ACT and SAT are measures of the same general construct (aptitude). Concordance tables are constructed to express the relationship between the two scores; they are, however, not exchangeable.

LINKING VS. EQUATING

Page 15:

Similarly, in India, the relationship between scores obtained on secondary school certificate examinations conducted by two boards must be established through linking.

While these tests are not exchangeable, the linking provides some degree of comparability.

If all students take a common college entrance examination, the board scores can be linked through this common examination. This linking will be a little stronger.

LINKING VS. EQUATING

Page 16:

Equating is one of the core applications of measurement theory: it is central to any testing program that uses multiple test administrations.

There are two different measurement frameworks in common use today; equating is approached differently in these frameworks

The older framework (c. 1900s) is now referred to as Classical Test Theory (CTT)

The newer (c. 1970s) framework is Item Response Theory (IRT)

MEASUREMENT FRAMEWORKS FOR EQUATING

Page 17:

Classical Test Theory is based on number-correct (raw) test scores.

Equating under CTT involves creating a table for converting raw scores on one test to the scale of raw scores on the other test. The equated scores may then be linearly transformed to a desired reporting scale.

Equating in a classical framework lacks a strong theoretical basis.

MEASUREMENT FRAMEWORKS FOR EQUATING

Page 18:

IRT provides a strong theoretical basis for equating.

IRT provides mathematical models that relate an individual’s performance on a test item to characteristics of the item and the individual’s value on the underlying latent trait measured by the test.

Instead of using raw score as a measure of performance, we use the estimated trait value for each individual.

Equating is a simpler matter under IRT than CTT.

MEASUREMENT FRAMEWORKS FOR EQUATING

Page 19:

However, IRT requires strong assumptions and large sample sizes for estimating the parameters of the model.

CTT equating procedures remain in wide use in situations where the assumptions of IRT are not met, where testing populations are small, or where raw scores are the preferred scores.

MEASUREMENT FRAMEWORKS FOR EQUATING

Page 20:

• Within the classical test theory framework, equating can only be achieved under certain conditions.

1. The tests must measure the same construct.

Scores that measure different constructs can never be equated - we could not fairly compare the mathematics score of one student with the reading score of another!

REQUIREMENTS FOR CLASSICAL EQUATING

Page 21:

Within the classical framework, equating can only be achieved under certain conditions:

2. The tests should be equally reliable.

If the test forms are unequally reliable, higher performing students would benefit from the more reliable test, and lower performing students would benefit from the less reliable test.

REQUIREMENTS FOR CLASSICAL EQUATING

Page 22:

Within the classical framework, equating can only be achieved under certain conditions:

3. The conversion should be symmetric: it should result in the same equated scores regardless of whether Form Y scores are converted to the scale of Form X or Form X scores are converted to the scale of Form Y, i.e., the equating should produce the same result in both directions.

For example, if a score of 28 on Form Y equates to a score of 26 on Form X, then a score of 26 on Form X should equate to a score of 28 on Form Y.

REQUIREMENTS FOR CLASSICAL EQUATING

Page 23:

Within the classical framework, equating can only be achieved under certain conditions:

4. The equating function should be invariant across subpopulations of students at different levels of performance.

The distribution of performance in the groups taking the different test forms should not affect the equating function; if it does, then the equating may not be equitable to groups of students with a different distribution of performance.

REQUIREMENTS FOR CLASSICAL EQUATING

Page 24:

Fully meeting these requirements is almost impossible in practice (particularly the requirements of equal reliability and invariance across all subpopulations).

Nevertheless, some adjustment of scores from different test forms must be made to compensate for lack of equivalence of test forms.

We keep the requirements for equating in mind to ensure that we do the best possible job.

REQUIREMENTS FOR CLASSICAL EQUATING

Page 25:

Linking of test scores can only be achieved if appropriate data are collected to establish the relationship between the score scales.

There are several possible data collection designs:
1. Single-group (SG) design
2. Equivalent-groups (EG) design
3. Non-equivalent anchor test (NEAT) design

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 26:

Single-Group Design:

• One group of test-takers takes both test forms.

• This is the ideal design, as we know that any difference in the distributions of scores is due to differences in difficulty and discrimination of the test forms.

• We then adjust the scores on one form to make the two distributions equal.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Population   Sample   Form X   Form Y
P            1        *        *

Page 27:

Single-Group Design:

• Advantages:
  There is no confounding of test differences with group differences.
  Only one group of test-takers is needed.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 28:

Single-Group Design:

• Disadvantages:
  Practice and fatigue effects may affect scores. These effects can be controlled by counterbalancing the order of administration of the forms, but this generally cannot be incorporated into an operational administration; it requires a special equating study.
  Extended testing time is required (may not be feasible).

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 29:

Equivalent-Groups Design:

• Two equivalent groups take one test form each.

• This design is an approximation to the single-group design.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Population   Sample   Form X   Form Y
P            1        *
P            2                 *

Page 30:

Equivalent-Groups Design:

• Advantages: There is no confounding of test difficulty with group differences.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 31:

Equivalent-Groups Design:

• Disadvantages:
  Large samples are required to ensure statistical equivalence.
  If the base form has been previously administered, it must be re-administered at the same time as the new form to ensure equivalent samples; this poses a test security risk.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 32:

Non-equivalent Anchor Test (NEAT) Design:

• Two different groups take one test each; both groups take a common set of items (anchor test A).

• The anchor test controls for group differences.

• Individuals taking Form Y with the same anchor test score as individuals taking Form X should have the same score on Form X.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Population   Sample   Form X   Anchor   Form Y
P            1        *        *
Q            2                 *        *

Page 33:

Non-equivalent Anchor Test (NEAT) Design:

• Advantages:
  The groups do not need to be equivalent.
  A new form can be equated to a previously administered form without re-administering the old form.
  Students take only one form of the test plus the anchor test.
  The design is easily incorporated into operational test administrations.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 34:

Non-equivalent Anchor Test (NEAT) Design:

• Disadvantages:
  The anchor test must be of sufficient length and appropriate nature.
  The anchor test items must behave in the same way for the two different groups (be equally difficult).
  If equating to a previously administered test, the anchor test items have also been previously administered and therefore may be known to test-takers.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 35:

The NEAT design is the most commonly used in practice.

The purpose of the anchor is to determine to what extent score differences on the test forms are due to ability or proficiency differences in the two groups.

Differences in test difficulty can then be separated from group differences.

Equating should adjust scores only for differences in test difficulty.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 36:

Anchor tests may be external (anchor items not included in score) or internal (anchor items included in score).

External anchors are often administered as separately timed sections.

Internal anchor items are spread throughout the test.

Both types of anchor are susceptible to exposure; internal anchors are susceptible to context effects.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 37:

In testing programs where scored items must be released after administration, an external anchor makes sense.

For classical equating methods, the anchor test should be a “mini-test” in composition.

Longer anchors will give better results.

DATA COLLECTION DESIGNS FOR TEST SCORE LINKING

Page 38:

THE EQUATING PROBLEM

Suppose we administer two test forms X and Y to a group of students and obtain the score distributions shown below:

[Histograms of test scores (x-axis, 0–20) vs % of students (y-axis, 0–11). Form X: Mean = 10.4; SD = 3.9. Form Y: Mean = 8.4; SD = 4.4.]

Page 39:

Form Y is harder than Form X (the average score is lower on Form Y).

On average, students scored about two points lower on Form Y than Form X.

A student with a score of 10 on Form Y would probably have obtained a higher score if he or she had taken Form X.

Why not just add 2 points to the score of everyone who took Form Y? Then the average scores would be about the same.

THE EQUATING PROBLEM

Page 40:

But suppose the two tests were made up of items with difficulties like those shown below:

THE EQUATING PROBLEM

Item Difficulty   Number of Items
                  Form X   Form Y
Very hard         1        1
Hard              6        5
Medium            7        11
Easy              4        2
Very easy         2        1

Page 41:

For low ability students, Form Y is harder because there are fewer easy questions, so they should get points added to their Form Y score.

THE EQUATING PROBLEM

Item Difficulty   Number of Items
                  Form X   Form Y
Very hard         1        1
Hard              6        5
Medium            7        11
Easy              4        2
Very easy         2        1

Page 42:

For high ability students, Form Y is slightly easier because there are fewer hard questions, so they should have points subtracted from their Form Y score.

THE EQUATING PROBLEM

Item Difficulty   Number of Items
                  Form X   Form Y
Very hard         1        1
Hard              6        5
Medium            7        11
Easy              4        2
Very easy         2        1

Page 43:

Adding or subtracting the same number of points for every student would not be equitable*.

We need an adjustment that is different for students at different levels of ability.

*

CLASSICAL EQUATING PROCEDURES

Page 44:

Why not do a regression analysis to predict the scores on Form X from the scores on Form Y?

CLASSICAL EQUATING PROCEDURES

Prediction equation is X = 4.142 + .751Y

A student with a score of 10 on Form Y is predicted to have a Form X score of 4.142 + .751(10) = 11.65

Page 45:

For the conversion to be symmetric, a student with a (theoretical) score of 11.65 should get a predicted score of 10 on Form Y.

CLASSICAL EQUATING PROCEDURES

Prediction equation is Y = -1.473 + .944X

A student with a score of 11.65 on Form X is predicted to have a Form Y score of -1.473 + .944(11.65) = 9.52

Page 46:

The regression procedure is not symmetric: we get different equated scores depending on which test we use as the base form.

CLASSICAL EQUATING PROCEDURES

Page 47:

We can solve these problems in one of two ways.

There are two general approaches to equating test scores using classical procedures:
• Linear methods
• Equipercentile methods

Both require scores of the same or equivalent groups on the two tests (but can also be used with non-equivalent groups after some fancy footwork).

CLASSICAL EQUATING PROCEDURES

Page 48:

Under linear equating, scores on the two forms are considered equivalent if they are the same distance from the mean of the form in standard deviation units (i.e., have the same z-score).

Hence, there is a linear relationship between the two sets of scores.

LINEAR EQUATING

Page 49:

Scores x (on Form X) and y (on Form Y) are considered equivalent if they have the same z-score:

(x − μ_X)/σ_X = (y − μ_Y)/σ_Y

where μ_X, σ_X are the mean and standard deviation of scores on Form X and μ_Y, σ_Y are the mean and standard deviation of scores on Form Y.

LINEAR EQUATING

Page 50:

Thus,

x = μ_X + (σ_X/σ_Y)(y − μ_Y)

Given sample data, the transformation required for rescaling Form Y scores to the scale of Form X is

y* = x̄ + (s_X/s_Y)(y − ȳ)

LINEAR EQUATING

Page 51:

After equating, the rescaled Form Y scores have the same mean and standard deviation as the Form X scores.

Note that this is not a prediction (regression) equation; it can be inverted to obtain the transformation for rescaling Form X scores to the scale of Form Y:

x* = ȳ + (s_Y/s_X)(x − x̄)

LINEAR EQUATING

Page 52:

If we substitute the transformed Form Y score y* (now on the Form X scale) for x in the equation, y* converts back to the original y value:

x* = ȳ + (s_Y/s_X)(y* − x̄) = ȳ + (s_Y/s_X)[x̄ + (s_X/s_Y)(y − ȳ) − x̄] = ȳ + (y − ȳ) = y

LINEAR EQUATING
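The transformation and its inverse can be checked numerically; a sketch using the sample statistics given later in this deck (the function name is illustrative):

```python
def linear_equate(score, mean_to, sd_to, mean_from, sd_from):
    """Rescale a score by matching z-scores:
    equated = mean_to + (sd_to / sd_from) * (score - mean_from)."""
    return mean_to + (sd_to / sd_from) * (score - mean_from)

# Form X: mean 10.429, SD 3.918; Form Y: mean 8.375, SD 4.394
y = 9
y_star = linear_equate(y, 10.429, 3.918, 8.375, 4.394)       # Y -> X scale
y_back = linear_equate(y_star, 8.375, 4.394, 10.429, 3.918)  # X -> Y scale

print(round(y_star, 2), round(y_back, 2))  # 10.99 9.0
```

Converting back with the roles of the two forms swapped recovers the original score exactly, which is the symmetry property that regression lacks.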

Page 53:

Rescaling usually results in non-integer scores; these are generally rounded to integer values.

Under linear equating, it is possible to obtain rescaled scores outside the permissible range.

Depending on the final reporting scale, these are often truncated to the lowest and highest permissible scores (arbitrarily decided).

LINEAR EQUATING

Page 54:

For the test score distributions shown earlier,

Mean on Form X: 10.429
SD of Form X: 3.918
Mean of Form Y: 8.375
SD of Form Y: 4.394

LINEAR EQUATING EXAMPLE

y* = 10.429 + (3.918/4.394)(y − 8.375) = .892y + 2.961
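Applying this transformation at every raw score reproduces the conversion table on the next slide (up to small rounding differences); a quick sketch, truncating rounded scores at the maximum of 20:

```python
mean_x, sd_x = 10.429, 3.918
mean_y, sd_y = 8.375, 4.394

table = {}
for y in range(21):
    # Linear equating: match z-scores between the two forms.
    y_star = mean_x + (sd_x / sd_y) * (y - mean_y)
    table[y] = (round(y_star, 2), min(round(y_star), 20))  # truncate at 20

print(table[3])  # (5.64, 6)
```

Note that a raw score of 20 on Form Y rescales above 20 and must be truncated, as the next slide points out.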

Page 55:

LINEAR EQUATING EXAMPLE

Score on Form Y   Equated Score on Form X   Rounded Equated Score
0                 2.96                      3
1                 3.85                      4
2                 4.75                      5
3                 5.64                      6
4                 6.53                      7
5                 7.42                      7
6                 8.31                      8
7                 9.21                      9
8                 10.10                     10
9                 10.99                     11
10                11.88                     12
11                12.77                     13
12                13.67                     14
13                14.56                     15
14                15.45                     15
15                16.34                     16
16                17.23                     17
17                18.13                     18
18                19.02                     19
19                19.91                     20
20                20.80                     20

A score of 3 on Form Y is equivalent to a score of 6 on Form X

A score of 9 on Form Y is equivalent to a score of 11 on Form X

Note that we have to truncate the equated scores at 20

Page 56:

LINEAR EQUATING EXAMPLE

Score on Form Y   Equated Score on Form X   Rounded Equated Score
0                 2.96                      3
1                 3.85                      4
2                 4.75                      5
3                 5.64                      6
4                 6.53                      7
5                 7.42                      7
6                 8.31                      8
7                 9.21                      9
8                 10.10                     10
9                 10.99                     11
10                11.88                     12
11                12.77                     13
12                13.67                     14
13                14.56                     15
14                15.45                     15
15                16.34                     16
16                17.23                     17
17                18.13                     18
18                19.02                     19
19                19.91                     20
20                20.80                     20

Note that we have added more points to scores at the low end of the scale than at the top end

Page 57:

Linear equating matches the means and standard deviations of the two distributions but it does not change the shape of the score distribution.

A Form Y score with the same z-score as a Form X score may have a different percentile rank.

EQUIPERCENTILE EQUATING

Page 58:

Under equipercentile equating, scores on two forms are considered to be equivalent if they have the same percentile rank in the target population.

After equating, the equated scores have the same distribution as the scores on the base form.

EQUIPERCENTILE EQUATING

Page 59:

Scores on the two forms with equal percentile ranks are plotted against each other.

The curve is usually irregular; smoothing procedures are applied under the assumption that the irregularity is due to sampling error.

Smoothing can be applied to each cumulative distribution function before obtaining equivalent scores (pre-smoothing) or it can be applied to the equipercentile equivalent curve (post-smoothing).

Equating usually results in non-integer scores; these are generally rounded to integer values.
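The score-matching step can be sketched as linear interpolation on the base-form cumulative distribution; a minimal example using a fragment of the smoothed percentile ranks shown later in this deck:

```python
import numpy as np

def equipercentile_equate(percentile_rank, x_scores, x_cum):
    """Find the Form X score whose (smoothed) percentile rank equals the
    given rank, by linear interpolation on the Form X distribution."""
    return float(np.interp(percentile_rank, x_cum, x_scores))

# Fragment of the smoothed Form X percentile ranks (scores 11-13)
x_scores = [11, 12, 13]
x_cum = [51.9, 61.5, 70.4]

# A score of 10 on Form Y has a smoothed percentile rank of 64.9
print(round(equipercentile_equate(64.9, x_scores, x_cum), 2))  # 12.38
```

`np.interp` requires the percentile ranks to be increasing in the score, which a cumulative distribution guarantees.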

EQUIPERCENTILE EQUATING

Page 60:

EQUIPERCENTILE EQUATING EXAMPLE

Score   Form X: % of Students   Form X: Cumulative %   Form Y: % of Students   Form Y: Cumulative %
0       0.1                     0.1                    1.0                     1.0
1       0.2                     0.3                    3.5                     4.5
2       1.2                     1.5                    4.4                     8.9
3       2.1                     3.5                    5.1                     14.0
4       2.7                     6.3                    5.9                     19.9
5       4.6                     10.9                   7.8                     27.7
6       5.5                     16.4                   8.1                     35.8
7       6.8                     23.2                   9.3                     45.1
8       9.0                     32.2                   8.8                     53.9
9       10.8                    43.0                   9.0                     62.8
10      9.5                     52.5                   7.8                     70.6
11      9.7                     62.2                   6.9                     77.5
12      7.8                     69.9                   5.1                     82.5
13      7.0                     76.9                   4.1                     86.6
14      6.8                     83.7                   3.1                     89.7
15      5.4                     89.1                   2.9                     92.5
16      3.2                     92.4                   2.2                     94.7
17      3.1                     95.5                   2.2                     96.9
18      2.7                     98.2                   1.4                     98.3
19      1.4                     99.6                   0.9                     99.3
20      0.4                     100.0                  0.7                     100.0

A score of 9 on Form Y has (almost) the same percentile rank as a score of 11 on Form X; a score of 9 on Form Y is equivalent to a score of 11 on Form X

Page 61:

EQUIPERCENTILE EQUATING EXAMPLE

[Histograms of test scores (x-axis, 0–18) vs % of students (y-axis, 0–10). Form X: Mean = 10.9; SD = 3.5. Form Y: Mean = 8.5; SD = 4.5.]

Form Y is harder than Form X

Page 62:

EQUIPERCENTILE EQUATING EXAMPLE

Form Y is harder than Form X

[Overlaid score distributions for Form X and Form Y; x-axis Test Score (0–20), y-axis % of Students (0–11)]

Page 63:

EQUIPERCENTILE EQUATING EXAMPLE

Cumulative test score distributions:

Note that the curves are not perfectly smooth

[Cumulative % of Students (0–100) vs Test Score (0–20), plotted separately for Form X and Form Y]

Page 64:

EQUIPERCENTILE EQUATING EXAMPLE

Smooth the curves and plot the smoothed distributions together:

[Percentile Rank (0–100) vs Test Score (0–20); smoothed curves for Form X and Form Y plotted together]

Page 65:

EQUIPERCENTILE EQUATING EXAMPLE

Score   Form X: Percentile rank after smoothing   Form Y: Percentile rank after smoothing
0       0.0                                       0.0
1       0.0                                       1.5
2       0.1                                       4.7
3       0.6                                       10.0
4       2.0                                       17.1
5       4.8                                       25.0
6       9.2                                       33.3
7       15.4                                      41.8
8       23.2                                      50.1
9       32.1                                      57.9
10      41.9                                      64.9
11      51.9                                      71.4
12      61.5                                      77.3
13      70.4                                      82.3
14      78.5                                      86.6
15      85.5                                      90.2
16      91.4                                      93.2
17      95.9                                      95.5
18      98.8                                      97.3
19      99.9                                      98.7
20      100.0                                     99.6

What score on Form X has the same percentile rank as a score of 10 on Form Y?

Page 66:

EQUIPERCENTILE EQUATING EXAMPLE

Score   Form X: Percentile rank after smoothing   Form Y: Percentile rank after smoothing
8       23.2                                      50.1
9       32.1                                      57.9
10      41.9                                      64.9
11      51.9                                      71.4
12      61.5                                      77.3
13      70.4                                      82.3
14      78.5                                      86.6

We interpolate between the scores of 12 and 13 to find the score on Form X that has a percentile rank of 64.9

Equated score = 12 + (64.9 - 61.5)/(70.4 - 61.5) = 12.38

We round this to a score of 12 for reporting purposes

Page 67:

EQUIPERCENTILE EQUATING EXAMPLE

Score on Form Y   Equated Score on Form X   Rounded Equated Score
0                 --                        2
1                 3.17                      3
2                 4.49                      4
3                 5.62                      6
4                 6.59                      7
5                 7.50                      8
6                 8.41                      8
7                 9.31                      9
8                 10.20                     10
9                 11.12                     11
10                12.09                     12
11                13.04                     13
12                13.91                     14
13                14.72                     15
14                15.45                     15
15                16.13                     16
16                16.86                     17
17                17.35                     17
18                17.72                     18
19                19.05                     19
20                19.90                     20

A score of 3 on Form Y is equivalent to a score of 6 on Form X

A score of 14 on Form Y is equivalent to a score of 15 on Form X

Page 68:

EQUIPERCENTILE EQUATING EXAMPLE

Score on Form Y   Equated Score on Form X   Rounded Equated Score
0                 --                        2
1                 3.17                      3
2                 4.49                      4
3                 5.62                      6
4                 6.59                      7
5                 7.50                      8
6                 8.41                      8
7                 9.31                      9
8                 10.20                     10
9                 11.12                     11
10                12.09                     12
11                13.04                     13
12                13.91                     14
13                14.72                     15
14                15.45                     15
15                16.13                     16
16                16.86                     17
17                17.35                     17
18                17.72                     18
19                19.05                     19
20                19.90                     20

Note that as in linear equating, we have added more points to scores at the low end of the scale than at the top

Page 69:

The nonlinearity in the equating relationship arises because we are trying to match the shape of the distribution of Form Y to the shape of the distribution of Form X, so we have to stretch the scale at some parts and compress it at others.

EQUIPERCENTILE EQUATING

Page 70:

Problems with equipercentile equating:

Smoothing introduces subjectivity; different smoothing procedures may produce different equating results.

Equivalents can only be produced in the observed score range for the samples used in the equating; extrapolation is required for scores outside the observed range.

EQUIPERCENTILE EQUATING

Page 71:

Linear equating assumes that the difference in difficulty between the forms is constant across the scale and that the same adjustment formula can be used everywhere.

If this assumption is met, linear and equipercentile methods will produce the same equating results.

EQUIPERCENTILE VS LINEAR EQUATING

Page 72: 1 LINKING AND EQUATING OF TEST SCORES Hariharan Swaminathan University of Connecticut.

Classical equating approaches with NEAT designs require the construction of a "synthetic" population that took both forms.

The anchor test is used to impute the distribution of scores for each group on the form they did not take, so that score distributions are obtained for both groups on both forms.

CLASSICAL EQUATING WITH NEAT DESIGNS


The synthetic population mean is a weighted combination of the two populations.

Equipercentile or linear equating procedures are then performed using the score distributions for the synthetic population.

CLASSICAL EQUATING WITH NEAT DESIGNS
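The weighted combination can be sketched as below. Weighting the groups by their sample sizes is one common choice; the weights here are an assumption for illustration, not something fixed by the method:

```python
def synthetic_mean(mean_1, n_1, mean_2, n_2):
    """Synthetic-population mean as a weighted combination of the two
    group means, with weights proportional to group size."""
    w1 = n_1 / (n_1 + n_2)
    return w1 * mean_1 + (1.0 - w1) * mean_2
```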


The two most common linear equating approaches for NEAT designs are called the Tucker method and the Levine method.

These procedures make different assumptions about the relationship between the anchor test scores and the form scores.

We will not explore the technical details of these procedures here.

CLASSICAL EQUATING WITH NEAT DESIGNS


The Levine method is generally better when the two groups have different distributions.

The Tucker method is generally better when one form is more difficult than the other.

When the groups are of similar proficiency levels and the tests are fairly similar in difficulty, the two approaches will produce similar results.

COMPARISON OF TUCKER AND LEVINE EQUATING


Chained linear equating is another form of linear equating that is used with the NEAT design.

Under chained linear equating, scores on Form Y are linearly equated to the anchor test scores, which are in turn linearly equated to the Form X scores.

Chained linear equating does not require strong assumptions about the relationship between the anchor test scores and the form scores (unlike the Tucker and Levine methods).

CHAINED LINEAR EQUATING
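The chain can be sketched as two mean/SD linear links composed together. The statistics below are hypothetical, with the Y-to-anchor link taken from Group 2 and the anchor-to-X link from Group 1, as described above:

```python
def linear_link(mean_from, sd_from, mean_to, sd_to):
    """Slope and intercept of the linear function matching mean and SD."""
    slope = sd_to / sd_from
    return slope, mean_to - slope * mean_from

def chained_linear(y, g2_form_y, g2_anchor, g1_anchor, g1_form_x):
    """Form Y -> anchor (Group 2 statistics), then anchor -> Form X
    (Group 1 statistics). Each argument is a (mean, sd) pair."""
    m1, k1 = linear_link(*g2_form_y, *g2_anchor)   # Y -> anchor
    m2, k2 = linear_link(*g1_anchor, *g1_form_x)   # anchor -> X
    return m2 * (m1 * y + k1) + k2
```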


Two groups take the same 20-item tests as used in previous examples, but both take an external 10-item anchor test.

Group 1 takes Form X and the anchor test.
Group 2 takes Form Y and the anchor test.

Group 2 is of higher average ability and takes the harder test.

NEAT EQUATING EXAMPLE


NEAT EQUATING EXAMPLE

The mean scores on Form X and Form Y are similar. Can we assume that the groups are of equal ability?

Group 1:
  Mean on Form X: 10.396
  s.d. on Form X:  3.985

Group 2:
  Mean on Form Y: 10.439
  s.d. on Form Y:  4.619


NEAT EQUATING EXAMPLE

The anchor test mean for Group 2 is higher, which indicates that Group 2 has higher average ability, which in turn implies that Form Y must have been harder.

Group 1:
  Mean on Anchor: 4.473
  s.d. on Anchor: 2.181

Group 2:
  Mean on Anchor: 5.417
  s.d. on Anchor: 2.321


NEAT EQUATING EXAMPLE

After constructing the synthetic population of test-takers who took both forms of the test, we see that the mean on Form Y is lower than the mean on Form X, i.e., Form Y is harder than Form X.

Synthetic Population:    Tucker   Levine
  Mean on Form X:        11.043   11.333
  s.d. on Form X:         4.110    4.243
  Mean on Form Y:         9.675    9.434
  s.d. on Form Y:         4.593    4.573


NEAT EQUATING EXAMPLE

The equating coefficients for all methods are similar to those of the single group design.

Tucker linear equating coefficients:   Slope = 0.895   Intercept = 2.385
Levine linear equating coefficients:   Slope = 0.928   Intercept = 2.582
Chained linear equating coefficients:  Slope = 0.918   Intercept = 2.537
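Applying the Tucker slope and intercept above to each Form Y raw score reproduces the Tucker column of the conversion table that follows:

```python
# Tucker coefficients from this example
slope, intercept = 0.895, 2.385

# rounded equated Form X score for each Form Y raw score 0-20
tucker_column = [round(slope * y + intercept) for y in range(21)]
```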


NEAT EQUATING EXAMPLE

Score on    Rounded Equated Score on Form X
Form Y      Tucker   Levine   Chained Linear
   0           2        3          3
   1           3        4          3
   2           4        4          4
   3           5        5          5
   4           6        6          6
   5           7        7          7
   6           8        8          8
   7           9        9          9
   8          10       10         10
   9          10       11         11
  10          11       12         12
  11          12       13         13
  12          13       14         14
  13          14       15         14
  14          15       16         15
  15          16       16         16
  16          17       17         17
  17          18       18         18
  18          18       19         19
  19          19       20         20
  20          20       20         20


IRT provides a strong theoretical basis for equating.

Under IRT, the test score is replaced by an estimate of the test-taker's level of the trait being measured by the test.

The trait values do not depend on the particular items that are administered.

The only equating issue in IRT is setting a scale for reporting trait values and putting all parameter estimates on that common scale.

IRT PROCEDURES FOR SCORE LINKING


Under all IRT models, the scale for item and trait parameters is indeterminate, i.e., has no natural metric.

The issue is simply one of setting a scale for item and trait parameter estimates on one form and linearly transforming estimates from another form to the same scale.

IRT PROCEDURES FOR SCORE LINKING


BASIC PRINCIPLES OF IRT

In IRT, we think of the observed score as an (imperfect) indicator of a latent trait.

Item response models specify the relationship between performance on a test item and the latent trait or traits measured by the test.

Specifically, item response models specify a mathematical relationship between the probability of a given response to an item and the set of underlying traits.


BASIC PRINCIPLES OF IRT

The probability of the response is determined by characteristics of the item as well as the individual's trait value(s).

The most widely used models assume that the test is unidimensional, i.e., measures a single trait or construct.


ITEM RESPONSE FUNCTIONS FOR DICHOTOMOUS ITEMS

(UNIDIMENSIONAL MODEL)

[Figure: item response function for a dichotomous item, plotting probability of correct response (0.0 to 1.0) against trait value (-3.5 to 3.5).]


ADVANTAGES OF IRT

When an IRT model holds,

1. Person parameters (trait values) are invariant, i.e., they do not depend on the set of items administered;

2. Item parameters are invariant, i.e., they do not depend on the characteristics of the population.

These properties allow us to compare the trait estimates of individuals who have taken different forms of a test.


IRT MODELS

There are three widely used models for dichotomous responses and three widely used models for polytomous responses.

The majority of test items used on standardized assessments are dichotomously scored multiple-choice items.

The models for dichotomous responses differ in the number of item characteristics that are assumed to influence test-taker performance in addition to test-taker ability or proficiency.


IRT MODELS

The two-parameter (2P) model assumes that responses are influenced by the difficulty and discrimination of the items.

The one-parameter (1P) model restricts the two-parameter model by assuming that item responses are influenced only by item difficulty, i.e., that all items are equally discriminating.

The three-parameter (3P) model extends the 2P model by including a parameter that allows for a non-zero probability of a correct response even at the lowest trait values (a "guessing" parameter).


THE 2P LOGISTIC MODEL

P(u_ij = 1 | θ_i) = e^{a_j(θ_i − b_j)} / (1 + e^{a_j(θ_i − b_j)})

where

  u_ij is the response of individual i to item j
  θ_i  is the latent trait value of individual i
  a_j  is the discriminating power of item j
  b_j  is the difficulty level of item j
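A minimal implementation of the 2P response probability (the function and parameter names are just illustrative):

```python
import math

def p_2pl(theta, a, b):
    """2P logistic model: probability of a correct response for trait
    value theta on an item with discrimination a and difficulty b."""
    z = a * (theta - b)
    return math.exp(z) / (1.0 + math.exp(z))
```

At theta equal to b the probability is exactly .5, which is how difficulty is interpreted in this model.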


THE 2P LOGISTIC MODEL

[Figure: 2P item response function, plotting probability of correct response (0.0 to 1.0) against trait value (-3.0 to 4.0).]

Difficulty b is the trait value at which the probability of a correct response is .5

Discrimination a is proportional to slope at b


THE 1P LOGISTIC MODEL

P(u_ij = 1 | θ_i) = e^{(θ_i − b_j)} / (1 + e^{(θ_i − b_j)})

where

  u_ij is the response of individual i to item j
  θ_i  is the latent trait value of individual i
  b_j  is the difficulty level of item j


THE 3P LOGISTIC MODEL

P(u_ij = 1 | θ_i) = c_j + (1 − c_j) · e^{a_j(θ_i − b_j)} / (1 + e^{a_j(θ_i − b_j)})

where

  u_ij is the response of individual i to item j
  θ_i  is the latent trait value of individual i
  a_j  is the discriminating power of item j
  b_j  is the difficulty level of item j
  c_j  is the "guessing" parameter (lower asymptote) of item j
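The same sketch extended with the lower asymptote (names illustrative):

```python
import math

def p_3pl(theta, a, b, c):
    """3P logistic model: probability of a correct response, with lower
    asymptote c allowing success even at very low trait values."""
    z = a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))
```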


IRT requires relatively large samples for accurate estimation of the model parameters (at least 1000 for the 3P model).

Specialized computer programs are required.

The advantages of IRT hold only when the model adequately fits the data. It is essential that the fit of the model to the data be investigated before the results are used.

USING IRT MODELS


Under the 1P model, the scale for θ is arbitrary as long as b is expressed on the same scale:

  (θ − b) = {(θ + K) − (b + K)} = (θ* − b*)

where θ* = θ + K and b* = b + K.

We can choose whatever metric we want for θ, as long as we use the same metric for item difficulty b.

IRT PROCEDURES FOR SCORE LINKING


Under the 2P and 3P models, the term a(θ − b) is invariant over linear transformations of the scale, as long as a and b are transformed correspondingly:

  a(θ − b) = (a/M)·{(Mθ + K) − (Mb + K)} = a*(θ* − b*)

where a* = a/M, θ* = Mθ + K, and b* = Mb + K.

IRT PROCEDURES FOR SCORE LINKING
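A quick numerical check of this invariance, with arbitrary illustrative values:

```python
a, theta, b = 1.2, 0.8, -0.3   # illustrative parameter values
M, K = 1.7, -0.4               # an arbitrary linear rescaling

# transform the scale
a_star = a / M
theta_star = M * theta + K
b_star = M * b + K

original = a * (theta - b)
transformed = a_star * (theta_star - b_star)
```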


The scale for parameter estimates is set during calibration, either by setting the mean and SD of θ to be 0 and 1, respectively, or setting the mean and SD of the difficulty values to be 0 and 1.

Most software packages fix the scale on θ.

When different tests are calibrated in different groups, the scales will be set differently.

However, estimates for the same items or people will differ only by a linear transformation.

IRT PROCEDURES FOR SCORE LINKING


We need to rescale the parameter estimates from the different tests and for the different groups so that they are on a common scale.

Item response data must be collected using one of the data collection designs.

For the single group design, we can simply calibrate all items together; the estimates will automatically be on the same scale.

For the equivalent groups design, we assume that the groups have the same mean and SD and fix the scale on θ in each group/test.

IRT PROCEDURES FOR SCORE LINKING


For the NEAT design, there are two general sets of methods:

• Calibration methods
  - Concurrent calibration
  - Fixed item parameter calibration

• Transformation methods
  - Mean and sigma approach (and variations)
  - Characteristic curve method (Stocking-Lord method)

IRT PROCEDURES FOR SCORE LINKING


Calibration methods automatically put all parameter estimates on a common scale.

Transformation methods require determination of the linear relationship between parameter estimates from the two forms.

IRT PROCEDURES FOR SCORE LINKING


For the NEAT design, we can calibrate all items and all groups together, treating the items not taken in each group as missing (indicated by 9):

00110101001000000000 0001100010 99999999999999999999
00011110101101000110 0011010011 99999999999999999999
01011111101000110100 0011100011 99999999999999999999
00111111001000000000 0011100000 99999999999999999999
99999999999999999999 1110111101 11111111111111101111
99999999999999999999 1111011011 11111011100111111111
99999999999999999999 1000000010 01110000001100010000
99999999999999999999 1000010010 11111010111111000000
99999999999999999999 1111001111 11110111111111101111

Columns: Form X items, anchor items, Form Y items; 9 indicates a missing response.

This procedure is called concurrent calibration.

CONCURRENT CALIBRATION


Concurrent calibration does not work well if there are large mean differences in trait levels between groups, unless group membership is explicitly taken into account.

Modern computer programs allow a group membership code and separate trait distributions in each group.

The mean and SD of the trait values in each group are allowed to differ and are estimated.

CONCURRENT CALIBRATION


An alternative approach is to calibrate the items for each group separately, but hold the item parameters for the common items in the second group fixed at the values obtained for the common items in the base group.

This procedure is called fixed item parameter calibration.

Fixed item calibration sometimes causes estimation problems, especially when the groups are very different in trait distributions.

FIXED ITEM PARAMETER CALIBRATION


With this approach, we calibrate items separately in each group and determine the linear transformation needed to rescale one group to the scale of the other by comparing the parameter estimates on the anchor items.

Concurrent calibration is theoretically preferred in general; however, the sparseness of the response matrix in NEAT designs may cause estimation problems in some cases.

TRANSFORMATION METHODS


Calculate the mean and SD of the difficulty parameters for the anchor items in the two groups.

Determine the linear transformation that would transform the mean and SD of anchor item difficulties in Group 1 (Form X) to equal those of Group 2 (Form Y).

We know that

  b_Y = M·b_X + K   and   a_Y = (1/M)·a_X

MEAN AND SIGMA METHOD


Since b_Y = M·b_X + K,

  b̄_Y = M·b̄_X + K   and   s_bY = M·s_bX

Therefore,

  M = s_bY / s_bX   and   K = b̄_Y − M·b̄_X

MEAN AND SIGMA METHOD
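The derivation translates directly into code. The difficulty values below are made up; `statistics.stdev` uses the n − 1 denominator, which matches the SD values in the worked example that follows:

```python
from statistics import mean, stdev

def mean_sigma_constants(b_anchor_x, b_anchor_y):
    """Mean-and-sigma constants M and K in b_Y = M*b_X + K, computed
    from anchor-item difficulty estimates on the two forms."""
    M = stdev(b_anchor_y) / stdev(b_anchor_x)
    K = mean(b_anchor_y) - M * mean(b_anchor_x)
    return M, K

# made-up anchor difficulties where the Form Y values are exactly 2*b + 1
M, K = mean_sigma_constants([-1.0, 0.0, 1.0, 2.0], [-1.0, 1.0, 3.0, 5.0])
```

With this constructed data the true transformation (M = 2, K = 1) is recovered exactly.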


This transformation can be inverted to place parameters from Form Y/Group 2 on the scale of Form X/Group 1 (symmetry requirement):

  M_(Y→X) = s_bX / s_bY   and   K_(Y→X) = b̄_X − M_(Y→X)·b̄_Y

Once the equating coefficients M and K are determined, item and trait parameter estimates for the unique (non-anchor) items of Form Y are transformed to the scale of Form X.

MEAN AND SIGMA METHOD


Parameter estimates for the anchor items are averaged.

For the 1P model, only the additive constant K is calculated.

The trait estimates for Group 2 are transformed using the same transformation as was used for the item difficulty parameters.

MEAN AND SIGMA METHOD


MEAN AND SIGMA METHOD EXAMPLE

        Form X                 Form Y
Item   a     b     c        a     b     c
  1   1.08  0.59  0.21                          (X unique)
  2   0.63  1.22  0.12                          (X unique)
  3   0.92 -0.48  0.18                          (X unique)
  4   1.37  0.69  0.22                          (X unique)
  5   0.57 -1.11  0.09                          (X unique)
  6   0.94 -0.53  0.06                          (X unique)
  7   0.71  0.66  0.15    0.63  0.48  0.19      (common)
  8   1.31 -0.89  0.22    1.08 -0.99  0.15      (common)
  9   0.92 -0.05  0.16    0.78 -0.36  0.26      (common)
 10   1.25  0.41  0.09    1.18  0.24  0.13      (common)
 11                       1.34 -0.29  0.11      (Y unique)
 12                       0.48 -1.04  0.23      (Y unique)
 13                       0.76  0.45  0.19      (Y unique)
 14                       0.69  1.21  0.13      (Y unique)
 15                       1.09 -0.57  0.08      (Y unique)
 16                       0.49 -0.23  0.17      (Y unique)

Common-item difficulties, Form X:  Mean b = .0325, SD b = .682
Common-item difficulties, Form Y:  Mean b = -.1575, SD b = .658


Calculate equating constants:

  M = .658 / .682 = .965

  K = .0325 − .965 × (−.1575) = .185

Transform Form Y parameters:

  b*_Y = .965·b_Y + .185

MEAN AND SIGMA METHOD EXAMPLE
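Applying these constants to a Form Y item is a one-liner: difficulties get the full linear map, discriminations are divided by the slope, and the c estimates are left unchanged, as in the transformed table that follows (function name illustrative):

```python
M, K = 0.965, 0.185   # equating constants from this example

def rescale_item(a, b, c):
    """Place a Form Y item's parameter estimates on the Form X scale."""
    return a / M, M * b + K, c

# unique Form Y item 11 from this example
a_star, b_star, c_star = rescale_item(1.34, -0.29, 0.11)
```

Rounding gives a = 1.39 and b = -0.09, matching item 11 in the transformed table.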


MEAN AND SIGMA METHOD EXAMPLE

        Form X                 Form Y (transformed)
Item   a     b     c        a     b     c
  1   1.08  0.59  0.21                          (X unique)
  2   0.63  1.22  0.12                          (X unique)
  3   0.92 -0.48  0.18                          (X unique)
  4   1.37  0.69  0.22                          (X unique)
  5   0.57 -1.11  0.09                          (X unique)
  6   0.94 -0.53  0.06                          (X unique)
  7   0.71  0.66  0.15    0.74  0.65  0.19      (common)
  8   1.31 -0.89  0.22    1.36 -0.77  0.15      (common)
  9   0.92 -0.05  0.16    0.95 -0.16  0.26      (common)
 10   1.25  0.41  0.09    1.30  0.42  0.13      (common)
 11                       1.39 -0.09  0.11      (Y unique)
 12                       0.50 -0.82  0.23      (Y unique)
 13                       0.79  0.62  0.19      (Y unique)
 14                       0.72  1.35  0.13      (Y unique)
 15                       1.13 -0.37  0.08      (Y unique)
 16                       0.51 -0.04  0.17      (Y unique)

After transformation of Form Y parameters:


This method uses the information in both the difficulty and discrimination parameter estimates to determine the equating relationship.

The linear transformation is found that would minimize the sum of squared differences between expected scores on the common items based on the item parameter estimates for each group.

This method is considerably more complex than the mean and sigma method.

CHARACTERISTIC CURVE METHOD
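A toy version of the characteristic-curve idea, using the 2P model and a crude grid search. This illustrates the objective being minimized, not the actual Stocking-Lord implementation (which handles the 3P model and uses proper numerical optimization); the items and trait grid below are made up:

```python
import math

def p2(theta, a, b):
    """2P response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def loss(M, K, anchors_x, anchors_y, thetas):
    """Sum of squared differences between the expected anchor score
    computed from the Form X estimates and from the rescaled Form Y
    estimates, accumulated over a grid of trait values."""
    total = 0.0
    for t in thetas:
        tcc_x = sum(p2(t, a, b) for a, b in anchors_x)
        tcc_y = sum(p2(t, a / M, M * b + K) for a, b in anchors_y)
        total += (tcc_x - tcc_y) ** 2
    return total

def characteristic_curve_constants(anchors_x, anchors_y, thetas):
    """Crude grid search for the (M, K) minimizing the loss."""
    candidates = ((m / 100.0, k / 100.0)
                  for m in range(5, 201, 5)        # M in 0.05 .. 2.00
                  for k in range(-200, 201, 5))    # K in -2.00 .. 2.00
    return min(candidates,
               key=lambda mk: loss(mk[0], mk[1], anchors_x, anchors_y, thetas))
```

Building a Form Y anchor set by applying a known transformation to the Form X parameters and then recovering it is a simple way to check such a routine.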

