
CHARACTERISTICS OF A GOOD TEST Ann Meredith U. Garcia, MD

Reliability vs. validity

¤  A degree of test reliability is requisite to validity.

VALID ≠ RELIABLE

TEST RELIABILITY

Definition

¤  Consistency with which a test measures what it is measuring

¤  Consistent, constant, and repeatable results?

¤  Over time? Across different versions of a test? Among scale items?


¤  Goal: As close as possible to measuring the TRUE SCORE
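The idea of a TRUE SCORE is usually written down with the classical test theory model. The block below is a sketch of that standard model; the symbols X (observed score), T (true score), and E (measurement error) are notation assumed here and do not appear on the slides.

```latex
X = T + E, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2}
                   = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Under this model, the smaller the error variance, the closer the observed score is to the true score and the closer the reliability is to 1.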

Sources of error

¤  Examinee: is a HUMAN BEING

¤  Examiner: is a HUMAN BEING

¤  Examination: is designed by & for HUMAN BEINGS

Sources of measurement error: 1. OBJECTIVITY OF SCORING

¤  Different scorers produce the same score if they apply the same scoring key

¤  More objective scoring → more accurate score

[Figure: Score1? Score2? Score3?]

Sources of measurement error: 2. SAMPLING OF CONTENT

¤  A teacher cannot really construct 2 forms of a test that are independent of each other.

¤  Another teacher’s test usually would differ even more.


¤  If the test plan is fairly detailed and followed carefully → content sampling for an objective test with a large number of items should be reasonably adequate


Sources of measurement error: 3. TEMPORAL INFLUENCES

¤  TEMPORAL STABILITY – scores should fluctuate very little over a reasonably brief time interval

[Figure: TEST A administered twice, each with Score?]

Methods of estimating reliability: 1. TEST-RETEST METHOD

¤  Estimates TEMPORAL RELIABILITY – correlation between scores on the 2 trials

¤  COEFFICIENT OF STABILITY – measure of the correspondence of scores obtained at 2 different times


¤  Assesses the external consistency of a test

¤  NO information about possible effects of inadequate sampling of contents and processes
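The coefficient of stability is a plain correlation between the scores from the two administrations. The sketch below shows one way to compute it. The function name `pearson_r` and the scores are hypothetical and used only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five examinees on TEST A, taken on two occasions.
first_trial = [12, 15, 9, 18, 14]
second_trial = [13, 14, 10, 17, 15]

stability = pearson_r(first_trial, second_trial)
print(f"coefficient of stability: {stability:.2f}")
```

A value near 1 means examinees keep roughly the same rank order on both occasions. Because the same items are used twice, the number says nothing about how well the content was sampled.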


Methods of estimating reliability: 2. ALTERNATE-FORMS METHOD

¤  COEFFICIENT OF STABILITY AND EQUIVALENCE – correlation of scores on the 2 forms would reveal not only temporal influences (delayed testing) but also content differences (immediate & delayed testing)

[Figure: TEST AX Score? and TEST AY Score?]

Methods of estimating reliability: 3. INTER-RATER RELIABILITY

¤  Different and equally competent raters evaluate the results of a single test → correlate the 2 sets of scores

¤  Assesses the consistency of how a measuring system is implemented

[Figure: Score1? Score2? AVERAGE]

Methods of estimating reliability: 4. SPLIT-HALF METHOD

¤  Also called ODD-EVEN RELIABILITY

¤  r = estimate of content reliability for half of the test

¤  R = estimate of content reliability for the whole test

[Figure: TEST Aodd Score? and TEST Aeven Score?, correlated as r]
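The odd-even procedure can be sketched as follows. This is a minimal illustration that assumes right/wrong (1/0) item scores. The function names and the data are hypothetical, and the step-up from r to R uses the standard Spearman-Brown correction, R = 2r / (1 + r), which the slides name but do not spell out here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per examinee."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r = pearson_r(odd_totals, even_totals)  # reliability of a half-length test
    R = 2 * r / (1 + r)                     # Spearman-Brown step-up to full length
    return r, R


# Hypothetical data: 6 examinees x 6 items, 1 = correct, 0 = incorrect.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]

r, R = split_half_reliability(answers)
print(f"half-test r = {r:.2f}, whole-test R = {R:.2f}")
```

R comes out larger than r because the full test has twice as many items as either half.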

Methods of estimating reliability: 5. KUDER-RICHARDSON APPROACH

¤  Extension of the split-half method performed on all combinations of questions → average of split-half estimates that would be expected from making all possible divisions of a test into halves

¤  Measure of internal consistency reliability for measures with dichotomous choices

¤  k = number of questions

¤  pj = proportion of people in the sample who answered question j correctly

¤  qj = proportion of people in the sample who didn’t answer question j correctly (qj = 1 − pj)

¤  σ2 = variance of the total scores of all the people taking the test
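With these symbols, the usual Kuder-Richardson formula 20 is KR-20 = (k / (k − 1)) × (1 − Σ pj·qj / σ2), where pj and qj are taken as proportions of the sample. The sketch below is one way to compute it under two assumptions: items are scored 1/0, and σ2 is the population variance of total scores. The function name `kr20` and the data are hypothetical.

```python
def kr20(item_scores):
    """KR-20 for a list of examinees, each a list of 1/0 item scores."""
    n = len(item_scores)     # number of examinees
    k = len(item_scores[0])  # number of questions
    if k < 2:
        raise ValueError("KR-20 needs at least 2 questions")

    # p_j: proportion answering question j correctly; q_j = 1 - p_j
    p = [sum(row[j] for row in item_scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)

    # Population variance of the total scores (assumed divisor: n, not n - 1)
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n
    if variance == 0:
        raise ValueError("total scores must vary")

    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical data: 6 examinees x 6 questions, 1 = correct, 0 = incorrect.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]

print(f"KR-20 = {kr20(answers):.3f}")
```

Unlike the odd-even split, the result does not depend on which items land in which half.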

Advantages & disadvantages


Which method should be used?

• Stability of test scores over time

• Consistency of scores over different test forms

• Go-togetherness of test items


Factors affecting reliability: 1. LENGTH OF TEST

¤  Larger sampling of responses with equally good items or greater length of test → higher reliability

¤  Reliability does NOT increase in a straight line (SPEARMAN-BROWN FORMULA)

¤  Reliability of .50 increases to .67 when the length of a test is doubled

¤  Assumption: Subjects do not become exhausted and lose motivation
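The Spearman-Brown formula behind these numbers is R_n = n·r / (1 + (n − 1)·r), where r is the current reliability and n is the factor by which the test is lengthened. The short sketch below tabulates it. The function name `spearman_brown` is hypothetical, and the formula assumes the added items are as good as the existing ones.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Start from a reliability of .50 and lengthen the test 1x, 2x, 3x, 4x.
for n in (1, 2, 3, 4):
    print(n, round(spearman_brown(0.50, n), 2))
```

Doubling the test takes .50 to .67, tripling it gives .75, and quadrupling it gives .80. Each added block of items buys less than the one before, which is why reliability does not rise in a straight line.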

Factors affecting reliability: 2. RANGE OF TALENT

¤  Validity and reliability coefficients can be expected to increase as range of talent of the subjects increases

¤  Homogeneous group → lower reliability coefficient

¤  Wider spread of scores → higher reliability

¤  Sample of subjects should be representative of those for whom one wishes to draw conclusions about individual differences
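The effect of range of talent can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a real dataset: scores are generated as true score plus random error, the homogeneous group is formed by keeping only examinees whose simulated true score is near the mean, and every name and parameter value is hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def simulate(n_examinees=5000, error_sd=0.5, seed=0):
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_examinees)]
    # Two administrations with the same amount of measurement error.
    trial_1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    trial_2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    r_full = pearson_r(trial_1, trial_2)

    # Homogeneous subgroup: true scores within half an SD of the mean.
    keep = [i for i, t in enumerate(true_scores) if abs(t) < 0.5]
    r_narrow = pearson_r([trial_1[i] for i in keep],
                         [trial_2[i] for i in keep])
    return r_full, r_narrow


r_full, r_narrow = simulate()
print(f"full range: {r_full:.2f}, homogeneous group: {r_narrow:.2f}")
```

The measurement error is identical in both groups. The coefficient drops in the homogeneous group only because there is less true-score spread for the test to detect.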

Factors affecting reliability: 3. TIME LIMITS

¤  Affects the SPLIT-HALF and KUDER-RICHARDSON approaches

¤  If some students do not have time to try some items → proportion of correct responses for those items will decrease and the score spread will increase → positive although spurious influence on the size of the reliability coefficient

Factors affecting reliability: 4. DIFFICULTY OF TEST ITEMS

¤  Narrow score distributions → low reliability

[Figure: score distributions for a VERY DIFFICULT TEST and a VERY EASY TEST]

Other factors affecting reliability


Best reliability

PRACTICALITY

Definition

¤  Usefulness or applicability of the testing procedure in order to serve the needs of its users

Economy of: ✓ Time ✓ Effort ✓ Money

1. Ease of CONSTRUCTION

¤  Demands adequate time and informed talent

2. Ease of ADMINISTRATION

¤  Clarity and simplicity

¤  Ease of reading instructions

3. Ease of SCORING

¤  Subjective vs. objective?

4. Ease of INTERPRETATION and APPLICATION

¤  Meaningfulness of scores obtained from the test

¤  Misinterpreted or misapplied test results – of little value and may be harmful to certain individuals or groups

GENERALIZABILITY

Definition

¤  RELIABILITY and VALIDITY – often discussed separately, but sometimes both are referred to as aspects of generalizability

¤  Extent to which one can generalize the results of a measure or a test used with a particular group to other tests or other groups

Thank you!