CHARACTERISTICS OF A GOOD TEST
Ann Meredith U. Garcia, MD
Reliability vs. validity
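• A degree of test reliability is a prerequisite for validity.
VALID ≠ RELIABLE: reliability is necessary but not sufficient for validity, so a reliable test is not necessarily valid.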
TEST RELIABILITY
Definition
• Consistency with which a test measures whatever it is measuring
• Consistent, constant, and repeatable results?
• Over time? Across different versions of a test? Among scale items?
• Goal: scores as close as possible to the TRUE SCORE
Sources of error
• Examinee: is a HUMAN BEING
• Examiner: is a HUMAN BEING
• Examination: is designed by & for HUMAN BEINGS
Sources of measurement error: 1. OBJECTIVITY OF SCORING
• Scoring is objective when different scorers who apply the same scoring key produce the same score
• More objective scoring → more accurate score
[Figure: one test scored by three scorers: Score1? Score2? Score3?]
Sources of measurement error: 2. SAMPLING OF CONTENT
• A teacher cannot really construct 2 forms of a test that are independent of each other.
• Another teacher’s test would usually differ even more.
• If the test plan is fairly detailed and followed carefully → content sampling for an objective test with a large number of items should be reasonably adequate
Sources of measurement error: 3. TEMPORAL INFLUENCES
• TEMPORAL STABILITY – scores should fluctuate very little over a reasonably brief time interval
[Figure: Test A given on two occasions, each yielding a score]
Methods of estimating reliability: 1. TEST-RETEST METHOD
• The same test is given twice to the same group; estimates TEMPORAL RELIABILITY as the correlation between scores on the 2 trials
• COEFFICIENT OF STABILITY – measure of the correspondence of scores obtained at 2 different times
• Assesses the external consistency of a test
• Gives NO information about possible effects of inadequate sampling of content and processes
Methods of estimating reliability: 2. ALTERNATE-FORMS METHOD
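• Two equivalent forms of the test (Test AX and Test AY) are given to the same group
• COEFFICIENT OF STABILITY AND EQUIVALENCE – the correlation of scores on the 2 forms reveals content differences between the forms (with immediate or delayed testing) and, when the second form is given after a delay, temporal influences as well
[Figure: Test AX and Test AY, two forms of Test A, each yielding a score]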
Methods of estimating reliability: 3. INTER-RATER RELIABILITY
• Different and equally competent raters evaluate the results of a single test → correlate the 2 sets of scores
• Assesses the consistency of how a measuring system is implemented
[Figure: Score1 and Score2 from two raters, and their AVERAGE]
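Methods of estimating reliability: 4. SPLIT-HALF METHOD
• Also called ODD-EVEN RELIABILITY: the test is scored separately on its odd-numbered and even-numbered items, and the two half-test scores are correlated
• r = estimate of content reliability for half of the test
• R = estimate of content reliability for the whole test, obtained from r with the Spearman-Brown formula $R = \frac{2r}{1 + r}$
[Figure: Test A split into Test Aodd and Test Aeven; the two half-test scores are correlated to give r]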
Methods of estimating reliability: 5. KUDER-RICHARDSON APPROACH
• Extension of the split-half method performed on all combinations of questions → average of the split-half estimates that would be expected from making all possible divisions of a test into halves
• Measure of internal consistency reliability for tests whose items are scored dichotomously (e.g., right/wrong)
• Kuder-Richardson Formula 20 (KR-20):
$$ r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{\sigma^2}\right) $$
$k$ = number of questions
$p_j$ = proportion of people in the sample who answered question j correctly
$q_j$ = proportion of people in the sample who did not answer question j correctly ($q_j = 1 - p_j$)
$\sigma^2$ = variance of the total scores of all the people taking the test
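A minimal sketch of KR-20, $r = \frac{k}{k-1}\left(1 - \sum p_j q_j / \sigma^2\right)$, using the quantities defined for this approach. Here $p_j$ is taken as the proportion (not the count) of examinees answering question j correctly, as the formula requires, and $\sigma^2$ is the population variance so that it matches the $p_j q_j$ item variances. The function `kr20` and the answers are hypothetical.

```python
from statistics import pvariance


def kr20(item_scores: list[list[int]]) -> float:
    """Hypothetical helper: KR-20 for items scored 1 (correct) or 0 (wrong).

    item_scores[i][j] is examinee i's score on question j.
    """
    n_people = len(item_scores)
    k = len(item_scores[0])  # k = number of questions
    if k < 2:
        raise ValueError("KR-20 needs at least two questions")
    totals = [sum(row) for row in item_scores]
    # Population variance of total scores, consistent with the p*q item variances
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have no spread; KR-20 is undefined")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people  # proportion correct on question j
        q = 1 - p  # proportion not correct
        sum_pq += p * q
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical right/wrong answers of six examinees to six questions
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
]
print(f"KR-20 = {kr20(answers):.2f}")
```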
Advantages & disadvantages
Which method should be used?
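• Stability of test scores over time → test-retest method
• Consistency of scores over different test forms → alternate-forms method
• Go-togetherness of test items (internal consistency) → split-half method or Kuder-Richardson approach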
Factors affecting reliability: 1. LENGTH OF TEST
• Larger sampling of responses with equally good items, or greater length of test → higher reliability
• Reliability does NOT increase in a straight line (SPEARMAN-BROWN FORMULA)
• Reliability of .50 increases to .67 when the length of a test is doubled
• Assumption: subjects do not become exhausted and lose motivation
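The Spearman-Brown prophecy formula predicts the reliability of a test lengthened n times with equally good items: R_n = n·r / (1 + (n − 1)·r). A minimal sketch; the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test of reliability r is made n times as long."""
    return n * r / (1 + (n - 1) * r)


# Original reliability .50, lengthened one to four times
for n in (1, 2, 3, 4):
    print(f"length x{n}: predicted reliability = {spearman_brown(0.50, n):.2f}")
```

With r = .50 this prints 0.50, 0.67, 0.75, and 0.80: each added length buys less than the one before, which is why the increase is not a straight line.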
Factors affecting reliability: 2. RANGE OF TALENT
• Validity and reliability coefficients can be expected to increase as the range of talent of the subjects increases
• Homogeneous group → lower reliability coefficient
• Wider spread of scores → higher reliability
• Sample of subjects should be representative of the population about which one wishes to draw conclusions regarding individual differences
Factors affecting reliability: 3. TIME LIMITS
• Affects the SPLIT-HALF and KUDER-RICHARDSON approaches
• If some students do not have time to try some items → the proportion of correct responses for those items decreases and the score spread increases → a positive but spurious influence on the size of the reliability coefficient
Factors affecting reliability: 4. DIFFICULTY OF TEST ITEMS
• Very difficult and very easy tests both produce narrow score distributions → low reliability
[Figure: narrow score distributions of a VERY DIFFICULT TEST and a VERY EASY TEST]
Other factors affecting reliability
Best reliability
PRACTICALITY
Definition
• Usefulness or applicability of the testing procedure in serving the needs of its users
Economy of:
• Time
• Effort
• Money
1. Ease of CONSTRUCTION
• Demands adequate time and informed talent
2. Ease of ADMINISTRATION
• Clarity and simplicity
• Ease of reading instructions
3. Ease of SCORING
• Subjective vs. objective?
4. Ease of INTERPRETATION and APPLICATION
• Meaningfulness of scores obtained from the test
• Misinterpreted or misapplied test results are of little value and may be harmful to certain individuals or groups
GENERALIZABILITY
Definition
• RELIABILITY and VALIDITY are often discussed separately, but they are sometimes both referred to as aspects of generalizability
• The extent to which one can generalize the results of a measure or test used with a particular group to other tests or other groups
Thank you!