Instrument Validity & Reliability
Why do we use instruments?
• Reliance upon our senses for empirical evidence
• Senses are unreliable
• Senses are imprecise – not valid enough
– Operational definitions are important
Validity –
How much confidence do you have in the measurement of your instrument?
Reliability –
How consistent is your measurement?
How much confidence do you have?
Judgmental Validity
– Face validity
– Content validity
Empirical Validity
– Criterion-related validity
  • Predictive
  • Concurrent
Judgmental-Empirical Validity
– Construct validity
Face Validity
• Does the instrument look valid?
– On a survey or questionnaire, the questions seem to be relevant
– On a checklist, the behaviors seem relevant
– For a performance test, the task seems to be appropriate
Content Validity
• The content of the test, the measure, is relevant to the behavior or construct being measured
• An expert or a panel of experts judges the content
Criterion-Related Validity
• Using another independent measure to validate a test
– Typically by computing a correlation: the validity coefficient
• Two types
– Predictive validity
– Concurrent validity
Criterion-Related Validity
Predictive
• ACT achievement test correlated with college GPA
Concurrent
• Coopersmith Self-esteem Scale correlated with teacher’s ratings of self-esteem
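A validity coefficient of this kind is an ordinary correlation between the test and the criterion measure. The sketch below assumes a Pearson correlation (the slides do not name a specific coefficient). The function name `pearson_r` and the score lists are hypothetical and invented for illustration only; they are not real ACT or GPA data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six students (illustration only)
act_scores = [18, 21, 24, 27, 30, 33]          # the test being validated
college_gpa = [2.1, 2.6, 2.9, 3.0, 3.5, 3.8]   # the criterion measure

validity_coefficient = pearson_r(act_scores, college_gpa)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

For predictive validity the criterion is collected later (college GPA after the ACT). For concurrent validity both measures are collected at about the same time, but the computation is the same.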
Construct Validity
• Construct – an attempt to describe and name an intangible variable
• Use many different measures to validate a measure
• Self-esteem – construct
– Instrument measure, e.g., Coopersmith
– Correlated with:
  • Behavioral checklist
  • Teacher’s comments
  • Another accepted instrument for self-esteem
  • A measure of confidence
  • Locus of control measure
Reliable but is it Valid? Valid but is it Reliable?
Invalid and Unreliable: No confidence you’ll get near the target; have no idea where it’s going to shoot.
Invalid but Reliable: No confidence you’ll get near the target, but you know where it’s going to shoot (just not at the target!)
Valid but Unreliable: Confidence that when you hit something, it’s what you want, but you can’t depend upon consistency.
Valid and Reliable: Confident that when you hit a target, it’s what you want and you can depend upon consistent shots.
Reliability
• For an instrument
– Consistency of scores from use to use
• Types of reliability coefficients
– Test-retest
– Equivalent forms
– Internal consistency
  • Split-half
  • Alpha coefficient (Cronbach alpha)
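The two internal-consistency coefficients can be sketched in a few lines. This is a minimal illustration, assuming the standard Cronbach alpha formula, k/(k-1) × (1 − sum of item variances / variance of total scores), and an odd/even item split for the split-half correlation (the raw correlation between the two half scores, with no length correction applied). The function names and the response matrix are hypothetical and invented for illustration.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of item scores."""
    k = len(responses[0])                        # number of items
    items = list(zip(*responses))                # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

def split_half_r(responses):
    """Correlation between odd-numbered and even-numbered item half scores."""
    odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    return pearson_r(odd_half, even_half)

# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
half_r = split_half_r(responses)
print(f"Cronbach alpha = {alpha:.2f}, split-half r = {half_r:.2f}")
```

Test-retest and equivalent-forms coefficients would use the same `pearson_r` helper, applied to two administrations of the same form or to scores on two parallel forms.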
Reliability Coefficient
• Value ranges from 0 to +1.00
• .70 is considered the minimum acceptable value
• .90 is very good
• .60 is sometimes acceptable but is really not very good
• Lower than .60 is definitely unacceptable
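These rules of thumb can be written as a small lookup. The sketch below is one possible reading: the slide gives only the landmark values (.60, .70, .90), so treating each as the lower edge of a band, and the label "acceptable" for the .70 to .90 band, are assumptions. The function name is hypothetical.

```python
def interpret_reliability(coefficient):
    """Map a reliability coefficient (0 to 1) to a rule-of-thumb label."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    if coefficient >= 0.90:
        return "very good"
    if coefficient >= 0.70:
        return "acceptable"  # assumed label for the band from .70 up to .90
    if coefficient >= 0.60:
        return "sometimes acceptable, but not very good"
    return "unacceptable"

for value in (0.95, 0.75, 0.65, 0.40):
    print(value, "->", interpret_reliability(value))
```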
Inter-rater reliability
Example –
Two teachers reading the same essays and scoring them in a similar manner – consistently
Using the same checklist to make observations
Can be expressed as a coefficient
Often as a percentage of agreement
A function of training, objectivity, and the rubric or checklist, i.e., the operational definition!
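Percentage of agreement is simply the share of cases on which the two raters gave the same score. A minimal sketch, assuming exact matches count as agreement; the function name and the essay scores are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of cases on which two raters gave identical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)

# Hypothetical rubric scores (1-5) from two teachers on the same ten essays
teacher_1 = [4, 3, 5, 2, 4, 3, 1, 5, 2, 4]
teacher_2 = [4, 3, 4, 2, 4, 3, 2, 5, 2, 4]

print(f"agreement = {percent_agreement(teacher_1, teacher_2):.0f}%")
```

Here the teachers differ on two of the ten essays. A looser rule, such as counting scores within one point as agreement, would be a different operational definition and would give a higher figure.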
• Norm-referenced tests
– Comparison of individual score to others
– Intelligence test
– ISAT, Iowa Basic Skills Test
– SAT aptitude test
– Personality test
– Percentiles – derived scores
– Grading on a curve
• Criterion-referenced test
– Individual score is compared to a benchmark (a criterion)
– If raw score is used (no conversion): C-R test
– Mastery of material
– Earning a grade in my class
– Disadvantage is potential lack of variability
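The contrast between the two kinds of test shows up in how one raw score is interpreted. The sketch below assumes one common definition of percentile rank (percentage of norm-group scores strictly below the score; other definitions exist). The function names, norm group, and benchmark are all hypothetical.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced: percentage of norm-group scores below this score."""
    below = sum(1 for s in norm_group if s < score)
    return 100.0 * below / len(norm_group)

def meets_criterion(raw_score, benchmark):
    """Criterion-referenced: compare the raw score to a fixed benchmark."""
    return raw_score >= benchmark

# Hypothetical norm group and mastery benchmark (illustration only)
norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
benchmark = 70

score = 80
print(f"percentile rank: {percentile_rank(score, norm_group):.0f}")
print(f"meets criterion: {meets_criterion(score, benchmark)}")
```

The percentile rank changes if the norm group changes. The criterion decision depends only on the benchmark, which is why scores on a criterion-referenced test can bunch together when most students master the material.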
Measures of Optimum Performance
• Aptitude tests
– Predict future performance
• Achievement tests
– Measure current knowledge
• Performance tests
– Measure current ability to complete tasks
Measures of typical performance
Often impacted by “social desirability”
– Wanting to hide undesirable traits or characteristics
One way to work around social desirability is to use projective tests
Rorschach Inkblot Test
Thematic Apperception Test
• Paper/pencil measures of attitudes using Likert-type scales
– Strongly Agree – Strongly Disagree
– Reverse scoring to prevent or identify “response bias”
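Reverse scoring flips the responses to negatively worded items so that a high number means the same thing on every item before the items are summed. A minimal sketch, assuming a 5-point scale coded 1 (Strongly Disagree) to 5 (Strongly Agree); the function names, the responses, and the choice of reversed items are hypothetical.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a Likert response: on a 1-5 scale, 5 becomes 1, 4 becomes 2, etc."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response is outside the scale range")
    return scale_max + scale_min - response

def total_score(responses, reversed_items, scale_min=1, scale_max=5):
    """Sum item responses, reverse-scoring items at the given 0-based positions."""
    return sum(
        reverse_score(r, scale_min, scale_max) if i in reversed_items else r
        for i, r in enumerate(responses)
    )

# Hypothetical respondent: items at positions 1 and 3 are negatively worded
responses = [5, 1, 4, 2]
reversed_items = {1, 3}

print("total score:", total_score(responses, reversed_items))
```

A respondent who marks "Strongly Agree" on every item, including the negatively worded ones, ends up with a middling total rather than an extreme one, which is one way such a response pattern can be spotted.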