SOCW 671: #5
Measurement Levels, Reliability, Validity, & Classical Measurement Theory
Want to measure variables
Variables are characteristics of persons, places, or things
A conceptual entity, any construct or characteristic to which different numerical values can be assigned for purposes of analysis or comparison
Variables
Independent, Dependent and Control variables
Measurement is the process of assigning numbers (or things that take the place of numbers) to variables according to a set of rules
Measurement
It’s a process, not an event, because measurement deals with variables, and variables change.
Measurement Scales
A set of rules proposed by S. S. Stevens in 1946 in the journal Science
He proposed a four-tiered hierarchy of scales, from most simple to most complex: Nominal, Ordinal, Interval, Ratio
Nominal (or Categorical)
The process of grouping individual observations into qualitative categories or classes
Does not involve magnitude
Examples: gender, religion, & ethnicity
Ordinal
A measuring procedure which assigns one object a greater number, the same number, or a smaller number than a second object only if the first possesses, respectively, more, the same, or less of the characteristic being measured than the second object
For example: Likert scales, which rate items from strongly disagree to strongly agree
Interval
A special kind of ordinal scaling where the measurement assigned to an object is linearly related to its true magnitude
Has an arbitrary origin (zero-point) and a fixed, though arbitrary, unit of measure
Has set, equal intervals between units (e.g. time)
Ratio
A special kind of interval scaling where the measurement assigned to an object is proportional to its true magnitude
Has an absolute zero (e.g. weight)
To measure variables
First you need to figure out how you will measure them
Just because variables may have numeric values does not necessarily make them interval or ratio (e.g. Likert Scales)
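As a small illustration (the coding scheme and data below are made up, not from the notes), this Python snippet shows nominal measurement: the numbers serve only as category labels, so arithmetic on them is meaningless.

# Nominal measurement: numbers are only labels for categories
religion_codes = {"Protestant": 1, "Catholic": 2, "Jewish": 3, "Other": 4}

respondents = ["Catholic", "Jewish", "Catholic", "Protestant"]
coded = [religion_codes[r] for r in respondents]
print(coded)                      # [2, 3, 2, 1]

# The "average religion" below is computable but substantively meaningless
print(sum(coded) / len(coded))    # 2.0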
Reliability & Validity
Involves Classical Measurement Theory
O = T + E (observed = true score plus error)
The benefit of classical measurement theory is that it lets you estimate E, the error component of an observed score
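A minimal sketch of O = T + E in Python (all numbers are invented for illustration), simulating observed scores as true scores plus random error:

import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(75, 8, size=500)   # hypothetical true scores (T)
error = rng.normal(0, 4, size=500)          # random measurement error (E)
observed = true_scores + error              # O = T + E

# Reliability can be viewed as the share of observed-score variance due to true scores
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))                # roughly 0.80 with these made-up values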
Reliability
Instrument reliability: the consistency with which you measure whatever you intend to measure
Consistency of scores. For example, if you use a scale to weigh yourself several times and obtain similar weights, the scale is reliable
Three paradigms: internal consistency, test/retest, and alternate/parallel forms
Measures of Internal Consistency (Reliability)
Split halves: split the test in half and correlate the two halves
Odd/even: a split-halves method (odd items vs. even items) that addresses the problems of splitting into first and second halves
Kuder-Richardson 20 (KR-20): estimates the correlation across all possible split-half permutations
KR-21: a simplified KR-20
Cronbach’s alpha: can be used with the widest variety of data collection procedures
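For illustration only (the item responses are made up), a minimal Python sketch of Cronbach’s alpha computed from its standard formula:

import numpy as np

# Made-up responses: 5 respondents x 4 Likert items
items = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 4],
    [1, 2, 2, 1],
])

k = items.shape[1]                                # number of items
item_variances = items.var(axis=0, ddof=1).sum()  # sum of the item variances
total_variance = items.sum(axis=1).var(ddof=1)    # variance of the total scores
alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(round(alpha, 2))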
Test/Retest
No intervention: give one test, then the same test again later (the purpose is to test the instrument, not achievement)
Problems include: memory and practice effects
A 1–3 week delay between tests is best because it avoids fatigue and keeps memory and practice effects low
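A hedged sketch of a test/retest reliability check in Python, correlating made-up scores from two administrations of the same instrument:

import numpy as np

# Hypothetical scores from the same people, about two weeks apart
time1 = np.array([82, 75, 90, 68, 77, 85, 71, 88])
time2 = np.array([80, 78, 91, 70, 75, 83, 73, 86])

# Test/retest reliability is typically reported as the correlation between administrations
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))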
Alternate/Parallel Forms
Alternate: same test items, but in a different sequence
Parallel: write two items from the blueprint and use one item on one test and the other item on the other test (e.g. “Columbus in 1492 discovered ___.” vs. “America in 1492 was discovered by ___.”)
Parallel reduces memory effects. Alternate reduces practice effects
Standard Error of Measure (standard deviation of error)
SEM indicates the range within which the “true” score of the individual is likely to fall, while taking into consideration the unreliability of the test
E.g. if a student received a score (observed) of 85 on a test, and the standard error of measure (SEM) is 4.0, then the true score would probably range somewhere between 81 and 89
SEM
SEM: the standard deviation of the test scores multiplied by the square root of one minus the reliability coefficient (SEM = SD × √(1 − r))
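A quick check of the SEM formula in Python, using made-up numbers chosen to match the 85 ± 4 example above (the standard deviation of 8 and reliability of .75 are assumptions):

import math

sd = 8.0             # hypothetical standard deviation of the test scores
reliability = 0.75   # hypothetical reliability coefficient

sem = sd * math.sqrt(1 - reliability)    # SEM = SD * sqrt(1 - r)
observed = 85
print(sem)                               # 4.0
print(observed - sem, observed + sem)    # 81.0 89.0, matching the earlier example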
As the confidence range around a score increases, interpretability decreases
The more variability, the less useful the score is
z & t-Scores
z = (raw score − mean) / standard deviation
t = 50 + 10(z)
Used to compare an individual’s score to the population who took the test
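A small worked example in Python (the raw score, mean, and standard deviation are invented for illustration):

raw_score = 85.0
mean = 75.0
standard_deviation = 8.0

z = (raw_score - mean) / standard_deviation   # z = (raw score - mean) / standard deviation
t = 50 + 10 * z                               # t = 50 + 10(z)
print(z, t)                                   # 1.25 62.5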
Instrument Validity
Degree to which a test measures what it purports to measure.
Reliability is a prerequisite to validity: to be valid, a test must first be reliable
Past texts presented validity before reliability because it came first; however, reliability is primary to validity
Tests themselves are not valid or invalid; it is their application that is or is not
Four types of validity: content, concurrent, predictive, & construct
Content Validity
Degree to which the content on a test matches the content in the blueprint (or course)
Can be checked using curriculum guides, other teachers, blueprints, the principal, or professional standards
Deals with the question of whether a given data collection technique adequately measures the whole range of topics it is supposed to measure
Concurrent Validity
A type of measurement validity that deals with the question of whether a given data collection technique correlates highly with another data collection technique that is supposed to measure the same thing
The degree to which the scores on a test are related to the scores on another, already established test administered at the same time, or to some other valid criterion available at the same time
Predictive Validity (aka: Criterion Validity)
Degree to which a test is able to predict how well an individual will do in a future situation
A type of measurement validity that deals with the question of whether a measurement process forecasts a person’s performance on a future task
Construct Validity
A construct is a fiction or invention used to explain reality (e.g. math anxiety)
A type of measurement validity that deals with the question of whether a given data collection technique is actually providing an assessment of an abstract, theoretical psychological characteristic
Construct Validity (continued)
The degree to which a test measures an intended hypothetical construct, or non-observable trait, which explains behavior
Factor analysis is a statistical technique commonly used to assess construct validity
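As a rough sketch only (the data are randomly generated placeholders and the choice of two factors is an assumption, not something from the notes), factor analysis could be run in Python with scikit-learn:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
# Placeholder data: 100 respondents x 6 items intended to tap an underlying construct
items = rng.normal(size=(100, 6))

fa = FactorAnalysis(n_components=2)   # hypothesize two underlying factors
fa.fit(items)
print(fa.components_)                 # factor loadings: how strongly each item loads on each factor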