SOCW 671: #5
Measurement Levels, Reliability, Validity, & Classical Measurement Theory
Want to measure variables
Variables are characteristics of persons, places, or things
A conceptual entity, any construct or characteristic to which different numerical values can be assigned for purposes of analysis or comparison
Variables
Independent, Dependent and Control variables
Measurement is the process of assigning numbers (or things that take the place of numbers) to variables according to a set of rules
Measurement
It’s a process, not an event, because measurement deals with variables, and variables change.
Measurement Scales
A set of rules proposed by S. S. Stevens in 1946 in the journal Science
He proposed a four-tiered hierarchy of scales, from most simple to most complex: Nominal, Ordinal, Interval, Ratio
Nominal (or Categorical)
The process of grouping individual observations into qualitative categories or classes
Does not involve magnitude
Examples: gender, religion, & ethnicity
Ordinal
A measuring procedure which assigns one object a greater number, the same number, or a smaller number than a second object only if the first possesses, respectively, more, the same, or less of the characteristic being measured than the second object
For example: Likert scales, which rate items from strongly disagree to strongly agree
Interval
A special kind of ordinal scaling where the measurement assigned to an object is linearly related to its true magnitude
Has an arbitrary origin (zero-point) and a fixed, though arbitrary, unit of measure
Has set, equal intervals between units (e.g. time)
Ratio
A special kind of interval scaling where the measurement assigned to an object is proportional to its true magnitude
Has an absolute zero (e.g. weight)
To measure variables
First you need to figure out how you will measure them
Just because variables may have numeric values does not necessarily make them interval or ratio (e.g. Likert Scales)
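As a small illustration (the coding scheme and data below are made up, not from the notes), this Python snippet shows nominal measurement: the numbers serve only as category labels, so arithmetic on them is meaningless.

# Nominal measurement: numbers are only labels for categories
religion_codes = {"Protestant": 1, "Catholic": 2, "Jewish": 3, "Other": 4}

respondents = ["Catholic", "Jewish", "Catholic", "Protestant"]
coded = [religion_codes[r] for r in respondents]
print(coded)                      # [2, 3, 2, 1]

# The "average religion" below is computable but substantively meaningless
print(sum(coded) / len(coded))    # 2.0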
Reliability & Validity
Involves Classical Measurement Theory
O = T + E (observed = true score plus error)
The benefit of classical measurement theory is that it lets you estimate E, the error component of an observed score
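A minimal sketch of O = T + E in Python (all numbers are invented for illustration), simulating observed scores as true scores plus random error:

import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(75, 8, size=500)   # hypothetical true scores (T)
error = rng.normal(0, 4, size=500)          # random measurement error (E)
observed = true_scores + error              # O = T + E

# Reliability can be viewed as the share of observed-score variance due to true scores
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))                # roughly 0.80 with these made-up values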
Reliability
Instrument reliability: the consistency with which you measure whatever you intend to measure
Consistency of scores. For example, if you use a scale to weigh yourself several times and obtain similar weights, the scale is reliable
Three paradigms: internal consistency, test/retest, and alternate/parallel forms
Measures of Internal Consistency (Reliability)
Split halves: split the test in half and correlate the two halves
Odd/even: a split-halves method (odd items vs. even items) that addresses the problems of splitting into first and second halves
Kuder-Richardson 20 (KR-20): estimates the correlation across all possible split-half permutations
KR-21: a simplified KR-20
Cronbach’s alpha: can be used with the widest variety of data collection procedures
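For illustration only (the item responses are made up), a minimal Python sketch of Cronbach’s alpha computed from its standard formula:

import numpy as np

# Made-up responses: 5 respondents x 4 Likert items
items = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 4],
    [1, 2, 2, 1],
])

k = items.shape[1]                                # number of items
item_variances = items.var(axis=0, ddof=1).sum()  # sum of the item variances
total_variance = items.sum(axis=1).var(ddof=1)    # variance of the total scores
alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(round(alpha, 2))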
Test/Retest
No intervention: give one test, then the same test again later (the purpose is to test the instrument, not achievement)
Problems include: memory and practice effects
A 1–3 week delay between tests is best because it avoids fatigue and keeps memory and practice effects low
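A hedged sketch of a test/retest reliability check in Python, correlating made-up scores from two administrations of the same instrument:

import numpy as np

# Hypothetical scores from the same people, about two weeks apart
time1 = np.array([82, 75, 90, 68, 77, 85, 71, 88])
time2 = np.array([80, 78, 91, 70, 75, 83, 73, 86])

# Test/retest reliability is typically reported as the correlation between administrations
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))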
Alternate/Parallel Forms
Alternate: same test items, but in a different sequence
Parallel: write two items from the blueprint and use one item on one test and the other item on the other test (e.g. “Columbus in 1492 discovered ___.” vs. “America in 1492 was discovered by ___.”)
Parallel reduces memory effects. Alternate reduces practice effects
Standard Error of Measure (standard deviation of error)
SEM indicates the range within which the “true” score of the individual is likely to fall, while taking into consideration the unreliability of the test
E.g. if a student received a score (observed) of 85 on a test, and the standard error of measure (SEM) is 4.0, then the true score would probably range somewhere between 81 and 89
SEM
SEM: the standard deviation of the test scores multiplied by the square root of one minus the reliability coefficient (SEM = SD × √(1 − r))
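A quick check of the SEM formula in Python, using made-up numbers chosen to match the 85 ± 4 example above (the standard deviation of 8 and reliability of .75 are assumptions):

import math

sd = 8.0             # hypothetical standard deviation of the test scores
reliability = 0.75   # hypothetical reliability coefficient

sem = sd * math.sqrt(1 - reliability)    # SEM = SD * sqrt(1 - r)
observed = 85
print(sem)                               # 4.0
print(observed - sem, observed + sem)    # 81.0 89.0, matching the earlier example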
As the confidence range around a score increases, interpretability decreases
The more variability, the less useful the score is
z & t-Scores
z = (raw score − mean) / standard deviation
t = 50 + 10(z)
Used to compare an individual’s score to the population who took the test
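A small worked example in Python (the raw score, mean, and standard deviation are invented for illustration):

raw_score = 85.0
mean = 75.0
standard_deviation = 8.0

z = (raw_score - mean) / standard_deviation   # z = (raw score - mean) / standard deviation
t = 50 + 10 * z                               # t = 50 + 10(z)
print(z, t)                                   # 1.25 62.5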
Instrument Validity
Degree to which a test measures what it purports to measure.
Reliability is a prerequisite to validity: to be valid, a test must first be reliable
Past texts presented validity before reliability because it came first; however, reliability is primary to validity
Tests themselves are not valid or invalid; it is their application that is or is not
Four types of validity: content, concurrent, predictive, & construct
Content Validity
Degree to which the content on a test matches the content in the blueprint (or course)
Can be checked using curriculum guides, other teachers, blueprints, the principal, or professional standards
Deals with the question of whether a given data collection technique adequately measures the whole range of topics it is supposed to measure
Concurrent Validity
A type of measurement validity that deals with the question of whether a given data collection technique correlates highly with another data collection technique that is supposed to measure the same thing
The degree to which the scores on a test are related to the scores on another, already established test administered at the same time, or to some other valid criterion available at the same time
Predictive Validity (aka: Criterion Validity)
Degree to which a test is able to predict how well an individual will do in a future situation
A type of measurement validity that deals with the question of whether a measurement process forecasts a person’s performance on a future task
Construct Validity
A construct is a fiction or invention used to explain reality (e.g. math anxiety)
A type of measurement validity that deals with the question of whether a given data collection technique is actually providing an assessment of an abstract, theoretical psychological characteristic
Construct Validity (continued)
The degree to which a test measures an intended hypothetical construct, or non-observable trait, which explains behavior
Factor analysis is a statistical technique commonly used to assess construct validity
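As a rough sketch only (the data are randomly generated placeholders and the choice of two factors is an assumption, not something from the notes), factor analysis could be run in Python with scikit-learn:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
# Placeholder data: 100 respondents x 6 items intended to tap an underlying construct
items = rng.normal(size=(100, 6))

fa = FactorAnalysis(n_components=2)   # hypothesize two underlying factors
fa.fit(items)
print(fa.components_)                 # factor loadings: how strongly each item loads on each factor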