Validity
Psych 395 - DeShon
Example: Validity of a Measure
“The use of the polygraph (lie detector test) is not nearly as valid as some say and can easily be beaten and should never be admitted into evidence in courts of law, say psychologists from two scientific communities who were surveyed on the validity of polygraphs.” – APA News Release
Measures and Constructs, Again!
[Diagram: X → Y shown at two levels: the unobservable (construct) variables and the measured variables.]
Basic Insights…
Differences in Observed Measures are Caused by Variations in the Unobserved Construct.
One Way to Think About Validity: How well does the observed variable capture the unobserved variable?
Apply this Idea to the Polygraph…
Issues of Validity
Does the test actually measure what it is purported to measure?
Do differences in tests scores reflect true differences in the underlying construct?
Are inferences based on the test scores justified?
It’s All About Inferences….
Cronbach (1971): Validation is the process of collecting evidence to support the types of inferences that are drawn from test scores.
There is no such thing as “the” validity of a test. Why? Many different kinds of inferences can be made from the same test.
“Validity for what?” Inferences and decisions based on test scores
A person with this score is likely to:
– Be a better parent
– Do well in law school
– Be most satisfied as an engineer
– Steal from his/her employer
Types of validity
– Content (more theory-based)
– Criterion-related (more data-based)
– Construct (general evidence-gathering)
Content Validity of a Measure
Collectively, do the items adequately represent all of the domains of the construct of interest?
Starting Point: A Well-Defined Construct. Often a panel of experts judges whether the items adequately sample the domain of interest.
Example: 1st Grade Math Objectives
What 1st Graders in School District X Should:
A. Be able to add any two positive numbers whose sum is 20 or less.
B. Subtract any two numbers (each less than 15) whose difference is a positive number.
Item Pool – Which are Content Valid?
1. 13 + 2 = ___
2. 12 - 5 = ___
3. 10 - 5 = ___
4. 26 - 15 = ___
5. 13 + 4 - 7 = ___
Sammy has 10 pennies. He lost 2. How many pennies does Sammy have?
   A. 2 pennies   B. 8 pennies   C. 10 pennies   D. 12 pennies
Example: Depression (Modified from the DSM – IV)
A complex of symptoms marked by:
– Disruptions in appetite and weight
– Insomnia or hypersomnia
– Loss of interest or pleasure in activities
– Loss of energy
– Feelings of worthlessness
– Feels sad or empty nearly every day
– Frequent death-related thoughts
Item Pool – Which are Content Valid?
– I feel blue or sad.
– I feel nervous when speaking to someone in authority.
– I have crying spells.
– I'm always willing to admit it when I make a mistake.
– I felt that everything I did was an effort.
– I never resent being asked to return a favor.
– I experience spells of terror or panic.
Contamination & Deficiency
[Diagram: overlapping circles for Construct and Measure. The overlap is Relevance (content validity); the part of the Measure falling outside the Construct is Contamination; the part of the Construct not covered by the Measure is Deficiency.]
What do we want?
A measure that samples from all important domains or aspects (Low Deficiency)
A measure that does not include anything irrelevant (Low Contamination)
That is, a measure that adequately captures all of the domains of the construct that it is intended to measure. (High Content Validity)
What Else Do We Want: A Measure that Predicts Something It Should!
Criterion-related Evidence for a Measure
What should this test predict? What inferences are we going to use this test to make?

Criterion-related validation is data based. Does the test actually predict the behavior that it is supposed to predict?
– Correlate an honesty test with employee theft
– Correlate a paper-and-pencil measure of delinquency with arrest records
– Correlate a measure of study habits with actual grades
Two types of criterion-related validity
Predictive validity – future criteria
Concurrent validity – current criteria
This distinction makes no procedural difference (both are computed as correlations)
Think of a Relevant Criterion
SAT or ACT Scores A Measure of Conscientiousness A Measure of Political Liberalism A Measure of Relationship Satisfaction
Criterion-related validity: Concurrent validity
Students who have been admitted to MSU take the SAT. Their GPA is recorded at the same time.
The correlation between the test scores and performance is computed. This correlation is sometimes called a validity coefficient.
Criterion-related validity: Predictive validity
Students take the SAT (or ACT) during High School and then some are selected into MSU. Later, their SAT scores are correlated with their college GPA.
This correlation is also sometimes called a validity coefficient.
If SAT scores and college GPA are correlated, then the SAT has some degree of predictive validity for predicting college GPA.
In both cases the degree of criterion-related validity is inferred from the size of the correlation.
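As a minimal sketch, a validity coefficient is nothing more than the Pearson correlation between predictor and criterion scores. The SAT and GPA numbers below are invented for illustration, not real admissions data:

```python
import numpy as np

# Hypothetical data: SAT scores and later college GPA for ten students.
sat = np.array([1050, 1200, 1340, 980, 1480, 1120, 1260, 1400, 1010, 1310])
gpa = np.array([2.8,  3.1,  3.6,  2.5,  3.9,  3.0,  3.2,  3.7,  2.9,  3.4])

# The validity coefficient is simply the Pearson correlation
# between the predictor (SAT) and the criterion (GPA).
validity = np.corrcoef(sat, gpa)[0, 1]
print(round(validity, 2))
```

The same computation applies whether the criterion is measured at the same time (concurrent validity) or later (predictive validity); only the timing of data collection differs.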
Issues
What is our criterion? How do we measure it?
– Reliability of predictor and criterion
– Recall: what does measurement error do?

What sample will we use?
– Small samples: more imprecision
– Issues of generalization

Restriction of range
– Want variability on both predictor and criterion variables

Predictor-criterion overlap
– Same "items" on both measures … bad!
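The restriction-of-range point can be demonstrated with a small simulation (all names and values are ours): when criterion data exist only for people already selected on the predictor, the observed validity coefficient shrinks even though the underlying relationship is unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Predictor and criterion with a true correlation of about .50.
x = rng.standard_normal(n)                                   # e.g., test score
y = 0.5 * x + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)   # e.g., performance

r_full = np.corrcoef(x, y)[0, 1]

# "Select" only cases above the predictor median, mimicking a setting
# where only admitted/hired people ever supply criterion data.
selected = x > np.median(x)
r_restricted = np.corrcoef(x[selected], y[selected])[0, 1]

# The restricted correlation is noticeably smaller than the full-range one.
print(round(r_full, 2), round(r_restricted, 2))
```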
Measurement Error
Reliability – Index of the presence of measurement error (1.0 reliability = No error)
Unreliability in the predictor and criterion increases the error variance and therefore serves to reduce (attenuate) the observed correlation between them
When/where might we find unreliability? … Everywhere!
Tests used as predictors (e.g., measures of depression)
Criterion measures (e.g., ratings of client well-being)
Unreliability is a concern for both predictors and criteria – unreliability in both can reduce correlations
Correcting Correlations for Attenuation
r_xy = observed correlation between x and y
r_xx and r_yy = reliability coefficients of x and y

Corrected (disattenuated) correlation: r_true = r_xy / sqrt(r_xx * r_yy)
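A quick numeric sketch of the correction (all values invented): with an observed correlation of .30 and reliabilities of .80 and .60, the estimated correlation between the error-free scores is noticeably larger.

```python
import math

r_xy = 0.30   # observed predictor-criterion correlation
r_xx = 0.80   # reliability of the predictor
r_yy = 0.60   # reliability of the criterion

# Spearman's correction for attenuation: divide the observed correlation
# by the square root of the product of the two reliabilities.
r_true = r_xy / math.sqrt(r_xx * r_yy)
print(round(r_true, 2))  # 0.43
```

Note that the correction can only be as good as the reliability estimates fed into it; overestimating unreliability inflates the corrected value.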
Construct Validity – How Well Does a Measure Actually Assess the Underlying Conceptual Variable?
• Often the focus is on the Construct (i.e., the idea) and NOT just the properties of a single measure.
• How does this construct fit into a nomological network (a lawful network of expected relations)?
• Can we get convergence across different measures of the SAME construct?
• Can we get divergence? Are measures of different constructs unrelated?
Key Terms (Campbell & Fiske, 1959)
Convergent Validity: Associations Between Different Methods of Assessing the Same Construct. Confirmation of the Measurement of the Construct using Multiple Methods.
Discriminant Validity: Distinctiveness of Constructs. This is indicated by a lack of association between measures of different constructs.
Jingle Fallacy (Kelley, 1927)
Jingle fallacy: Belief that because the same name is applied to measures of different constructs, these measures are really assessing the same thing.
– Smith's Measure of Extraversion and Robert's Measure of Extraversion might not actually measure the same thing.
Jangle fallacy (Kelley, 1927)
Jangle fallacy: Belief that because measures are called by different names they are measuring different constructs.
– Smith's Measure of Sociability and Robert's Measure of Surgency might both actually measure Extraversion.
Q: How do we examine all of these ideas?
A: Use Correlation Matrices!
Multitrait-Multimethod Matrices (MTMM)
Suppose we measure three different personality traits:
– Extraversion
– Conscientiousness
– Neuroticism

Suppose we measure each of these traits in three different ways:
– Self-report
– Informant report
– Behavioral test (won't show this on the charts)
                  Self-Report              Informant
                  E      C      N          E      C      N
Self-Report  E  (.93)
             C   .22   (.94)
             N  -.11    .11   (.84)
Informant    E   .57    .32   -.16       (.89)
             C          .45    .19        .30   (.89)
             N                 .39       -.28   -.25   (.76)

Where is the convergent validity? Where is the discriminant validity?

Note: E = Extraversion, C = Conscientiousness, N = Neuroticism. Reliabilities are in parentheses on the diagonal; blank cells are not reported.
Convergent Evidence
Same construct assessed using different methods (self versus informant)
Convergent Validity diagonal (blue font)– E: .57– C: .45– N: .39
Technical Label: Monotrait-Heteromethod correlation (“trait correlation”)
– Same Trait – Different Method
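As a sketch, the validity diagonal can be pulled out of the self-informant block programmatically. The array layout and NaN placeholders below are our own; the correlation values are the ones reported on these slides (convergent validities .57, .45, .39):

```python
import numpy as np

traits = ["E", "C", "N"]

# Self-report x informant block of the MTMM matrix (rows = self-report
# E, C, N; columns = informant report E, C, N). Cells not reported on
# the slides are entered as NaN.
self_x_informant = np.array([
    [ 0.57, np.nan, np.nan],   # self E with informant E, C, N
    [ 0.32,  0.45, np.nan],    # self C with informant E, C, N
    [-0.16,  0.19,  0.39],     # self N with informant E, C, N
])

# The monotrait-heteromethod ("validity") diagonal: same trait, different method.
convergent = np.diag(self_x_informant)
for trait, r in zip(traits, convergent):
    print(f"{trait}: {r:.2f}")  # E: 0.57, C: 0.45, N: 0.39
```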
Divergent evidence
Different traits assessed using the same method (want low correlations)
– Technical: Heterotrait-Monomethod ("method correlation")
– Glop or method variance

Different traits assessed using different methods (want low correlations)
– Technical: Heterotrait-Heteromethod ("neither")
– Should be the lowest correlations in the MTMM matrix
Differentiation Between Groups
Examine the difference in test scores arising from groups known to differ on the construct– Kids with ADHD versus Kids Without ADHD– Depressed versus Non-Depressed People– Criminals versus Non-Criminals– Masculinity versus Femininity
– Discriminant group Validity
Factor Analysis
Basic Ideas
Figure out what is related and what is not. This is a construct-validity question (it is about convergence and divergence).
We do factor analysis in our heads all the time in real life!
Statistical Procedure to reduce a large number of intercorrelations to a smaller number of factors that summarize the pattern of observed correlations between variables
What is a Factor?
Unobserved variables that give rise to correlations between items on a questionnaire.
The existence of factors is inferred from patterns of association between observed variables.
Factors are sometimes called Source Traits or Latent Traits.
Goal of Factor Analysis is to identify these latent (unobserved) variables.
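One way to see the idea concretely is a toy simulation (names and loadings are ours): six items generated from two latent traits yield a correlation matrix with exactly two large eigenvalues, which is the pattern factor-analytic methods look for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Two latent "source traits" that are never observed directly.
extraversion = rng.standard_normal(n)
neuroticism = rng.standard_normal(n)

# Six observed items, each driven mostly by one latent trait plus noise.
items = np.column_stack([
    0.8 * extraversion + 0.6 * rng.standard_normal(n),
    0.7 * extraversion + 0.7 * rng.standard_normal(n),
    0.8 * extraversion + 0.6 * rng.standard_normal(n),
    0.8 * neuroticism + 0.6 * rng.standard_normal(n),
    0.7 * neuroticism + 0.7 * rng.standard_normal(n),
    0.8 * neuroticism + 0.6 * rng.standard_normal(n),
])

# Items driven by the same latent trait correlate; items driven by
# different traits do not. The eigenvalues of the correlation matrix
# reveal how many latent factors are at work.
R = np.corrcoef(items, rowvar=False)           # 6 x 6 item correlations
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1] # largest first
print(np.round(eigvals, 2))  # two eigenvalues well above 1, the rest below
```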