Measuring the Correlates of Intent to Participate and Participation in the Census and Trends
in These Correlates:
Comparisons of RDD Telephone and Non-probability Sample Internet Survey Data
Josh Pasek, Stanford & University of Michigan
Jon A. Krosnick, Stanford & U.S. Census Bureau
Monitoring Threats to Census Compliance
The U.S. Census Bureau commissioned surveys in late 2009 and early 2010 to determine which factors might lead people to complete or not complete the form.
To Assess This . . .
• Examine what proportion of individuals hold particular beliefs about the Census
• Test whether those beliefs relate to intent to complete or completion of Census form
• Regularly examine prevalence of beliefs to assess emergent threats to Census completion
Two Synchronous Data Streams
• 2 data streams collected for Census Bureau
• RDD telephone (Gallup) and non-probability Internet (E-Rewards)
• 13 simultaneous weeks of data collection with > 900 interviews per data stream per week
• Very similar measures
Two Synchronous Data Streams
• Intent to complete Census (before and after forms mailed)
• Reported completion of Census (after forms mailed)
• Census will help/hurt respondent
• Locate illegal immigrants
• Trust confidentiality
• Time to fill out
• Importance of counting everyone
• Respondent's participation does not matter
Today's Question
Would These Two Data Streams Lead To The Same
Conclusions?
Three Comparisons
• Proportions
• Who holds particular beliefs about Census?
• Relations between variables
• Which beliefs relate to expected and actual completion?
• Trends over time
• Do the surveys indicate that purported predictors changed similarly over time?
Weighting
• Unweighted
• Weights provided by both houses
• Weighted identically using anesrake (Pasek, 2010)
Both with and without matching on dates
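For readers unfamiliar with raking, the general idea behind weighting both samples to the same targets can be sketched as below. This is a minimal illustration of iterative proportional fitting, not the anesrake implementation, which is an R package with its own variable selection and weight capping. The function names, the toy sample, and the target shares are all hypothetical.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=100, tol=1e-8):
    """Rake respondent weights to categorical population margins.

    rows:    list of dicts, one per respondent, e.g. {"sex": "F", "age": "35+"}
    targets: {variable: {category: population share}}; shares sum to 1 per variable
    Returns one weight per respondent, scaled to average 1.0.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            grand = sum(totals.values())
            # Scale every respondent so this margin matches its target.
            for i, row in enumerate(rows):
                factor = shares[row[var]] * grand / totals[row[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    mean_w = sum(weights) / n
    return [w / mean_w for w in weights]


def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents in one category."""
    hits = sum(w for r, w in zip(rows, weights) if r[var] == category)
    return hits / sum(weights)


# Hypothetical toy sample and targets, for illustration only.
sample = (
    [{"sex": "F", "age": "18-34"}] * 10
    + [{"sex": "F", "age": "35+"}] * 30
    + [{"sex": "M", "age": "18-34"}] * 20
    + [{"sex": "M", "age": "35+"}] * 40
)
targets = {"sex": {"F": 0.52, "M": 0.48}, "age": {"18-34": 0.30, "35+": 0.70}}
weights = rake(sample, targets)
```

Applying the same targets to both data streams is what makes the two sets of weighted estimates comparable on the raked variables. It does not by itself guarantee agreement on variables outside the weighting scheme.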
Proportions
• Absolute difference between modal categories of non-demographic variables
• | proportion in RDD - proportion in Internet |
• Bootstrap to test significance
• Assessed within each week
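The comparison above can be sketched for a single item in a single week as follows. The slides do not say which bootstrap variant was used, so the percentile interval here is an assumption, as are the function name and the unweighted resampling. The example reuses the Week 3 percentages reported in this deck with a hypothetical 1,000 respondents per stream.

```python
import random


def bootstrap_diff_test(rdd, internet, n_boot=1000, seed=0):
    """Percentile-bootstrap test for a difference in proportions.

    rdd, internet: lists of 0/1 indicators (1 = respondent chose the modal category).
    Returns (absolute observed difference, (ci_low, ci_high), significant).
    """
    rng = random.Random(seed)

    def prop(xs):
        return sum(xs) / len(xs)

    observed = prop(rdd) - prop(internet)
    diffs = []
    for _ in range(n_boot):
        # Resample each stream independently, with replacement.
        r = rng.choices(rdd, k=len(rdd))
        i = rng.choices(internet, k=len(internet))
        diffs.append(prop(r) - prop(i))
    diffs.sort()
    ci_low = diffs[int(0.025 * n_boot)]
    ci_high = diffs[int(0.975 * n_boot) - 1]
    # Significant at roughly the .05 level if the interval excludes zero.
    significant = ci_low > 0 or ci_high < 0
    return abs(observed), (ci_low, ci_high), significant


# Week 3 percentages from the slides; the sample sizes are hypothetical.
rdd_sample = [1] * 544 + [0] * 456
internet_sample = [1] * 405 + [0] * 595
abs_diff, interval, is_significant = bootstrap_diff_test(rdd_sample, internet_sample)
```

With weighted data, the resampling step would need to carry each respondent's weight along, which this sketch leaves out.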
Proportions: Differences Between Data Streams
[Figure: histogram of differences between data streams. X-axis: Difference in Percentage Points. Y-axis: Frequency of Differences.]
Example comparisons:
Week     Item                                                  RDD     Internet   Difference
Week 10  Census can help you                                   43.4%   44.2%      0.8
Week 3   Respondent's participation does not matter - Agree    54.4%   40.5%      13.9
Week 12  Important to count everyone - Agree                   65.7%   32.2%      33.5
Difference (percentage points)   Share of data
> 5                              80%
> 10                             55%
> 15                             40%
> 20                             26%
> 25                             16%
Relations Between Variables
• Regressions predicting relevant outcomes
• Intent to complete Census
• Reported completion of Census
• Correlations between pairs of variables
Predicting Census Completion
Variable                                   RDD        Internet   Difference   Pattern
Married                                    .31*       .58**      .27          Same story, same magnitude
Age 25-34                                  -1.51***   -.68*      .83*         Same story, different magnitude
Don't have time to fill out - Disagree     .88**      .68        .19          Different story, same magnitude
Importance of counting everyone - Agree    .98*       -.37       1.27*        Different story, different magnitude
Numbers shown are from a logistic regression with all predictors entered simultaneously. * p<.05 | ** p<.01 | *** p<.001
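The four labels used above can be read as a two-way classification, sketched here. The rule is inferred from the four example rows rather than stated on the slides: "same story" is taken to mean that the two streams agree on significance (and on sign when both are significant), and "different magnitude" that the between-stream difference is itself significant. The function name and arguments are hypothetical.

```python
def classify(rdd_coef, rdd_sig, internet_coef, internet_sig, diff_sig):
    """Label a predictor by whether the two data streams tell the same story.

    rdd_coef, internet_coef: logistic regression coefficients in each stream
    rdd_sig, internet_sig:   True if the coefficient is significant (p < .05)
    diff_sig:                True if the between-stream difference is significant
    """
    both_sig_same_sign = (
        rdd_sig and internet_sig and (rdd_coef > 0) == (internet_coef > 0)
    )
    both_null = not rdd_sig and not internet_sig
    story = "Same story" if (both_sig_same_sign or both_null) else "Different story"
    magnitude = "different magnitude" if diff_sig else "same magnitude"
    return f"{story}, {magnitude}"


# The four example rows from the slides.
married = classify(0.31, True, 0.58, True, False)
age_25_34 = classify(-1.51, True, -0.68, True, True)
no_time = classify(0.88, True, 0.68, False, False)
count_everyone = classify(0.98, True, -0.37, False, True)
```

Under this reading, a predictor can land in "different story" even when the two coefficients are statistically indistinguishable, as in the time-to-fill-out row, because only one stream would have flagged it as a predictor.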
Predicting Census Completion
Type                                    One Predictor at a Time   All Predictors
Same story, same magnitude              57%                       31%
Same story, different magnitude         13                        6
Different story, same magnitude         30                        56
Different story, different magnitude    0                         6
Total                                   100%                      100%
(Number of Variables)                   (23)                      (16)
Numbers shown are for variables that were significant in at least one data stream
Predicting Intent to Complete Form
Type                                    One Predictor at a Time   All Predictors
Same story, same magnitude              41%                       48%
Same story, different magnitude         29                        20
Different story, same magnitude         15                        20
Different story, different magnitude    15                        12
Total                                   100%                      100%
(Number of Variables)                   (41)                      (25)
Numbers shown are for variables that were significant in at least one data stream
Trends Over Time
• Correlations between variable categories over weeks
• Chi-squared tests comparing differences between data streams across weeks
Among variables with significant variations over time in at least one data stream
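The first of these checks, correlating a variable's weekly percentages across the two streams, can be sketched as follows. The weekly series are made up for illustration and the function name is hypothetical. The slides do not say which correlation coefficient was used, so Pearson's r is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical percentages for one response category over 13 weeks.
rdd_weekly = [40, 42, 41, 45, 47, 46, 50, 52, 51, 55, 57, 56, 60]
# Same shape, five points lower every week.
internet_matching = [x - 5 for x in rdd_weekly]
# Mirror image of the RDD series.
internet_opposite = [100 - x for x in rdd_weekly]

r_match = pearson(rdd_weekly, internet_matching)
r_opposite = pearson(rdd_weekly, internet_opposite)
```

A correlation of this kind captures whether the two streams move together, not whether they agree on levels: the matching series above sits five points below the RDD series every week and still correlates perfectly with it.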
Sometimes the trends match
[Figure: Trends Over Time. X-axis: Date. Y-axis: Percent. Lines: RDD, Internet.]
Sometimes the trends don't match
[Figure: Trends Over Time. X-axis: Date. Y-axis: Percent. Lines: RDD, Internet.]
And sometimes, the trends are opposites
[Figure: Trends Over Time. X-axis: Date. Y-axis: Percent. Lines: RDD, Internet.]
Trends Over Time
Among variables with significant variations over time in at least one data stream
[Figure: histogram. X-axis: Correlations Between Data Streams. Y-axis: Frequency.]
Differences Between Streams vs. "Sampling Error"
[Figure: bar chart on a 0 to 100 scale with a reference level labeled Chance. Bars: Proportions, Relations, Trends.]
• Proportions: 73% significantly different
• Relations: 30% significantly different
• Trends: 76% significantly different
None of the weighting strategies changed these basic results
Data from one of the top RDD firms and one of the most visible
Internet firms produced very different results
Researchers need to choose, and that choice should depend
on the validity of the results
But Which is More Valid?
• More accurate self-reports using Internet mode (Chang and Krosnick, 2009)
• Some theoretical reasons to prefer probability sampling (even with contemporary response rates)
We cannot know based on these data alone
Not really the point . . .
If the methods reach different conclusions, they can't both be correct
What Can We Make Of This?
• Mode effects
• Slight differences in question wording
• Neither survey is consistently within sampling error of accuracy benchmarks (though RDD is a little closer)
The results are not equivalent!
Benchmark Comparison
• Comparison of modal categories
• Variables not used in weighting or quotas
  • Primary household language (English)
  • Own or rent home (Own)
  • Children in household (Yes)
• Absolute difference between sample mean and benchmark (weighted)
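The error measure described above can be sketched as follows. All numbers in the example are invented for illustration and are not the study's estimates or the actual benchmarks. The function and key names are hypothetical.

```python
def benchmark_errors(estimates, benchmarks):
    """Absolute percentage point error of weighted estimates against benchmarks.

    estimates, benchmarks: {variable: percent in the modal category}
    Returns (per-variable errors, average error, maximum error).
    """
    errors = {name: abs(estimates[name] - benchmarks[name]) for name in benchmarks}
    avg_error = sum(errors.values()) / len(errors)
    max_error = max(errors.values())
    return errors, avg_error, max_error


# Invented values, for illustration only.
survey_estimates = {
    "english_primary_language": 90.0,
    "own_home": 75.0,
    "children_in_household": 30.0,
}
benchmark_values = {
    "english_primary_language": 80.0,
    "own_home": 66.0,
    "children_in_household": 35.0,
}
errors, avg_error, max_error = benchmark_errors(survey_estimates, benchmark_values)
```

Restricting the comparison to variables that played no part in weighting or quotas matters here: a variable used as a weighting target would match its benchmark by construction and say nothing about sample quality.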
Benchmark Comparison
[Figure: two histograms, Telephone and Internet. X-axis: Percentage Point Difference. Y-axis: Frequency.]
              Telephone   Internet
Avg. Error    11.4        13.8
Max Error     18.6        21.1
p<.001 difference
Benchmark Conclusions
• Neither survey is consistently within sampling error
• RDD sample is somewhat more accurate than Internet sample