Quick and Painless Introduction to Survey Methodology
R. Michael Alvarez
PS 120
Testing Theories or Models
• Experimental data: expensive, and has validity problems
• Quasi-experimental data: aggregate election statistics and other observational data. Suffers from various problems (e.g., inferring individual behavior from aggregate data is hazardous).
• Survey data: data about individual voters
Fundamentals of Surveying
• Population: all elements of interest, usually in a geographic area
• Sample: subset of the population
• Sample frame: the list from which the sample is drawn (addresses, phone numbers, email addresses, etc.)
Basic Typology of Surveys
• Probability designs: population elements have a known (at least in theory) probability of selection into the sample.
• Nonprobability designs: population elements have an unknown probability of selection into the sample.
All of the statistical tools we use to study survey data are based on probability designs!
Literary Digest 1936: What Went Wrong?
[Chart: Roosevelt’s 1936 vote share: the election result versus the Literary Digest, Gallup, Crossley, and Fortune polls]
Literary Digest Methodology
• Sent out 10 million straw ballots, using a list drawn from auto registration lists and telephone books.
• 2.3 million were returned, roughly a 23% response rate
• Flawed sample (overrepresented the rich and Republicans)
• Low response rate
Literary Digest Fiasco Reforms Polling
• Underlying flaw of the Literary Digest straw polls revealed: they did not use a scientific sampling procedure
• Others, especially Gallup, Roper and Crossley began to work to find better ways of generating samples …
• The Literary Digest soon folded!
Problems Continue, 1948
[Chart: 1948 vote shares for Truman, Dewey, Thurmond, and Wallace: the Crossley, Gallup, and Roper polls versus the election result]
New Sampling Techniques Were Flawed!
• Before 1948, pollsters used “quota sampling”
• Each interviewer is assigned a fixed quota of subjects to interview from certain demographic categories: gender, age, education, residential location.
• Within each quota category, the interviewer could select anyone they desired until they had conducted all their required interviews
Quota Sampling
• It’s not necessarily a stupid idea, as long as the underlying data (Census data?) used to construct the parameters of the sample are sound.
• But in practice, interviewers end up talking to the people who are easiest to contact. In 1948 those tended to be people in nice neighborhoods, with fixed addresses and phones (i.e., Republicans).
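A toy simulation of this failure mode (all numbers below are invented, not 1948 data): the interviewer fills the quota only from easy-to-contact people, who lean Republican, so the sample overstates Republican support even though the quota is met.

```python
import random

random.seed(4)

# Invented population: easy-to-contact people (phones, fixed addresses)
# lean Republican, as in the 1948 story.
population = []
for _ in range(10_000):
    easy = random.random() < 0.5           # half the population is easy to reach
    rep_prob = 0.6 if easy else 0.4        # assumption: easy-to-reach lean Republican
    party = "R" if random.random() < rep_prob else "D"
    population.append({"easy": easy, "party": party})

# Quota sampling: the quota is met, but only from the reachable subset.
quota = 500
reachable = [p for p in population if p["easy"]]
quota_sample = random.sample(reachable, quota)

true_share = sum(p["party"] == "R" for p in population) / len(population)
sample_share = sum(p["party"] == "R" for p in quota_sample) / quota
print(round(true_share, 2), round(sample_share, 2))  # the quota sample overstates Republicans
```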
Random Sampling
• In the 1950s, most scientific surveys shifted to the use of random sampling
• For example, Gallup moved to random selection methods in 1956, and appears to have generated more accurate presidential election forecasts thereafter
Gallup’s Track Record
[Chart: difference between Gallup’s final forecast and the election result, 1936 to 1992]
Basic Introduction to Sampling
• Concept: The population (or universe or target population).
• The population is the entire set of units to which a survey will be applied. Individual members of the population are called units or elements.
More on sampling ...
• Next, we need a list of population units from which we can draw a sample.
• This list is called the SAMPLE FRAME.
• The basic property of a sample frame is that every unit in the population has some known chance of being selected into the sample by whatever method is used to select units
Then ...
• Probability sample: units are selected using a method that ensures that each unit has a known, nonzero probability of being included.
• Nonprobability sample: units are selected and inclusion probabilities are unknown (quota sampling …)
Simple Random Sampling
• Simple random sampling: all elements of the population have an equal probability of being sampled.
• Cluster sampling: the population is divided into clusters or groups, and clusters are sampled. Why? Cost and simplicity.
• Stratified sampling: the population is divided into subpopulations, or strata, and sampling occurs within strata. Why? Strata might be of interest or require different methods of analysis.
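The three designs can be sketched in a few lines of Python (the population, the two regions, and the sample sizes are invented for illustration):

```python
import random

random.seed(1)
# Invented population of 1,000 units, each belonging to one of two regions.
population = [{"id": i, "region": random.choice(["North", "South"])}
              for i in range(1000)]

# Simple random sample: every unit has the same probability of selection.
srs = random.sample(population, 100)

# Stratified sample: sample separately within each stratum (here, region).
strata = {}
for unit in population:
    strata.setdefault(unit["region"], []).append(unit)
stratified = [u for units in strata.values() for u in random.sample(units, 50)]

# Cluster sample: divide the population into clusters, then sample whole clusters.
clusters = [population[i:i + 100] for i in range(0, len(population), 100)]
cluster_sample = [u for c in random.sample(clusters, 2) for u in c]

print(len(srs), len(stratified), len(cluster_sample))  # 100 100 200
```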
Sampling Error
• The best way to think about survey error is in the context of proportions (percent saying “yes” or “no”).
• Standard error of a proportion under SRS: se(p) = sqrt[ p(1-p) / (n-1) ]
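For a concrete sense of scale, here is a minimal Python sketch of the slide’s formula (the 1,000-respondent poll is an invented example):

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random
    sampling, using the slide's formula: sqrt( p(1-p) / (n-1) )."""
    return math.sqrt(p * (1 - p) / (n - 1))

# An invented poll: 1,000 respondents, 50% saying "yes".
se = se_proportion(0.5, 1000)
print(round(se, 4))             # standard error of the proportion: 0.0158
print(round(2 * se * 100, 2))   # +/- 2 SE in percentage points: 3.16
```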
An Example of Survey Error
+- 2 Standard Deviation Example
[Chart: +/- 2 standard deviations of P, in percentage points, plotted against sample size for P=.5, P=.7, and P=.9]

  n      P=.5    P=.7    P=.9
  100    10.05   9.211   6.03
  200    7.089   6.497   4.253
  300    5.783   5.3     3.47
  400    5.006   4.588   3.004
  500    4.477   4.103   2.686
  600    4.086   3.745   2.452
  700    3.782   3.467   2.269
  800    3.538   3.242   2.123
  900    3.335   3.057   2.001
  1000   3.164   2.9     1.898
  1100   3.016   2.765   1.81
  1200   2.888   2.647   1.733
  1300   2.775   2.543   1.665
  1400   2.674   2.45    1.604
  1500   2.583   2.367   1.55
  1600   2.501   2.292   1.5
  1700   2.426   2.224   1.456
  1800   2.358   2.161   1.415
  1900   2.295   2.103   1.377
  2000   2.237   2.05    1.342
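These values can be reproduced directly from the standard-error formula, as +/- 2 standard errors expressed in percentage points:

```python
import math

def two_se_points(p, n):
    """+/- 2 standard errors of a proportion, in percentage points."""
    return round(2 * 100 * math.sqrt(p * (1 - p) / (n - 1)), 3)

# Reproduce the table: one row per P, sample sizes 100 to 2000.
for p in (0.5, 0.7, 0.9):
    row = [two_se_points(p, n) for n in range(100, 2001, 100)]
    print(f"P={p}:", row)
```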
An Example of Nonresponse Error? March 2001 CSLP RDD
NES Response and Refusal Rates
[Chart: NES response and refusal rates, 1952 to 2000]
Response rate: interviews net of refusals and respondents who cannot provide an interview (e.g., language barriers)
Misreporting: Voting in Recent Federal Elections
[Chart: turnout in federal elections, 1964 to 2004: official turnout versus the CPS, NES, and McDonald-Popkin estimates]
Note: Percentage of voting age population
Item Nonresponse
• A “don’t know” option is necessary in any survey, so that people can tell you if they don’t have an opinion
• Due to uncertainty, vague questions, or respondent unwillingness to answer some questions
Should Gov’t Provide More Services?
[Chart: 1996 NES, number of respondents answering Fewer, More, or No Opinion]
Certainty of Responses?
[Chart: distribution of responses on a 1 (not certain) to 7 (pretty certain) scale; Senator Position on Abortion Scale, Alvarez and Franklin 1993]
Question Wording and Order?
• “Would you say that traffic contributes more or less to air pollution than industry?” (45% named traffic the primary contributor vs. 32% industry)
• “Would you say that industry contributes more or less to air pollution than traffic?” (57% named industry the primary contributor vs. 24% traffic)
Wanke et al. 1995
Types of Surveys
• Self-administered questionnaires (mail, web): cheap, but
  - low response rates
  - uncertainty about who completes the questionnaire
Types of Surveys
• Telephone: RDD/CATI
  - quick; random?
  - uncertainty about the respondent
  - difficult to ask complex questions; must be short
Types of Surveys
• Face-to-face (on doorstep, exit polls)
  - highly accurate, high response rates
  - very expensive to implement
  - interviewer biases are problematic
Internet surveying --- the future?
• Cheap to implement
• Quick in the field, quick with analysis
• Can implement complex designs, for example, use multimedia
Basic types of Internet surveys
• Probability designs
• Nonprobability designs
• Mixtures of probability and nonprobability
Probability-based Internet surveys
• Intercept-based surveys of visitors to particular web sites
• Known email lists (students, etc.)
Nonprobability Internet surveys
• Entertainment surveys
• Self-selected surveys
• Volunteer survey panels
Surveys are not perfect!
• Sampling error (difference between sample and pop.)
• Coverage error (deviation between sample and frame)
• Systematic sampling error; error in frame
• Nonresponse (unit) bias
• Nonresponse (item) bias
• Question wording or ordering effects
• Interviewer error; coding mistakes
How do I evaluate survey results?
• Sample size
• Sampling methodology (probability or nonprobability)
• Estimated sampling error
• Survey response rate
• Questionnaire design and question wording
• Item response rates
• Intuition: do the results make sense?
Caltech’s National Public Relations Initiatives
March 11, 2003
Brief recap of survey methodology
• Survey conducted by ICR
• Wednesday, February 12 to Sunday, February 15
• Omnibus survey
• N = 1010
• Tabulation presents weighted results, weighted to map to the American adult population
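“Weighted to map to the American adult population” typically means post-stratification: each respondent is weighted by their demographic cell’s population share divided by its sample share. A toy sketch with invented numbers (one weighting cell, made-up shares; not the actual ICR weighting scheme):

```python
from collections import Counter

# Invented sample of 10 respondents: (education cell, answer).
sample = ([("college", "yes")] * 6
          + [("no_college", "yes")] * 1
          + [("no_college", "no")] * 3)

# Assumed population shares for the weighting cells (invented).
population_share = {"college": 0.3, "no_college": 0.7}

n = len(sample)
cell_counts = Counter(cell for cell, _ in sample)
# Weight = population share / sample share, per cell.
weights = {cell: population_share[cell] / (cell_counts[cell] / n)
           for cell in population_share}

# Weighted percent answering "yes" (weights sum to n, so divide by n).
weighted_yes = sum(weights[cell] for cell, ans in sample if ans == "yes") / n
print(round(weighted_yes, 3))  # 0.475, versus 0.7 unweighted
```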
Questions
1 Considering what you might have seen or heard about the California Institute of Technology, also known as Caltech, in Pasadena, California, which of the following best describes your opinion of Caltech’s reputation: would you say Caltech’s reputation is excellent, good, fair, or poor?
Questions
2 How did you hear about Caltech? (not asked of those unable to answer Question 1)
3 What do you think Caltech is best known for? (not asked of those unable to answer Question 1)
Questions
4 Now, as I read each of the following topics, please tell me, generally speaking, whether or not you are interested in the topic: voting, the brain, climate changes, astronomy, earthquakes, nano-technology, detecting gravity waves
Questions
5 And considering those topics in which you said you had an interest, how do you usually get news and information about these topics? (asked only of those interested in at least one topic)
National Awareness of Caltech
[Chart: level of awareness of Caltech among the general public, Aware versus Unaware]
Caltech Awareness Successes
[Chart: Aware versus Unaware among high-income, college-educated, Western, and 55-to-64-year-old respondents]
Caltech Awareness Challenges
[Chart: Aware versus Unaware among Northeast and over-64 respondents]
Caltech’s Reputation
[Chart: Caltech’s reputation as judged by those aware of the Institute, Excellent/Good versus Fair/Poor]
Media Relations Focus
• National and northeast TV: visit them, pitch them, invite them to campus.
• Households with children
• Senior-oriented media
Evaluate the Caltech Awareness Survey
• Technical evaluation
• Substantive evaluation
• Policy evaluation