Practical Considerations in Design and
Analysis of Dual-Frame Telephone Surveys: a
Simulation Perspective
Bo Lu
College of Public Health, The Ohio State University
Timothy R Sahr, Daniel Weston II*
Ohio Colleges of Medicine Government Resource Center
Ronaldo Iachan, Matthew Denker
ICF
Thomas P. Duffy
RTI International
“Advancing Knowledge. Improving Life”
Outline
1. Background
2. Design and Estimation in Dual Frame Survey
3. Simulation Studies
4. OFHS 2010
5. Summary
Disclosure
The authors declare no conflict of interest for the research
and findings relating to this presentation.
Background
Dual Frame Survey
• Sampling Frame: The list of sampling units.
For example, in a telephone survey, the sampling frame might be a
list of all residential telephone numbers in a certain area.
• Dual Frame Survey: A survey with two sampling frames,
A and B, together covering the population of interest.
Background
Dual Frame Survey
• Independent probability samples are taken from two
overlapping frames and information from the two
samples is combined to estimate quantities of interest.
• In telephone surveys, we usually combine a
conventional random-digit-dialing (RDD) landline sample
with a cell phone sample to maximize coverage.
Background
Why dual frame sampling?
• The use of cell phones has gained tremendous popularity
over the last decade
• Recent NHIS national cell phone estimates indicate that
55.0% of households have one or more cell phones and
about 31.6% of households have only cell phones (no
landlines) (Blumberg & Luke, 2011)
• The NHIS estimates that 11.2% of households have only a
landline phone
• In the first quarter of 2009, 3.9 million iPhones were sold,
accounting for 10.8% of the smartphone market
Background
Example: OFHS 2010
- The 2010 Ohio Family Health Survey (OFHS) is the most
current survey on health coverage and other health-related
characteristics in Ohio.
- It contains 8,276 adult responses and proxy
responses for 2,002 children.
- The adult sample consists of a geographically stratified
RDD sample (main sample) and a supplementary statewide
simple random sample of cell phone users (1,587),
targeted to reach younger and male populations.
Design
• Diagram of Dual Frame Sampling
[Figure: two overlapping frames, with population domains PA, PAB, and PB and sample regions S1 to S5]
Design
Two designs to consider:
- Cell phone-any
Sample all cell phone users
Implementation-wise, it is simple and cheap
Analytically, it is hard to calculate the accurate weights
- Cell phone-only
Sample cell phone-only users
Analytically, it is easy to compute the sampling weights
Implementation-wise, it is time-consuming and expensive
Design
Biases in Landline and Cell Phone Survey:
- Population characteristics vary by phone use pattern
- Cell phone-only design is associated with non-coverage
bias for primary cell phone users
- Both designs are associated with mode salience bias
In landline sample, the response probability can be low
when trying to reach primary cell phone users
In the cell phone sample, the response probability can be
low when trying to reach primary landline users
Design
Cost (per completed interview):

Design      2010 OFHS   BRFSS 2007 (Link et al.)
Landline    $62.80      $64.25
Cell-any    $107        $74.18
Cell-only   $184        $195.78
Estimation
Commonly used estimation strategies:
1.Calculate sampling weights
• Simple Composite Estimation (Hartley, 1974)
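Combine the information from both frames, estimating the
overlapping population with a weighted average of the two samples:
\[
w_i^{*} =
\begin{cases}
w_i, & \text{unit } i \text{ belongs to only one frame} \\
\lambda\, w_i, & \text{unit } i \text{ is in the overlap and was sampled from frame A} \\
(1-\lambda)\, w_i, & \text{unit } i \text{ is in the overlap and was sampled from frame B}
\end{cases}
\]
where $w_i$ is the design weight of unit $i$ in its own sample and $0 \le \lambda \le 1$.
Usually, λ=0.5 or is chosen to minimize the variance of the
estimator.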
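The composite idea can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names `composite_weights` and `weighted_mean`, and the toy inputs in the usage note, are assumptions made for the example.

```python
def composite_weights(weights, in_overlap, from_frame_a, lam=0.5):
    """Scale design weights so that overlap units are not double counted.

    weights      -- design weight of each unit in its own sample
    in_overlap   -- True if the unit belongs to both frames
    from_frame_a -- True if the unit was sampled from frame A
    lam          -- share of the overlap estimate taken from sample A
    """
    adjusted = []
    for w, overlap, frame_a in zip(weights, in_overlap, from_frame_a):
        if not overlap:
            adjusted.append(w)                # covered by one frame only
        elif frame_a:
            adjusted.append(lam * w)          # overlap unit from sample A
        else:
            adjusted.append((1.0 - lam) * w)  # overlap unit from sample B
    return adjusted


def weighted_mean(values, weights):
    """Weighted mean of the outcome using the adjusted weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, with weights `[10, 10, 5, 5]`, overlap flags `[False, True, True, False]`, and frame A flags `[True, True, False, False]`, λ=0.5 halves the weights of the two overlap units and leaves the others unchanged.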
Estimation
• Single Frame Estimation
All observations are treated as if they had been sampled
from a single frame and the sampling weights of
observations in the intersection are modified to reflect the
fact that they have two chances of being selected
- It is simple
- It is unbiased
- It may not be as efficient as PML estimation since it
doesn’t use all relevant information
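A rough sketch of the weight adjustment, assuming a simple random sample from each frame with sampling fractions `f_a` and `f_b` (hypothetical names). An overlap unit's expected number of selections is approximately `f_a + f_b`, so its weight is the inverse of that sum. The actual OFHS weighting may differ.

```python
def single_frame_weight(in_a: bool, in_b: bool, f_a: float, f_b: float) -> float:
    """Inverse of the expected number of selections across both samples.

    in_a, in_b -- whether the unit is covered by frame A / frame B
    f_a, f_b   -- sampling fractions n_A / N_A and n_B / N_B
    """
    expected_selections = (f_a if in_a else 0.0) + (f_b if in_b else 0.0)
    if expected_selections <= 0.0:
        raise ValueError("unit is not covered by either frame")
    return 1.0 / expected_selections
```

A unit covered by frame A alone keeps the usual weight `1 / f_a`, while a unit in the overlap receives the smaller weight `1 / (f_a + f_b)`.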
Estimation
• PML Estimation (Skinner and Rao, 1996)
The pseudo maximum-likelihood method adopts maximum
likelihood principles and applies them to complex survey designs
- It is a dual frame method
- It is unbiased
- It uses the same set of weights for all response variables
- It is more complex to implement than other methods
Simulation
Simulation Setup
1. Population Generation
A hypothetical population of N=200,000 is randomly generated,
consisting of four subpopulations: landline-only (LO), landline-primary (LP),
cell-primary (CP), and cell-only (CO). It is fixed
for subsequent simulations.
2. Sampling
Two sampling designs are considered: cell-any and cell-only. For
each design, a random sample of 2,000 is drawn, and the sampling is
repeated 1,000 times.
3. Estimation
Three commonly used estimation approaches (SCE, SFE, PML),
are compared based on the samples taken in step 2.
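The three steps can be sketched for the cell-only design under the ideal scenario (every sampled unit responds). Everything numeric in `GROUPS`, the outcome standard deviation, and the function names are assumptions for illustration; the slides do not give the values used to generate the population. The estimator treats the landline frame and the cell-only group as two strata.

```python
import random
import statistics

# Hypothetical subpopulation shares and outcome means (not from the study).
GROUPS = {"LO": (0.15, 300.0), "LP": (0.30, 380.0),
          "CP": (0.25, 400.0), "CO": (0.30, 340.0)}


def make_population(n, rng):
    """Step 1: generate a fixed population of (group, outcome) pairs."""
    pop = []
    for name, (share, mean) in GROUPS.items():
        pop += [(name, rng.gauss(mean, 100.0)) for _ in range(round(n * share))]
    return pop


def simulate_cell_only(pop, n_total=2000, cell_share=0.3, reps=1000, seed=1):
    """Steps 2 and 3: repeated sampling and stratified estimation of the mean."""
    rng = random.Random(seed)
    land = [y for g, y in pop if g != "CO"]  # landline frame: LO, LP, CP
    cell = [y for g, y in pop if g == "CO"]  # screened cell frame: CO only
    n_cell = round(n_total * cell_share)
    n_land = n_total - n_cell
    estimates = []
    for _ in range(reps):
        ybar_land = statistics.fmean(rng.sample(land, n_land))
        ybar_cell = statistics.fmean(rng.sample(cell, n_cell))
        # Weight each stratum mean by its population size.
        estimates.append(
            (len(land) * ybar_land + len(cell) * ybar_cell) / len(pop))
    return estimates
```

Calling `simulate_cell_only(make_population(200_000, random.Random(0)))` returns 1,000 estimates of the population mean, one per replicate. Inaccessibility and raking would need additional steps that this sketch does not cover.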
Simulation
Simulation Results
- Quantity of interest
Mean of a continuous outcome variable (FPL) in the entire
population; Truth=363.55
- Diagnostic Statistics
Percent Relative Bias (PRB)
100 × (Ŷ - 363.55) / 363.55
Empirical Mean Squared Error (EMSE)
- Sample size allocation for cell phone component
With a fixed total sample size, what proportion should be
allocated to the cell phone sample for the best estimation precision?
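Both diagnostics can be computed from the replicate estimates. The slides do not spell out the EMSE formula, so the sketch below assumes the usual definition: the average squared deviation of the replicate estimates from the truth. PRB is computed on the mean of the replicate estimates.

```python
from statistics import fmean


def percent_relative_bias(estimates, truth):
    """100 x (mean of replicate estimates - truth) / truth."""
    return 100.0 * (fmean(estimates) - truth) / truth


def empirical_mse(estimates, truth):
    """Average squared deviation of the replicate estimates from the truth."""
    return fmean([(e - truth) ** 2 for e in estimates])
```

Under this definition, EMSE combines squared bias and sampling variance, so a design with a small bias but a large variance can still score worse than a slightly biased but stable one.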
Simulation
Simulation Results (Cont.): Fixed Sample Size

Design      Estimator    Mean      PRB       EMSE

Ideal Scenario
Cell only   Stratified   363.84     0.018      3.585
Cell any    SCE          363.65    -0.035      7.196
Cell any    SFE          363.64    -0.036      7.199
Cell any    PMLE         363.66    -0.033      5.762

Medium Inaccessibility
Cell only   Stratified   350.69    -3.597    172.93
Cell any    SCE          343.01    -5.708    434.32
Cell any    SFE          342.96    -5.723    436.56
Cell any    PMLE         344.99    -5.163    355.55

Medium Inaccessibility + Raking by a Covariate
Cell only   Stratified   351.65    -3.334    148.52
Cell any    SCE          343.98    -5.441    394.64
Cell any    SFE          343.83    -5.483    400.66
Cell any    PMLE         346.07    -4.867    315.97

Medium Inaccessibility + Raking by Phone Use Pattern
Cell only   Stratified   363.86     0.024      2.203
Cell any    SCE          363.75    -0.006      2.840
Cell any    SFE          363.75    -0.007      2.833
Cell any    PMLE         363.75    -0.006      2.836
OFHS 2010
- The cell phone-any design was implemented in OFHS 2010.
- Simulation results show that raking is crucial in reducing
the bias due to mode salience.
- In OFHS, we used region, age, gender, education, race,
and Medicaid status for raking.
- For illustration, we estimate mean monthly
income and the proportion with insurance using all three
methods discussed earlier.
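Raking is commonly carried out by iterative proportional fitting: weights are rescaled to match one set of known margins at a time, cycling through the variables until the adjustments become negligible. The sketch below assumes that approach; the function name `rake`, its arguments, and the toy margins in the usage note are hypothetical and are not the OFHS production procedure.

```python
def rake(weights, categories, targets, n_iter=50, tol=1e-8):
    """Rake weights to known population margins.

    weights    -- starting weights, one per respondent
    categories -- dict: variable name -> list of category labels per respondent
    targets    -- dict: variable name -> dict of label -> population total
    """
    w = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for wi, lab in zip(w, labels):
                totals[lab] = totals.get(lab, 0.0) + wi
            # Rescale every weight so the category totals hit the targets.
            for i, lab in enumerate(labels):
                factor = targets[var][lab] / totals[lab]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break  # all margins already matched within tolerance
    return w
```

For four respondents with starting weight 1, gender labels `["m", "m", "f", "f"]`, age labels `["y", "o", "y", "o"]`, and targets of 60/40 by gender and 50/50 by age, the raked weights sum to the target total in every category.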
OFHS 2010
Estimation Method   Outcome          Mean/Proportion   Standard Error
SCE                 Monthly Income   4917.90           287.07
                    Insurance        0.8439            0.0087
SFE                 Monthly Income   4681.29           218.07
                    Insurance        0.8335            0.0075
PMLE                Monthly Income   4722.08           223.90
                    Insurance        0.8347            0.0060
Summary
- A telephone survey with both landline and cell phone components is a
must.
- The cell phone-any design is easy to implement and less expensive, but it
suffers more from non-response bias when good raking
information is not available. PMLE involves more sophisticated
adjustment; SFE is relatively easy to implement and yields results
close to PMLE in the OFHS analysis.
- The cell phone-only design is hard to implement and more expensive, but
its analysis is straightforward; simulation results show that it
yields the most accurate results when good raking is not possible.
- Allocating 25%-30% of the sample to cell phones would work well in practice.
Thank you!