COMPARISON of WITH-REPLACEMENT and WITHOUT-
REPLACEMENT VARIANCE ESTIMATES for a COMPLEX SURVEY
Frank J. Potter (MPR), Stephen Williams (MPR), Nuria Diaz-Tena (MPR)
James Reschovsky (HSC), Elizabeth Schaefer (HSC)
APHA, November 2003
Overview
Introduction
Study Objectives and Methods
Variance estimation considerations
Comparisons for different assumptions
Summary
Community Tracking Study (CTS)
Data on changes in healthcare system
– Primary focus on community
Site-level analysis
– National estimates as byproduct
Data made available to researchers
CTS Sample Structure Multi-stage Multi-sample Design
Two independent samples
Multi-stage design
– 60 PSUs (called sites)
– 9 Certainty PSUs
Supplemental sample
– Stratified random national sample
Multi-stage Sample Design
60 PSUs / Sites
– 12 for intensity study
– 48 other sites, to improve national coverage and precision
Probability proportional to size
Stratified by MSA size and region
Without-replacement selection
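Certainty PSUs arise under probability-proportional-to-size selection when a site's share of the total size measure, multiplied by the number of PSUs to be drawn, reaches 1. A minimal sketch of that calculation is shown here. The function name, the size measures, and the sample size are illustrative assumptions rather than CTS values, and stratification by MSA size and region is ignored.

```python
def pps_inclusion_probs(sizes, n):
    """First-order inclusion probabilities for a PPS sample of n units.

    Units whose proportional share would reach 1 become certainty units
    (probability 1); the remaining sample is shared among the other units
    in proportion to size. Sizes are assumed to be positive.
    """
    if not 0 < n <= len(sizes):
        raise ValueError("n must be between 1 and the number of units")
    certain = set()
    while True:
        remaining_n = n - len(certain)
        rest_total = sum(s for i, s in enumerate(sizes) if i not in certain)
        # A unit is taken with certainty if remaining_n * s / rest_total >= 1.
        newly_certain = [
            i for i, s in enumerate(sizes)
            if i not in certain and remaining_n * s >= rest_total
        ]
        if not newly_certain:
            break
        certain.update(newly_certain)
    return [
        1.0 if i in certain else remaining_n * s / rest_total
        for i, s in enumerate(sizes)
    ]


# Hypothetical frame of five sites, three to be selected.
probs = pps_inclusion_probs([1000, 50, 30, 20, 100], 3)
print(probs)
```

The probabilities sum to the number of PSUs drawn, so the two largest hypothetical sites absorb two of the three selections and the rest share the third.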
Survey Data Variance Estimation
Two general approaches
– Taylor series linearization
– Replication methods
Software available
– SUDAAN (version 8)
– Stata (version 8)
– SAS (version 8) SURVEYREG/SURVEYMEANS
– WesVar (version 4)
Recommend SUDAAN for CTS
WWW.FAS.HARVARD.EDU/~STATS/SURVEY-SOFT
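The two general approaches can be contrasted on a small example. The sketch below assumes a single stratum, with-replacement selection of PSUs, and PSU-level weighted totals as input. All names and numbers are hypothetical, and none of the packages listed is claimed to compute variances exactly this way.

```python
def wr_var_of_total(z):
    """With-replacement variance estimate for sum(z), z_i = PSU-level values."""
    n = len(z)
    zbar = sum(z) / n
    return n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)


def taylor_var_ratio(ty, tw):
    """Taylor-series (linearization) variance of the ratio R = sum(ty) / sum(tw)."""
    Y, W = sum(ty), sum(tw)
    R = Y / W
    # Linearized PSU values: first-order expansion of Y / W around (Y, W).
    z = [(y - R * w) / W for y, w in zip(ty, tw)]
    return wr_var_of_total(z)


def jackknife_var(ty, tw, stat):
    """Delete-one-PSU jackknife variance of stat(Y, W).

    Each replicate drops one PSU and scales the remaining totals by n/(n-1).
    """
    n = len(ty)
    Y, W = sum(ty), sum(tw)
    k = n / (n - 1)
    full = stat(Y, W)
    reps = [stat(k * (Y - y), k * (W - w)) for y, w in zip(ty, tw)]
    return (n - 1) / n * sum((r - full) ** 2 for r in reps)


# Hypothetical PSU-level weighted totals of y and of the weights.
ty = [120.0, 95.0, 130.0, 110.0, 105.0]
tw = [50.0, 40.0, 55.0, 45.0, 42.0]

v_total_lin = wr_var_of_total(ty)
v_total_jk = jackknife_var(ty, tw, lambda Y, W: Y)
v_ratio_lin = taylor_var_ratio(ty, tw)
v_ratio_jk = jackknife_var(ty, tw, lambda Y, W: Y / W)
print(v_total_lin, v_total_jk, v_ratio_lin, v_ratio_jk)
```

For a linear statistic such as a total, the two approaches give the same answer. For a nonlinear statistic such as a weighted mean, they generally differ slightly.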
Why Without Replacement?
Without-replacement selection of PSUs (sites)
Probability proportional to size
– Certainty PSUs
Small PSU frame
– Sizeable finite population correction factor (FPC)
FPC → Joint inclusion probabilities
Only SUDAAN has capability
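The link between the FPC and joint inclusion probabilities can be illustrated with the Sen-Yates-Grundy variance estimator, a standard without-replacement estimator that uses both first-order and joint inclusion probabilities. The sketch below is an illustration under simplifying assumptions: it uses simple random sampling without replacement (where the joint probabilities have a closed form) rather than the CTS design, and the slides do not say that SUDAAN computes its estimate this way. The frame size, sample size, and data are hypothetical.

```python
from itertools import combinations


def var_wr(y, pi):
    """With-replacement variance estimator for the weighted total sum(y_i / pi_i)."""
    n = len(y)
    z = [yi / p for yi, p in zip(y, pi)]
    zbar = sum(z) / n
    return n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)


def var_wor_syg(y, pi, joint):
    """Sen-Yates-Grundy estimator; joint(i, j) returns the joint probability pi_ij."""
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        pij = joint(i, j)
        diff = y[i] / pi[i] - y[j] / pi[j]
        total += (pi[i] * pi[j] - pij) / pij * diff ** 2
    return total


# Hypothetical example: n = 6 units drawn by SRS without replacement from N = 20.
N, n = 20, 6
y = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0]
pi = [n / N] * n
pij = n * (n - 1) / (N * (N - 1))   # joint inclusion probability under SRS WOR

v_wr = var_wr(y, pi)
v_wor = var_wor_syg(y, pi, lambda i, j: pij)
fpc = 1 - n / N
print(v_wor / v_wr, fpc)
```

In this special case the ratio of the two variance estimates equals the FPC, 1 - n/N. With a sampling fraction of 0.3, the with-replacement standard error is roughly 20% larger than the without-replacement one.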
COMPARISON of ALTERNATIVES
Study Measure: Reldiff (%):
Reldiff = 100*(SEwr – SEwor) / SEwor
SUDAAN used for analysis
– SEwor using DESIGN = UNEQWOR
– SEwr using DESIGN = WR
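As a small worked example of the study measure, the sketch below computes Reldiff for a few pairs of standard errors, then the two summaries reported for each domain: the average Reldiff and the percent of estimates with Reldiff < 0. The function names and standard-error values are hypothetical, not CTS results.

```python
def reldiff(se_wr, se_wor):
    """Relative difference (%) of the WR standard error from the WOR standard error."""
    return 100.0 * (se_wr - se_wor) / se_wor


def summarize(reldiffs):
    """Return (average Reldiff, percent of estimates with Reldiff < 0)."""
    mean = sum(reldiffs) / len(reldiffs)
    pct_negative = 100.0 * sum(1 for r in reldiffs if r < 0) / len(reldiffs)
    return mean, pct_negative


# Hypothetical (SEwr, SEwor) pairs for four estimates.
pairs = [(1.2, 1.0), (0.9, 1.0), (1.5, 1.0), (1.1, 1.0)]
values = [reldiff(wr, wor) for wr, wor in pairs]
mean_reldiff, pct_negative = summarize(values)
print(values, mean_reldiff, pct_negative)
```

A positive Reldiff means the with-replacement standard error is the larger (more conservative) of the two. A negative value means it is smaller than the without-replacement standard error.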
Comparison of Variances Using WOR and WR Assumption
Methods compared
– SUDAAN, Stata, and SAS with-replacement
– SUDAAN without-replacement
Household survey
– 126 Estimates (samples of 6,000-60,000)
– Domains: All, Hispanic, low income, uninsured
Physician survey
– 35 Estimates (samples of 4,000-12,000)
– Domains: All, high MC revenue, solo, group
Rel Difference of Standard Errors
[Figure: histogram of Rel Diff (%), panel ALL HOUSEHOLDS; x-axis: Rel Diff, y-axis: frequency]
ALL vs LOW-INCOME HH
[Figure: histograms of Rel Diff (%), panels LOW INCOME HOUSEHOLDS and ALL HOUSEHOLDS; x-axis: Rel Diff, y-axis: frequency]
ALL HH vs HISPANIC HH
[Figure: histograms of Rel Diff (%), panels HISPANIC HOUSEHOLDS and ALL HOUSEHOLDS; x-axis: Rel Diff, y-axis: frequency]
HOUSEHOLD SURVEY SUMMARY
                            ALL HH   HISP   LOW INCOME   NOT INSURED
AVERAGE RELDIFF (%)           12      -3        8             6
PERCENT WITH RELDIFF < 0      14      63       20            30
PHYSICIANS: ALL
[Figure: histogram of Rel Diff (%), panel ALL PHYSICIANS; x-axis: Rel Diff, y-axis: frequency]
HOUSEHOLDS vs. PHYSICIANS
[Figure: histograms of Rel Diff (%), panels ALL PHYSICIANS and ALL HOUSEHOLDS; x-axis: Rel Diff, y-axis: frequency]
ALL PHYS. vs SOLO PRACTICE
[Figure: histograms of Rel Diff (%), panels ALL PHYSICIANS and SOLO and TWO-PHYSICIAN PRACTICES; x-axis: Rel Diff, y-axis: frequency]
PHYSICIAN SURVEY SUMMARY
                             ALL PHYS   HIGH M.C. REVENUE   SPECIALIST   PCP   GROUP PRACTICE   SMALL PRACTICE
AVERAGE Rel Diff (%)            22             18               21        12         28                2
PERCENT WITH Rel Diff < 0       10              3                4        15         44               44
Comparison for Descriptive Statistics
Relative Differences in RSEs
Household     All     Hispanic   Low Income   Uninsured
Mean          11.8    -2.9       8.4          6.2
% < 0         14      63         20           30

Physician     All     Solo    Group   High M.C. Revenue
Mean          21.5    1.7     27.6    17.6
% < 0         10      44      44      3
Comparison for Multivariate Statistics
~RelDiff for Coefficient RSEs~
CTS Household Models
                Ambulatory visits   Cost concerns   Health status   Health plan rating
Model (vars)    Linear (12)         Linear (24)     Linear (7)      Logit (23)
Mean            5.0                 20.3*           11.0            9.6
% < 0           25                  16              0               9
Comparison for Multivariate Statistics
~RelDiff for Coefficient RSEs~
CTS Physician Models
                Hours of charity   Income        Career satisfaction   Charity care
Model (vars)    Linear (20)        Linear (13)   Logit (32)            Logit (21)
Mean            9.3                18.7          9.3                   14.8
% < 0           5                  8             16                    5
Summary of Findings
Minor SE differences for household survey, major differences for physician survey
Small domains => Unstable variances
Hispanic domain clustered: 40% in 3 sites
WOR incorporates more of the CTS sample design
CONCLUSIONS
CTS has complex sample design
– requires weights
– specialized variance estimation software
Without-replacement assumption (SUDAAN) more fully accounts for sample design
WR assumption generally conservative
– Some unpredictable results
– small variance estimates for some subgroups
Accepting conservative WR SEs has costs in statistical power
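The cost in statistical power can be made concrete with a rough calculation. The sketch below assumes a two-sided z-test at the 5% level and compares power when the standard error is inflated by a given Reldiff. The 20% inflation and the effect size are hypothetical values chosen for illustration, not CTS results.

```python
from statistics import NormalDist


def power_two_sided(effect, se, alpha=0.05):
    """Power of a two-sided z-test to detect a true difference `effect`."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


se_wor = 1.0                               # without-replacement SE (arbitrary units)
reldiff_pct = 20.0                         # hypothetical Reldiff (%)
se_wr = se_wor * (1 + reldiff_pct / 100)   # conservative with-replacement SE
effect = 2.8 * se_wor                      # effect giving about 80% power under WOR

power_wor = power_two_sided(effect, se_wor)
power_wr = power_two_sided(effect, se_wr)
print(round(power_wor, 3), round(power_wr, 3))
```

Under these assumptions, a comparison designed for about 80% power with the without-replacement standard error has noticeably lower power when the larger with-replacement standard error is used.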
CTS Publications
Center for Studying Health System Change
– WWW.HSCHANGE.ORG
– Links to ICPSR for data
– CTSonline: an interactive system
Information available
– Data Bulletins
– Issue Briefs
– Community Tracking Reports
Public and Restricted Use Files
Public Use Files
– Available to all researchers via ICPSR
– Some limitations
    Some variables deleted or modified
    Other limitations
Restricted Use Files
– Must sign data-use agreement
– Variance estimation parameters